Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — expert. You can teach others and know the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Go @ 6
Kubernetes @ 4
Terraform @ 7
Python @ 6
CI/CD @ 4
Machine Learning @ 4
MLOps @ 4
TensorFlow @ 4
Vertex AI @ 4
AWS @ 7
Communication @ 7
API @ 4
LLM @ 4
PyTorch @ 4
LLMOps @ 4
Observability @ 4
Generative AI @ 4
AI @ 4
Agentic AI @ 4
RAG @ 4
GenAI @ 4
LangChain @ 4
Details
Reddit's Machine Learning Platform team owns the infrastructure that powers recommendations, content discovery, and user and content quantification. This role focuses on leading the development of a large-scale Generative AI (GenAI) platform that supports internal and external LLMs, RAG applications, agentic AI workflows, and production-grade ML/LLM operations.
Responsibilities
- Contribute to the design, implementation, and maintenance of the LLM Gateway, including unified API endpoints for internal and externally hosted LLMs, rate/token limit management, and intelligent failover mechanisms.
- Design, develop, and operate ML and Generative AI systems in cloud-based production environments at scale.
- Build and manage enterprise-grade RAG applications using embeddings, vector search, and retrieval pipelines.
- Implement and operationalize agentic AI workflows with tool use, built on frameworks such as LangChain and LangGraph.
- Drive adoption of MLOps / LLMOps best practices: CI/CD automation, versioning, testing, and lifecycle management for models and pipelines.
- Establish best practices for observability, monitoring, evaluation, and governance of GenAI pipelines in production.
- Lead platform delivery from concept to production with strong ownership and platform thinking.
Requirements
- 5+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Experience operating orchestration systems such as Kubernetes at scale.
- Deep experience with cloud-based technologies for supporting an ML platform, including AWS and Google Cloud Storage, and infrastructure-as-code (Terraform).
- Proficiency in programming languages commonly used for ML work, such as Python and Go.
- Experience with modern AI/ML frameworks such as LangChain, Vertex AI Agent Builder, TensorFlow, and PyTorch.
- Experience building and operating model serving and inference pipelines; knowledge of monitoring and observability for AI systems is a plus.
- Strong communication skills and the ability to articulate technical AI concepts to non-technical stakeholders.
- A focus on scalability, reliability, performance, platform usability, and lifecycle management of GenAI products.
Benefits
- Comprehensive healthcare benefits and income replacement programs.
- 401(k) with employer match.
- Global benefit programs including workspace, professional development, and caregiving support.
- Family planning support and gender-affirming care.
- Mental health and coaching benefits.
- Flexible vacation and paid volunteer time off.
- Generous paid parental leave.
Pay Transparency & Additional Info
- Base salary range (US): $190,800 - $267,100 USD.
- Position may be eligible for equity (restricted stock units) and, depending on role, commission.
- Interviews in select roles/locations may be recorded, transcribed, and summarized by AI; candidates may opt out prior to scheduled interviews. The posting describes categories of personal information collected for interviews and references the Candidate Privacy Policy.