Used Tools & Technologies
GenAI
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Go @ 6
Kubernetes @ 4
Terraform @ 7
Python @ 6
Machine Learning @ 4
Hiring @ 4
AWS @ 7
LLM @ 4
PyTorch @ 7
GPU @ 4
Observability @ 4
Generative AI @ 4
AI @ 4
vLLM @ 7
Details
Reddit is a community of communities built on shared interests, passion, and trust, and is home to open and authentic conversations. With 100,000+ active communities and approximately 126 million daily active unique visitors, Reddit operates one of the internet’s largest sources of information.
Team Overview
The Machine Learning Platform team at Reddit owns the infrastructure that powers recommendations, content discovery, and user and content quantification. Its work directly impacts teams such as Growth, Ads, Feeds, and Core Machine Learning.
Responsibilities
- Lead the end-to-end design, implementation, and maintenance of a highly available, low-latency GPU-based model serving system for search, ranking, and LLMs supporting millions of QPS.
- Design and develop ML and Generative AI systems in cloud-based production environments on Kubernetes at scale.
- Rapidly develop prototypes and build a high-performance feature hydration and processing system as part of the inference stack, including routing, caching, and batching.
- Lead a unified GPU model export framework to support converting trained models into optimized GPU inference models.
- Implement real-time ML observability to track feature and model performance.
- Work with LLM serving online at scale and build end-to-end inference performance benchmarking frameworks.
- Architect and operate multi-cluster compute environments and manage network topology specific to ML inference use cases.
Requirements
- 7+ years of experience in ML Engineering, AI Platform Engineering, or Cloud AI Deployment roles.
- Experience operating orchestration systems such as Kubernetes at scale.
- Deep experience with cloud-based technologies for supporting an ML platform, including AWS, Google Cloud Storage, infrastructure-as-code (Terraform), and related tooling.
- Proficiency with programming languages commonly used for ML systems, such as Python and Go.
- Strong proficiency with modern AI/ML frameworks and tooling (Triton, vLLM, PyTorch) and hands-on experience with GPU-based serving.
- Experience with LLM serving online at scale and familiarity with inference pipelines, monitoring, and observability for AI systems.
- Strong focus on scalability, reliability, performance, and ease of use, with the ability to advocate for platform users and communicate technical AI concepts to non-technical stakeholders.
Benefits
- Comprehensive healthcare benefits and income replacement programs.
- 401(k) with employer match.
- Global benefit programs including workspace support, professional development, and caregiving support.
- Family planning support and gender-affirming care.
- Mental health & coaching benefits.
- Flexible vacation & paid volunteer time off.
- Generous paid parental leave.
Pay Transparency
- Base salary range for this U.S.-based position: $253,300 - $354,600 USD.
- In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on position, it may also be eligible to receive a commission.
Candidate Privacy & Interview Notes
- In select roles and locations, interviews may be recorded, transcribed, and summarized by AI; candidates may opt out prior to scheduled interviews.
- During interviews, Reddit may collect identifiers, professional and employment-related information, sensory information (audio/video), and any other information candidates choose to share; recordings will be deleted promptly after hiring decisions as described in the Candidate Privacy Policy.
Equal Opportunity
Reddit is an equal opportunity employer committed to building a diverse workforce and providing reasonable accommodations for qualified individuals with disabilities and disabled veterans during the application process.