Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Kubernetes @ 7
Terraform @ 7
Python @ 6
Spark @ 4
GCP @ 7
Data Structures @ 4
Machine Learning @ 4
MLOps @ 4
TensorFlow @ 6
Hiring @ 4
Apache Beam @ 4
Communication @ 7
MLflow @ 4
PyTorch @ 6
GPU @ 4
AI @ 4
Profiling @ 4
Details
Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information.
Who We Are
The Machine Learning Platform team at Reddit owns the infrastructure that powers recommendations, content discovery, and user and content quantification. Its work directly impacts teams such as Growth, Ads, Feeds, and Core Machine Learning.
Responsibilities
- Lead development of a platform for large-scale ML models at Reddit.
- Design end-to-end model lifecycle patterns (MLOps) to boost ML engineer velocity, including data preparation, model management, experiment tracking, and more.
- Drive zero-to-one development and support of a graph ML codebase and platform that abstracts common patterns and enables greater model scalability and iteration.
- Collaborate with ML engineers on performance tuning, including improving model training time, efficiency, and GPU training costs in a large distributed ML training environment.
- Optimize batch data processing, both within a data warehouse and with tools such as Apache Beam, Apache Spark, Ray Data, and more.
- Architect pipelines to build and maintain massive graph data structures on the order of billions of nodes and tens of billions of edges.
Requirements
- 5+ years of experience in ML infrastructure, including model training and model deployments.
- Hands-on experience with ML optimization, including memory and GPU profiling.
- Deep experience with cloud-based technologies for supporting an ML platform, including tools like GCP BigQuery, Google Cloud Storage, infrastructure-as-code (Terraform), and more.
- Hands-on experience administering and integrating MLOps tools for experiment tracking, model serving, and model registries (e.g., MLflow or Weights & Biases).
- Proficiency with common ML programming languages and frameworks such as Python, PyTorch, and TensorFlow.
- Deep experience working with distributed training frameworks, including Ray and Kubernetes.
- Strong focus on scalability, reliability, performance, and ease of use; advocacy for platform users and strong intuition for the ML development lifecycle.
- Strong organizational and communication skills.
- Experience with graph databases (Neo4j, JanusGraph, TigerGraph) is a strong plus.
- Experience with graph neural networks (GNNs) and graph ML frameworks (PyTorch Geometric, Deep Graph Library) is a strong plus.
Pay Transparency & Benefits
- Base salary range (U.S.-based): $216,700 - $303,400 USD per year.
- In addition to base salary, this job is eligible to receive equity in the form of restricted stock units, and depending on the position offered, it may also be eligible to receive a commission.
- Reddit offers benefits to U.S.-based employees including medical, dental, and vision insurance, 401(k) program with employer match, generous time off, and parental leave.
Interview & Privacy Notes
- In select roles and locations, interviews may be recorded, transcribed, and summarized by AI; candidates can opt out prior to scheduled interviews.
- During interviews, Reddit will collect identifiers, professional/employment-related information, sensory information (audio/video), and any other information candidates choose to share. Recordings will be deleted promptly after making a hiring decision.
Equal Opportunity
- Reddit is an equal opportunity employer and is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans during the interview process.