Used Tools & Technologies
Not specified
Required Skills & Competences ?
Distributed Systems @ 5 Machine Learning @ 3 TensorFlow @ 1 Hiring @ 3 PyTorch @ 1 GPU @ 3Details
Reddit is a community of communities. The Ads Training Platform pod builds and maintains the distributed training and data processing infrastructure that powers Reddit’s Ads machine learning models. The team focuses on enabling fast, reliable, and scalable model training across large datasets, supporting Ads ML teams to improve ad targeting, conversion prediction, and advertiser value.
Location: Remote - United States
Responsibilities
- Design, build, and maintain large-scale distributed training infrastructure for Ads ML models.
- Develop tools and frameworks on top of the Ray platform.
- Build tools to debug, profile, and tune distributed training jobs for performance and reliability.
- Integrate with object storage systems and improve data access patterns.
- Collaborate with ML engineers to improve model training time, efficiency, and GPU training costs.
- Drive improvements in scheduling, state management, and fault tolerance within the training platform to enhance overall performance.
Requirements
- 3+ years in infrastructure/platform engineering or large-scale distributed systems.
- 2+ years hands-on experience with the Ray platform.
- Strong understanding of distributed computing principles (task scheduling, fault tolerance, state management).
- Experience with distributed storage systems and large-scale data processing.
- Proven ability to debug and profile distributed jobs.
- Experience with deep learning frameworks (PyTorch, TensorFlow) is a big plus.
- Bonus: model optimization for distributed training, Ads ML experience.
Benefits
- Comprehensive Healthcare Benefits and Income Replacement Programs
- 401(k) Match
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Flexible Vacation & Reddit Global Days off
- Generous paid Parental Leave
- Paid Volunteer time off
Pay Transparency
- Base pay range for US-based candidates: $185,800 - $260,100 USD.
- Position may be eligible for equity (restricted stock units) and, depending on role, commission.
- Final offer amounts depend on skills, experience, and other factors.
Additional Notes
- Reddit offers a flexible workforce: candidates near physical offices can come in as desired; the role is listed as Remote - United States.
- In select roles and locations, interviews may be recorded, transcribed, and summarized by AI; candidates can opt out prior to any scheduled interviews.
- During interviews Reddit may collect Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video), and any other information candidates choose to share. Recordings are deleted promptly after hiring decisions.