Senior Research Engineer, Post-Training & Evaluation

at Reddit

📍 United States

USD 216,700-303,400 per year

SENIOR

✅ Remote ✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 6, CI/CD @ 4, Machine Learning @ 6, Hiring @ 4, Data Engineering @ 7, MLflow @ 4, LLM @ 6, PyTorch @ 6, AI @ 4, vLLM @ 6

Details

Reddit is a community of communities. It’s built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet. Every day, Reddit users submit, vote, and comment on the topics they care most about. With 100,000+ active communities and approximately 121 million daily active unique visitors, Reddit is one of the internet’s largest sources of information.

This role is fully remote-friendly within the United States. If you live near one of Reddit's physical offices (San Francisco, Los Angeles, New York City, or Chicago), you may come into the office as often as you'd like.

The AI Engineering team at Reddit is building Reddit-native foundational Large Language Models (LLMs). This team sits at the intersection of applied research and massive-scale infrastructure, tasked with training models that understand Reddit culture, language, and structure. As a Senior Research Engineer for Post-Training & Evaluation, you will own the feedback loop of model development: architecting evaluation suites and fine-tuning pipelines that ensure models are safe, capable, and aligned to Reddit.

Responsibilities

Architect and maintain the "Reddit Benchmark" evaluation suite: a comprehensive harness that tests model capabilities across Safety, Reasoning, and Reddit-specific knowledge (slang, norms).
Build scalable Supervised Fine-Tuning (SFT) pipelines: implement efficient, distributed training loops for instruction tuning to convert base models into helpful assistants.
Develop Model-as-a-Judge systems: engineer automated evaluation pipelines using strong models (e.g., GPT-5, Nova, Claude) to grade outputs of internal models for rapid iteration.
Execute synthetic data generation strategies: create and curate high-quality instruction sets to improve model generalization where human data is scarce.
Collaborate with Safety Engineering: translate high-level safety policies into concrete evaluation metrics and unit tests that run in CI/CD pipelines.
Debug post-training instability: analyze loss curves and evaluation logs to identify when fine-tuning causes alignment regressions or capability degradation.

Requirements

4+ years of professional experience in machine learning engineering, with a focus on LLM fine-tuning or evaluation.
Fluency in Python and PyTorch, with experience using libraries like Hugging Face Transformers, vLLM, or lm-eval-harness.
Deep understanding of Instruction Tuning (SFT) and how data quality impacts model behavior.
Experience building evaluation pipelines and benchmarks, including knowledge of how standard benchmarks such as MMLU and GSM8K differ, and the ability to build domain-specific benchmarks.
Familiarity with distributed training (FSDP/DeepSpeed) for fine-tuning jobs.
Strong data engineering skills for curating and cleaning instruction datasets.

Nice to Have

Experience with MLflow, Weights & Biases, or other experiment tracking tools.
Experience with synthetic data generation (e.g., the approaches described in the Self-Instruct papers).

Benefits

Comprehensive Healthcare Benefits and Income Replacement Programs
401k with Employer Match
Global benefit programs (workspace, professional development, caregiving support)
Family Planning Support
Gender-Affirming Care
Mental Health & Coaching Benefits
Flexible Vacation & Paid Volunteer Time Off
Generous Paid Parental Leave

Pay Transparency

The base salary range for this position is $216,700 to $303,400 USD.

In addition to base salary, this role may be eligible for restricted stock units and, depending on position, a commission. Reddit offers a wide range of benefits to U.S.-based employees. Final offers are determined by factors including skills and experience.

Interview & Privacy Notes

In select roles and locations, interviews may be recorded, transcribed, and summarized by AI; candidates may opt out. During interviews, Reddit may collect Identifiers, Professional and Employment-Related Information, Sensory Information (audio/video), and other information the candidate chooses to share. Recordings are deleted after hiring decisions. For more information, see Reddit's Candidate Privacy Policy.