Member Of Technical Staff - Post-Training And RL

at xAI
USD 180,000-600,000 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Communication @ 6
  • AI @ 3
  • Reinforcement Learning @ 3

Details

xAI mission and team

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. The team is small, highly motivated, and focused on engineering excellence. The organization favors hands-on contributors, flat structure, initiative, and strong communication.

Location

Palo Alto, CA

Role description

You will work on critical post-training and reinforcement learning challenges, including reward modeling and preference optimization (RLHF/DPO), and apply reinforcement learning to improve reasoning, truthfulness, and real-world capabilities. You will be told what your first project is before an offer is made.

Responsibilities

  • Work on post-training and reinforcement learning problems (reward modeling, preference optimization, RLHF/DPO).
  • Apply reinforcement learning to improve model reasoning, truthfulness, and practical capabilities.
  • Push the boundaries of what's possible with reinforcement learning and alignment methods.

Requirements (Basic Qualifications)

  • Strong interest in truth-seeking AI and alignment.
  • Obsession with building highly useful models using post-training and RL techniques.
  • Power-user familiarity with AI models and eagerness to advance RL and alignment methods.
  • Prior work on post-training, RLHF, or training models used by millions is a big plus but not required.
  • Pride in work, ability to thrive in meritocratic environments, and strong communication skills.

Compensation and benefits

  • Base salary: $180,000 - $600,000 USD. The total rewards package also includes equity, medical/vision/dental coverage, 401(k), short- and long-term disability insurance, life insurance, and other perks.

xAI is an equal opportunity employer. For details on data processing, see the Recruitment Privacy Notice.