Member Of Technical Staff - Post-Training And RL

at xAI

📍 Palo Alto, United States

USD 180,000-600,000 per year

MIDDLE

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Communication @ 6, AI @ 3, Reinforcement Learning @ 3

Details

xAI mission and team

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. The team is small, highly motivated, and focused on engineering excellence. The organization favors hands-on contributors, flat structure, initiative, and strong communication.

Location

Palo Alto, CA

Role description

You will work on critical post-training and reinforcement learning challenges, including reward modeling, preference optimization (RLHF/DPO), and reinforcement learning to improve reasoning, truthfulness, and real-world capabilities. Your first project will be made clear to you before an offer is extended.

Responsibilities

Work on post-training and reinforcement learning problems, including reward modeling and preference optimization (RLHF/DPO).
Apply reinforcement learning to improve model reasoning, truthfulness, and practical capabilities.
Push the boundaries of what's possible with reinforcement learning and alignment methods.

Requirements (Basic Qualifications)

Strong interest in truth-seeking AI and alignment.
Obsession with building highly useful models using post-training and RL techniques.
Power user of AI models who is eager to push RL and alignment methods further.
Prior work on post-training, RLHF, or training models used by millions is a big plus but not required.
Pride in work, ability to thrive in meritocratic environments, and strong communication skills.

Compensation and benefits

Base salary: $180,000 - $600,000 USD per year. The total rewards package also includes equity, medical/vision/dental coverage, 401(k), short & long-term disability, life insurance, and other perks.

xAI is an equal opportunity employer. For details on data processing, see the Recruitment Privacy Notice.