Member of Technical Staff - Voice Model

at xAI
USD 150,000-450,000 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences

  • Kubernetes @ 3
  • Python @ 5
  • A/B Testing @ 3
  • Spark @ 3
  • Communication @ 6
  • Prioritization @ 6
  • Experimentation @ 3
  • PyTorch @ 5
  • AI @ 3
  • Reinforcement Learning @ 5

Details

xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. The team is small, highly motivated, and focused on engineering excellence. Employees are expected to be hands-on, contribute directly to the company’s mission, show initiative, and have strong communication and prioritization skills.

About the role

Join the Grok Voice Model team to help build the world’s best voice AI. The team delivers smooth, natural, low-latency spoken interactions that are expressive, multilingual, and reliable across devices and real-time scenarios. The team owns the full training pipeline: massive data curation, premium audio processing, frontier speech-language pre-training, and intensive post-training to push quality, speed, and stability to the limit.

The goal is to make talking to AI feel like conversing with the most charming, kind, and knowledgeable person imaginable. The team is looking for exceptionally smart, execution-oriented engineers to help achieve this.

Responsibilities

  • Design and execute large-scale speech data curation and processing pipelines, including collection of diverse real-world audio, synthetic data generation, and automated annotation workflows to enable high-quality model training and evaluation.
  • Work on pre-training and post-training of speech-language models, with targeted enhancements through supervised fine-tuning, reinforcement learning, and other techniques to ensure Grok Voice responses are accurate, factually grounded, natural and idiomatic in spoken style, conversational in tone, and fluent across multiple languages.
  • Build and iterate on a comprehensive evaluation framework covering objective metrics (accuracy, quality, latency, expressiveness), human preference studies, content factuality assessments, real-time interaction quality, and experimentation infrastructure to measure and improve performance.
  • Work closely with product teams to integrate voice models into applications and real-time environments, define spoken interaction specifications, and handle the full lifecycle from prototype to global-scale deployment for stable, low-latency, delightful voice experiences.

Requirements

  • Expert-level Python skills with deep proficiency in writing clean, efficient code for AI/ML systems.
  • Hands-on experience processing large-scale datasets using tools like Spark and Ray for cleaning, augmentation, and feature extraction.
  • Proficiency in pre-training and post-training speech-language models using JAX and PyTorch, including supervised fine-tuning, reinforcement learning, and optimizations for accuracy, factuality, natural spoken style, detail, and multilingual fluency.
  • Ability to set up and run rigorous evaluation pipelines: objective metrics, human preference studies, content factuality checks, and iterative A/B testing to drive model improvements.
  • Experience building or working with large-scale distributed training and inference systems on Kubernetes.
  • Proactive, self-driven attitude and readiness to work in a fast-paced, high-caliber team environment.

Compensation and benefits

  • Base salary: $150,000 - $450,000 USD
  • The total rewards package also includes equity; comprehensive medical, vision, and dental coverage; access to a 401(k) retirement plan; short- and long-term disability insurance; life insurance; and other discounts and perks.

Additional notes

  • Location listed: Palo Alto, CA.
  • xAI is an equal opportunity employer.