Required Skills & Competences
Kubernetes, Algorithms, Communication, PyTorch
Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Audio team builds audio capabilities with large language models, focusing on safe, steerable, and reliable speech and audio systems. Work spans audio codecs and representations, sourcing and synthesizing high-quality audio data, training large-scale speech language models and large audio diffusion models, and designing architectures to incorporate continuous signals into LLMs. The team focuses primarily on speech — conversational systems, speech and audio understanding, and speech synthesis — and partners across pretraining, finetuning, reinforcement learning, production inference, and product to move research into real-world deployments.
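One common way the research community incorporates continuous signals into LLMs is to project audio-encoder frames into the model's token-embedding space and let the LLM attend over them alongside text. The PyTorch sketch below illustrates only that general adapter pattern; the module names, dimensions, and frame rate are hypothetical and do not describe Anthropic's architecture.

```python
# Illustrative adapter sketch (hypothetical names and sizes, not Anthropic's design):
# project continuous audio-encoder frames into the LLM embedding space and
# prepend them to the text embeddings.
import torch
import torch.nn as nn

class AudioAdapter(nn.Module):
    def __init__(self, audio_dim: int = 512, llm_dim: int = 4096):
        super().__init__()
        # Small MLP mapping audio-encoder frames to the LLM's embedding width.
        self.proj = nn.Sequential(
            nn.Linear(audio_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, audio_frames: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # audio_frames: (batch, n_frames, audio_dim) from a pretrained audio encoder
        # text_embeds:  (batch, n_tokens, llm_dim) from the LLM's embedding table
        audio_embeds = self.proj(audio_frames)
        # Concatenate along the sequence dimension; the LLM attends over both.
        return torch.cat([audio_embeds, text_embeds], dim=1)

adapter = AudioAdapter()
audio = torch.randn(2, 150, 512)   # ~3 s of audio at a hypothetical 50 Hz frame rate
text = torch.randn(2, 32, 4096)
fused = adapter(audio, text)       # (2, 182, 4096) sequence fed to the LLM
```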
Responsibilities
- Conduct research and engineering across the full audio ML stack: audio codecs and representations, dataset scaling and curation, model architectures, training, and inference optimization.
- Train and evaluate large-scale speech and audio models (e.g., speech-to-speech, speech translation, ASR, TTS, generative audio models, diffusion models).
- Develop and optimize training pipelines and distributed training performance; debug performance issues across the stack (a minimal distributed-training sketch follows this list).
- Collaborate closely with cross-functional teams (pretraining, finetuning, RL, production inference, product) to take research to deployment.
- Design robust evaluation methodologies for hard-to-measure qualities (naturalness, expressiveness) and study training dynamics for mixed audio-text models.
- Work on real-world-focused projects such as neural audio codecs, diffusion pretraining, reinforcement learning for audio, and latency and throughput optimization for streaming audio systems.
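As context for the training-pipeline and distributed-training responsibilities above, here is a minimal data-parallel training sketch using PyTorch DistributedDataParallel launched via torchrun. It is illustrative only (the posting mentions both JAX and PyTorch); the model, data, and hyperparameters are stand-ins, not Anthropic's stack.

```python
# Minimal, hypothetical DDP training loop; launch with:
#   torchrun --nproc_per_node=<num_gpus> train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                 # torchrun sets rank/world-size env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)

    model = torch.nn.Linear(1024, 1024).to(device)  # stand-in for an audio model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(100):
        x = torch.randn(8, 1024, device=device)     # stand-in for a batch of audio features
        loss = model(x).pow(2).mean()               # stand-in loss
        loss.backward()                             # gradients all-reduced across ranks
        opt.step()
        opt.zero_grad(set_to_none=True)
        if dist.get_rank() == 0 and step % 10 == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```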
Requirements
- Hands-on experience training audio models (e.g., conversational speech-to-speech, speech translation, speech recognition, text-to-speech, diarization, codecs, generative audio models).
- Comfortable working across abstraction levels from signal processing fundamentals to large-scale model training and inference optimization.
- Deep expertise with JAX, PyTorch, or large-scale distributed training, and the ability to debug performance issues across the full stack.
- Strong collaborative and communication skills; ability to work effectively with many teams across the company.
- At least a Bachelor's degree in a related field or equivalent experience (required).
- Comfortable working in a fast-moving environment where priorities may shift as experiments reveal what works.
Strong candidates may also have experience with:
- Large language model pretraining and finetuning.
- Training diffusion models for image and audio generation.
- Reinforcement learning for large language models and diffusion models.
- End-to-end system optimization, including performance benchmarking and kernel optimization.
- GPUs, Kubernetes, PyTorch, or distributed training infrastructure.
Representative projects
- Training state-of-the-art neural audio codecs for 48 kHz stereo audio.
- Developing novel algorithms for diffusion pretraining and reinforcement learning.
- Scaling audio datasets to millions of hours of high-quality audio.
- Creating robust evaluation methodologies for qualities such as naturalness or expressiveness.
- Studying training dynamics of mixed audio-text language models.
- Optimizing latency and inference throughput for deployed streaming audio systems.
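As a rough illustration of the last project above, the sketch below times per-chunk inference for a stand-in streaming model on 20 ms chunks of 48 kHz audio and reports latency percentiles. All names and sizes are hypothetical; real measurements would use the production model on the serving hardware (with proper GPU synchronization), but the chunked-loop-plus-percentiles pattern is the same.

```python
# Hypothetical streaming-latency micro-benchmark (illustrative only).
import time
import torch

SAMPLE_RATE = 48_000
CHUNK_MS = 20                                      # 20 ms streaming chunks
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000

model = torch.nn.Sequential(                       # stand-in for a streaming audio model
    torch.nn.Conv1d(1, 64, kernel_size=9, padding=4),
    torch.nn.GELU(),
    torch.nn.Conv1d(64, 1, kernel_size=9, padding=4),
).eval()

latencies_ms = []
with torch.no_grad():
    for _ in range(200):
        chunk = torch.randn(1, 1, CHUNK_SAMPLES)   # one 20 ms mono chunk
        start = time.perf_counter()
        _ = model(chunk)
        latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
p50 = latencies_ms[len(latencies_ms) // 2]
p95 = latencies_ms[int(len(latencies_ms) * 0.95)]
print(f"p50 {p50:.2f} ms, p95 {p95:.2f} ms per {CHUNK_MS} ms chunk")
```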
Logistics & Compensation
- Annual salary range: $350,000 - $500,000 USD.
- Location: Remote-friendly with travel required; offices in San Francisco, CA; Seattle, WA; and New York City, NY. We currently expect staff to be in one of our offices at least 25% of the time (location-based hybrid policy).
- Education: Minimum Bachelor’s degree in a related field or equivalent experience.
- Visa sponsorship: Anthropic states they do sponsor visas and retain an immigration lawyer to assist, though sponsorship may not be possible for every role/candidate.
Benefits & Culture
- Competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office collaboration space.
- Emphasis on large-scale, high-impact research with collaborative research discussions and strong communication expectations.
- Encouragement to apply even if not all qualifications are met; commitment to diversity and consideration of societal impacts of voice AI.