Data Scientist, Platform (Reliability/Latency/Inference)
Used Tools & Technologies
Not specified
Required Skills & Competences
Grafana @ 3, Prometheus @ 3, Python @ 6, SQL @ 6, A/B Testing @ 3, Statistics @ 6, Datadog @ 3, Distributed Systems @ 6, Machine Learning @ 3, Data Science @ 3, Leadership @ 3, Communication @ 3, Mathematics @ 6, SRE @ 2, Data Analysis @ 5
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. The Data Scientist, Platform (Reliability/Latency/Inference) will be an early member of the Data Science team working at the intersection of data science and infrastructure to ensure reliable, low-latency performance of AI systems. This role uses rigorous analysis to quantify how platform performance impacts user behavior and to identify high-impact opportunities to improve system reliability and responsiveness. Your work will influence how millions of users experience Claude and other AI systems and will feed into infrastructure, product, and research decisions.
Responsibilities
- Design and execute comprehensive analyses to understand how latency, reliability, errors, and refusal rates affect user engagement, satisfaction, and retention across the platform
- Identify and prioritize high-impact infrastructure improvements by analyzing user behavior patterns, system performance metrics, and relationships between technical performance and business outcomes
- Develop methodologies to measure platform reliability and performance: define key metrics, establish baselines, and create monitoring systems for proactive optimization
- Collaborate with engineering teams to design A/B tests and controlled experiments that measure the impact of platform improvements on user experience and system performance
- Investigate performance anomalies, conduct root cause analysis of reliability issues, and provide data-driven insights to guide engineering priorities and architectural decisions
- Build models to forecast platform capacity needs, predict potential reliability issues, and optimize resource allocation to maintain optimal performance at scale
- Present complex technical analyses and recommendations to technical and non-technical stakeholders, including engineering leadership and executive teams
- Work closely with Platform Engineering, Product, and Research teams to translate technical performance data into user experience insights and strategic recommendations
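The A/B testing work described above often reduces to comparing a user-outcome rate between a control group and a group exposed to a platform change. A minimal sketch of such a comparison, using a standard two-proportion z-test with the Python standard library (the sample sizes and retention counts are hypothetical, purely for illustration):

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions
    (e.g. retention rate in control vs. a higher-latency variant)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: control arm vs. an arm with added latency.
z, p = two_proportion_z_test(success_a=4200, n_a=5000,   # control retained
                             success_b=4050, n_b=5000)   # high-latency retained
print(f"z = {z:.2f}, p = {p:.4f}")
```

In practice an analysis like this would also account for multiple metrics, pre-registration, and clustering of requests by user; the sketch shows only the core significance test.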
Requirements
- Advanced degree in Statistics, Computer Science, Engineering, Mathematics, or a related quantitative field, with 5+ years of hands-on data science experience
- Deep understanding of distributed systems, cloud infrastructure, and performance engineering, with experience analyzing large-scale system metrics
- Expertise in experimental design, causal inference, statistical modeling, and A/B testing frameworks, particularly in high-scale technical environments
- Strong skills in Python, SQL, and data analysis tools, with experience working with large datasets and real-time streaming data
- Experience translating technical performance metrics into user experience insights, including understanding how system performance affects user engagement and satisfaction
- Proven ability to work effectively with engineering teams and translate complex technical analyses into actionable recommendations
- Track record of using data science to drive measurable improvements in system performance, user experience, or business outcomes
Strong candidates may also have
- Hands-on experience with observability tools and infrastructure monitoring platforms (e.g., Prometheus, Grafana, Datadog)
- Experience with machine learning infrastructure, model serving, and understanding the performance characteristics of AI/ML systems
- Familiarity with SRE practices, error budgets, SLOs/SLIs, and reliability engineering principles
- Experience analyzing performance of real-time or near-real-time systems, including understanding of latency distributions and tail behavior
- Background in user behavior analysis, growth metrics, or product analytics, particularly in how technical performance drives user outcomes
- Direct experience working with platform or infrastructure teams in high-scale technology environments
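Two of the concepts named above, tail latency percentiles and SLO error budgets, can be sketched in a few lines. This is an illustrative example with synthetic latency data, not a description of Anthropic's actual tooling:

```python
import random

def percentile(samples, q):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(q / 100 * (len(ordered) - 1))))
    return ordered[k]

def error_budget_minutes(slo, days=30):
    """Allowed downtime (minutes) for an availability SLO over a window."""
    return (1 - slo) * days * 24 * 60

# Synthetic request latencies: exponentially distributed with a 50 ms mean,
# standing in for a fast-path distribution with a heavy tail.
random.seed(0)
latencies = [random.expovariate(1 / 50) for _ in range(10_000)]

for q in (50, 95, 99):
    print(f"p{q}: {percentile(latencies, q):.1f} ms")

print(f"99.9% SLO over 30 days allows "
      f"{error_budget_minutes(0.999):.1f} min of downtime")
```

The gap between p50 and p99 is what "tail behavior" refers to: the median can look healthy while the slowest 1% of requests dominate the worst user experiences.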
Logistics & Benefits
- Annual salary range: $275,000 - $355,000 USD
- Education: at least a Bachelor's degree in a related field or equivalent experience required
- Location-based hybrid policy: staff expected to be in an office at least 25% of the time; some roles may require more time on-site
- Visa sponsorship: Anthropic will make reasonable efforts to sponsor visas when possible
- Additional benefits: competitive compensation and benefits, optional equity donation matching, generous vacation and parental leave, flexible working hours, and office space for collaboration
How we work
Anthropic values collaborative, high-impact AI research and communication skills. The team treats AI research as an empirical science and focuses on a few large-scale research efforts. Candidates are encouraged to apply even if they do not meet every listed qualification.