Senior Software Engineer, AI Eval

at Sentry
USD 240,000-280,000 per year
SENIOR
✅ Hybrid

Required Skills & Competences

TypeScript @ 4, Python @ 4, Machine Learning @ 6, Debugging @ 4, Experimentation @ 4, Sentry @ 4

Details

About Sentry

Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations using Sentry, the company builds performance and error monitoring tools used by companies like Disney, Microsoft, and Atlassian.

Sentry embraces a hybrid work model, with Mondays, Tuesdays, and Thursdays set as in-office anchor days to encourage meaningful collaboration.

About the role

As a Senior Software Engineer on Sentry’s AI/ML team, you will build evaluation infrastructure that measures the accuracy, reliability, and real-world performance of AI systems. This role ensures debugging agents and AI-powered features behave correctly, safely, and predictably at scale. You will design datasets, benchmarks, and test harnesses that turn ambiguous AI behavior into measurable signals to help the team ship AI with confidence.

Responsibilities

  • Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
  • Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
  • Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
  • Partner with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
  • Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring

You’ll love this job if you

  • Care deeply about correctness, rigor, and measurement in AI systems
  • Enjoy turning fuzzy product goals and model behavior into concrete tests and metrics
  • Like building foundational infrastructure that unlocks faster iteration and higher confidence for the entire AI team
  • Thrive in cross-functional environments and enjoy influencing model design through better evaluation

Qualifications

  • At least 5 years of professional experience and a Bachelor’s degree in computer science, machine learning, or a related field
  • Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
  • Comfortable writing production-quality code (we use Python and TypeScript)
  • Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
  • Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)
  • Bonus: experience evaluating LLMs, agentic systems, or AI-assisted developer tools

Compensation & Benefits

The base salary range that Sentry reasonably expects to pay for this position is $240,000 to $280,000. A successful candidate’s actual base salary will be determined by factors including work location, education, relevant experience, skills, and job-related knowledge. Eligible candidates may participate in Sentry’s employee benefit plans/programs (including incentive compensation, equity grants, paid time off, and group health insurance coverage).

Workplace

  • Workplace type: Hybrid — in-office anchor days on Mondays, Tuesdays, and Thursdays

Equal Opportunity & Accommodations

Sentry is committed to providing equal employment opportunities and reasonable accommodations for employees and candidates with physical or mental disabilities. If you need assistance or an accommodation due to a disability, contact [email protected]. For details about applicant data handling, see Sentry’s Applicant Privacy Policy.