Required Skills & Competences
TypeScript @ 4, Python @ 4, Machine Learning @ 6, Debugging @ 4, Experimentation @ 4, Sentry @ 4
About Sentry
Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations using Sentry, the company builds performance and error monitoring tools used by companies like Disney, Microsoft, and Atlassian.
Sentry embraces a hybrid work model, with Mondays, Tuesdays, and Thursdays set as in-office anchor days to encourage meaningful collaboration.
About the role
As a Senior Software Engineer on Sentry’s AI/ML team, you will build evaluation infrastructure that measures the accuracy, reliability, and real-world performance of AI systems. This role ensures debugging agents and AI-powered features behave correctly, safely, and predictably at scale. You will design datasets, benchmarks, and test harnesses that turn ambiguous AI behavior into measurable signals to help the team ship AI with confidence.
Responsibilities
- Design and build robust evaluation frameworks to measure accuracy, reliability, regressions, and edge cases in AI systems
- Create and curate high-quality datasets, golden test cases, and benchmarks grounded in real production data
- Build automated test harnesses and metrics pipelines to continuously evaluate models, prompts, and agentic workflows
- Partner with applied AI engineers and product leaders to define what “good” looks like and translate it into measurable criteria
- Own the evaluation lifecycle for major AI initiatives, from early experimentation through production monitoring
You’ll love this job if you
- Care deeply about correctness, rigor, and measurement in AI systems
- Enjoy turning fuzzy product goals and model behavior into concrete tests and metrics
- Like building foundational infrastructure that unlocks faster iteration and higher confidence for the entire AI team
- Thrive in cross-functional environments and enjoy influencing model design through better evaluation
Qualifications
- 5+ years of professional experience, with a Bachelor’s degree in computer science, machine learning, or a related field
- Experience building testing, evaluation, or data infrastructure for complex systems (AI/ML experience strongly preferred)
- Comfortable writing production-quality code (we use Python and TypeScript)
- Experience working with structured and unstructured datasets, labeling workflows, or data quality pipelines
- Familiarity with modern ML systems and evaluation techniques (e.g., offline metrics, online evaluation, regression testing for models or prompts)
- Bonus: experience evaluating LLMs, agentic systems, or AI-assisted developer tools
Compensation & Benefits
The base salary range that Sentry reasonably expects to pay for this position is $240,000 to $280,000. A successful candidate’s actual base salary will be determined by factors including work location, education, relevant experience, skills, and job-related knowledge. Eligible candidates may participate in Sentry’s employee benefit plans/programs (including incentive compensation, equity grants, paid time off, and group health insurance coverage).
Workplace
- Workplace type: Hybrid — in-office anchor days on Mondays, Tuesdays, and Thursdays
Equal Opportunity & Accommodations
Sentry is committed to providing equal employment opportunities and reasonable accommodations for employees and candidates with physical or mental disabilities. If you need assistance or an accommodation due to a disability, contact [email protected]. For details about applicant data handling, see Sentry's Applicant Privacy Policy.