Senior AI Engineer, GenAI & ML Evaluation Frameworks

at Grafana Labs

📍 United States

USD 154,400-185,300 per year

SENIOR

✅ Remote

Required Skills & Competences

Grafana @ 4, CI/CD @ 4, LLM @ 4

Details

Grafana Labs is seeking an experienced engineer to design, build, and scale evaluation frameworks for Generative AI systems, particularly Large Language Models (LLMs). The role focuses on creating automated evaluation pipelines, integrating evaluations into CI/CD workflows, defining metrics that reflect product goals and model behavior, and guiding dataset management and best practices across teams. This is a remote role, open only to applicants located in USA time zones.

Responsibilities

Design and implement robust evaluation frameworks for GenAI and LLM-based systems, including golden test sets, regression tracking, LLM-as-judge methods, and structured output verification.
Develop tooling to enable automated, low-friction evaluation of model outputs, prompts, and agent behaviors.
Define and refine metrics for both structural and semantic aspects of model outputs, ensuring alignment with realistic use cases and operational constraints.
Integrate evaluation pipelines into CI/CD workflows and scale automated evaluation processes.
Lead dataset management processes and provide guidance on best practices for GenAI evaluation across teams.

Requirements

Experience designing and implementing evaluation frameworks for AI/ML systems.
Familiarity with prompt engineering, structured output evaluation, and context-window management in LLM systems.
Ability to translate team goals into clear, testable evaluation criteria and tooling; high autonomy and strong collaboration skills.

Bonus Points

Experience working in fast-iteration, experimental development environments.
Pragmatic mindset valuing reproducibility, developer experience, and trade-offs when scaling GenAI systems.
Passion for minimizing human toil and building AI systems that actively support engineers.

Compensation & Rewards

Base compensation range (United States): USD 154,445 - USD 185,334. Actual compensation may vary by level, experience, and interview assessment.
All roles include Restricted Stock Units (RSUs) and may include bonus and other benefits.

Other Details

Grafana Labs is a remote-first company; this role is remote and open only to applicants located in USA time zones.
In-person onboarding is provided.
Grafana Labs operates a global annual leave policy (30 days per annum) with company shutdown days and will comply with local legislation where applicable.
Grafana Labs is an equal opportunity employer and may utilize AI tools in the recruitment process while continuing manual review of CVs.