Senior AI Engineer, GenAI & ML Evaluation Frameworks

📍 Canada
CAD 164,500-197,400 per year
SENIOR
✅ Remote

Required Skills & Competences

  • Grafana @ 4
  • CI/CD @ 4
  • Communication @ 4
  • LLM @ 4

Details

Grafana Labs is a remote-first, open-source company powering observability with Grafana. The Grafana AI teams build AI-driven features to help users make sense of complex observability data and reduce toil. This role is a remote opportunity for applicants in Canadian time zones only.

Responsibilities

  • Design and implement robust evaluation frameworks for Generative AI and LLM-based systems, including golden test sets, regression tracking, LLM-as-judge methods, and structured output verification.
  • Develop tooling and automated evaluation pipelines to enable low-friction evaluation of model outputs, prompts, and agent behaviors.
  • Integrate evaluation tooling and pipelines into CI/CD workflows and scale automated evaluation processes.
  • Define and refine metrics for both structural and semantic correctness, ensuring alignment with realistic use cases and operational constraints.
  • Lead dataset management processes and guide teams across Grafana in best practices for GenAI evaluation.

Requirements

  • Experience designing and implementing evaluation frameworks for AI/ML systems.
  • Familiarity with prompt engineering, structured output evaluation, and context-window management for LLM systems.
  • Ability to translate team goals into clear, testable criteria and effective tooling with high autonomy.

Bonus Qualifications

  • Experience working in environments with rapid iteration and experimental development.
  • A pragmatic mindset emphasizing reproducibility, developer experience, and thoughtful trade-offs when scaling GenAI systems.
  • Passion for minimizing human toil and building AI systems that actively support engineers.

Compensation & Benefits

  • Base compensation range (Canada): CAD 164,490 - CAD 197,389. Actual compensation may vary by level, experience, and skillset.
  • All roles include Restricted Stock Units (RSUs).
  • 100% remote company culture, global collaboration, in-person onboarding, 30 days annual leave (with 3 reserved for Grafana Shutdown Days), transparent communication, and career growth pathways.

Additional Notes

  • Grafana Labs may use AI tools in its recruitment process to assist in matching CVs to job postings. The recruitment team will manually review inbound CVs.
  • This role is remote but currently limited to applicants in Canadian time zones.