Staff AI Engineer - Grafana Ops, AI/ML

at Grafana Labs

📍 Canada

CAD 186,400-223,600 per year

SENIOR

✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Security @ 4, Docker @ 4, Grafana @ 4, Kubernetes @ 4, DevOps @ 4, Terraform @ 4, GCP @ 4, AWS @ 4, Azure @ 4, Communication @ 4, Experimentation @ 4, LLM @ 4, Codex @ 4, Observability @ 4, AI @ 4, GenAI @ 4, Prompt Engineering @ 4

Details

Grafana Labs is a remote-first, open-source company that builds observability tools used by millions globally. The Grafana AI teams develop AI-driven features to help users make sense of complex observability data, reduce toil, and surface meaningful signals from noisy environments. This role is remote, and Grafana is considering only applicants based in Canadian time zones.

Responsibilities

Build and deliver AI solutions: take ownership of developing high-performance AI features to help users detect, triage, and resolve incidents using observability data and tools.
Rapid experimentation and iteration: prototype, test, validate with real users, and ship LLM- or agent-powered workflows for incident lifecycle management and automated analysis tasks.
Collaborate cross-functionally: work with data analysts, product managers, and designers to shape AI-driven product features and integrate agentic components with internal tools, alerting systems, runbooks, and developer workflows.
Utilize AI tools effectively: use AI and automation tools to enhance product functionality and development workflows.
Communicate effectively in a dynamic environment and take full ownership to ensure solutions are scalable, maintainable, and aligned with real user workflows.
Use modern AI coding assistants and frontier models (examples mentioned: GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro) as part of development within security guidelines.

Requirements

Experience with LLMs, prompt engineering, and building applications powered by GenAI.
Proven track record of delivering software that made it into production and is actively used by users.
Exposure to cloud-native environments (e.g., AWS, GCP, Azure).
Experience using observability tools to understand and troubleshoot system behavior (the posting references the Grafana stack: Grafana, Mimir, Loki, Tempo).
Strong engineering skills: solid experience building production software systems (backend and/or full stack), self-starter, comfortable tackling complex engineering problems with minimal supervision.
Quick iteration and experimentation mindset; ability to release prototypes, collect feedback, and iterate.
Proven initiative and ability to operate in ambiguous situations while defining scope and driving projects forward.
Collaborative attitude and effective communication with peers, product managers, and designers.

Bonus Points

Experience building or working with agent frameworks or multi-agent workflows.
Experience with infrastructure/DevOps tooling (Kubernetes, Docker, Terraform, or similar) for deployments.
Familiarity with model fine-tuning techniques.
Experience building observability tooling.

Compensation & Rewards

In Canada, the base compensation range for this role is CAD 186,368 - CAD 223,642. Actual compensation may vary based on level, experience, and skillset as assessed throughout the interview process.
All roles include Restricted Stock Units (RSUs).

Why You’ll Thrive at Grafana Labs

100% remote, global culture with a focus on autonomy and collaboration.
Scaling organization with meaningful work in a high-growth environment.
Transparent communication, open decision-making, and innovation-driven culture.
Open-source roots and empowered teams with career growth pathways.
In-person onboarding and a global annual leave policy (30 days per annum, with 3 days reserved for Grafana Shutdown Days). Grafana will comply with local legislation where applicable.

Equal Opportunity

Grafana Labs is an equal opportunity employer and will recruit, train, compensate, and promote regardless of race, religion, color, national origin, gender, disability, age, veteran status, and other characteristics. Grafana Labs may use AI tools in its recruitment process to help match information provided in CVs to job postings; the recruitment team will continue to review inbound CVs manually.