Principal Inference Stack Engineer

at Groq
📍 Canada
USD 248,710-407,100 per year
SENIOR
✅ Remote


Used Tools & Technologies

Not specified

Required Skills & Competences

  • Distributed Systems (level 4)
  • TensorFlow (level 4)
  • API (level 4)
  • PyTorch (level 4)

Details

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.

Mission:

Lead technical efforts focused on mapping ML workloads onto Groq’s LPU, drawing on deep, first-hand working knowledge of Groq’s state-of-the-art spatial compiler and ML inference stack.

Responsibilities & outcomes:

  • Analyze the latest ML workloads from Groq partners or GroqCloud, and develop an optimization roadmap and strategies to improve each workload's inference performance and operating efficiency.
  • Design, develop, and maintain the optimizing compiler for Groq's LPU.
  • Expand the Groq runtime API to simplify the execution model of Groq LPUs.
  • Benchmark and analyze the output produced by the optimizing compiler and runtime, and drive enhancements that improve quality-of-results as measured on Groq LPU hardware.
  • Manage large multi-person and multi-geo projects and interface with various leads across the company.
  • Mentor junior compiler engineers and collaborate with other senior compiler engineers on the team.
  • Review and accept code updates to compiler passes and IR definitions.
  • Work with HW teams and architects to drive improvements in the architecture and the SW compiler.
  • Publish novel compilation techniques for Groq's TSP at top-tier ML, Applications, Compiler, and Computer Architecture conferences.

Requirements:

  • 10+ years of experience in computer science/engineering or a related field.
  • 5+ years of direct experience with C/C++ and runtime frameworks.
  • Knowledge of LLVM and compiler architecture.
  • Experience with mapping HPC, ML, or Deep Learning workloads to accelerators.
  • Knowledge of spatial architectures such as FPGAs or CGRAs is an asset.
  • Knowledge of distributed systems and disaggregated compute is desired.
  • Knowledge of functional programming is an asset.
  • Experience with ML frameworks such as TensorFlow or PyTorch desired.
  • Knowledge of Deep Learning and ML IR representations such as ONNX.

Attributes of a Groqster:

  • Humility - Egos are checked at the door.
  • Collaborative & Team Savvy - We make up the smartest person in the room, together.
  • Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously.
  • Curious & Innovative - Take a creative approach to projects, problems, and design.
  • Passion, Grit, & Boldness - No limit thinking, fueling informed risk taking.

If this sounds like you, we’d love to hear from you!

Compensation:

At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $248,710 to $407,100, determined by your skills, qualifications, experience, and internal benchmarks.