Principal Inference Stack Engineer

at Groq
📍 Canada
USD 248,710-407,100 per year
SENIOR
✅ Remote


Used Tools & Technologies

Not specified

Required Skills & Competences

  • Distributed Systems (level 4)
  • TensorFlow (level 4)
  • API (level 4)
  • PyTorch (level 4)

Details

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.

Mission:

Lead technical efforts focused on mapping ML workloads onto Groq’s LPU, drawing on deep, first-hand working knowledge of Groq’s state-of-the-art spatial compiler and ML inference stack.

Responsibilities & outcomes:

  • Analyze the latest ML workloads from Groq partners or GroqCloud, and develop an optimization roadmap and strategies to improve each workload's inference performance and operating efficiency.
  • Design, develop, and maintain the optimizing compiler for Groq's LPU.
  • Expand the Groq runtime API to simplify the execution model of Groq LPUs.
  • Benchmark and analyze the output produced by the optimizing compiler and runtime, and drive enhancements that improve quality-of-results as measured on Groq LPU hardware.
  • Manage large multi-person and multi-geo projects and interface with various leads across the company.
  • Mentor junior compiler engineers and collaborate with other senior compiler engineers on the team.
  • Review and accept code updates to compiler passes and IR definitions.
  • Work with HW teams and architects to drive improvements in the architecture and the SW compiler.
  • Publish novel compilation techniques for Groq's TSP at top-tier ML, Applications, Compiler, and Computer Architecture conferences.

Requirements:

  • 10+ years of experience in computer science/engineering or a related field.
  • 5+ years of direct experience with C/C++ and runtime frameworks.
  • Knowledge of LLVM and compiler architecture.
  • Experience with mapping HPC, ML, or Deep Learning workloads to accelerators.
  • Knowledge of spatial architectures such as FPGAs or CGRAs is an asset.
  • Knowledge of distributed systems and disaggregated compute is desired.
  • Knowledge of functional programming is an asset.
  • Experience with ML frameworks such as TensorFlow or PyTorch desired.
  • Knowledge of Deep Learning and ML IR representations such as ONNX.

Attributes of a Groqster:

  • Humility - Egos are checked at the door.
  • Collaborative & Team Savvy - We make up the smartest person in the room, together.
  • Growth & Giver Mindset - Learn it all versus know it all, we share knowledge generously.
  • Curious & Innovative - Take a creative approach to projects, problems, and design.
  • Passion, Grit, & Boldness - No limit thinking, fueling informed risk taking.

If this sounds like you, we’d love to hear from you!

Compensation:

At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $248,710 to $407,100, determined by your skills, qualifications, experience, and internal benchmarks.