Senior Staff Software Engineer, High Performance GPU Inference Systems

at Groq
USD 248,710-292,600 per year
SENIOR
✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competencies

Kubernetes (4), Python (6), Algorithms (7), Distributed Systems (4), Rust (6), PyTorch (4), CUDA (4), GPU (4)

Details

Groq delivers fast, efficient AI inference with an LPU-based system powering GroqCloud™. We are on a mission to make high performance AI compute more accessible and affordable. This role focuses on pushing the limits of heterogeneous GPU environments, dynamic global scheduling, and end-to-end system performance while writing code as close to the metal as possible.

Responsibilities

  • Design and implement scalable, low-latency runtime systems that coordinate thousands of GPUs across tightly integrated, software-defined infrastructure (distributed systems engineering).
  • Build deterministic, hardware-aware abstractions optimized for CUDA, ROCm, or vendor-specific toolchains to ensure ultra-efficient execution, fault isolation, and reliability (low-level GPU optimization).
  • Develop profiling, observability, and diagnostics tooling to provide real-time insights into GPU utilization, memory bottlenecks, and latency deviations; continuously improve system SLOs (performance & diagnostics).
  • Future-proof the stack to support evolving GPU architectures (e.g., H100, MI300), NVLink/Fabric topologies, and multi-accelerator systems (including FPGAs or custom silicon).
  • Collaborate cross-functionally with ML compilers, orchestration, cloud infrastructure, and hardware ops to ensure architectural alignment and unlock joint performance wins.
  • Drive automation, testability, and continuous integration practices for large-scale systems.

Requirements

  • Proven ability to ship high-performance, production-grade distributed systems and maintain large-scale GPU production deployments.
  • Deep knowledge of GPU architecture (memory hierarchies, streams, kernels), OS internals, parallel algorithms, and HW/SW co-design principles.
  • Proficiency in systems languages such as C++ (with CUDA) or Rust, as well as Python, with fluency writing hardware-aware code.
  • Strong experience with, and enthusiasm for, performance profiling, GPU kernel tuning, memory coalescing, and resource-aware scheduling.
  • Comfortable working across stack layers, from GPU drivers and kernels to orchestration layers and inference serving.
  • Passion for automation, testability, CI, and tooling to support reliability and performance diagnostics.

Nice to Have

  • Experience operating large-scale GPU inference systems in production (e.g., Triton, TensorRT, or custom GPU services).
  • Experience deploying and optimizing ML/HPC workloads on GPU clusters (Kubernetes, Slurm, Ray, etc.).
  • Hands-on experience with multi-GPU training/inference frameworks (PyTorch DDP, DeepSpeed, JAX).
  • Familiarity with compiler tooling and graph optimization (TVM, MLIR, XLA).
  • Experience delivering technically ambitious projects in fast-paced environments.

Attributes of a Groqster

  • Humility, collaboration, growth mindset, curiosity, innovation, passion, grit, and ownership.

Benefits & Compensation

  • Competitive base salary range for United States-based roles: $248,710 to $292,600. Total compensation includes equity and benefits; exact pay is determined by location, skills, qualifications, and experience.
  • Groq is an Equal Opportunity Employer and is committed to reasonable accommodations for qualified individuals with disabilities. For accommodation requests contact [email protected].
  • All offers contingent upon verification of identity and employment authorization in accordance with federal law.