Sr. Staff Linux Systems Engineer

at Groq
USD 310,400-365,200 per year
SENIOR
✅ Remote

Required Skills & Competences

Security, Go, Kubernetes, Linux, Terraform, Python, Bash, Git, Networking, GPU

Details

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high-performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.

Groq is building a custom cloud from the ground up, one data center at a time. The Compute Storage team owns the systems that turn racks of bare metal into production-ready Kubernetes clusters powering the next generation of AI workloads. We are hiring a Sr. Staff Linux Systems Engineer to help scale this effort by creating a reliable, performant, and secure foundation for GroqCloud.

Responsibilities

  • Kernel- and OS-level enablement and optimization for compute nodes (GPU, LPU) and storage clusters.
  • Work with infrastructure peers to define optimal health standards for production servers, including certified OS, kernel, and BIOS/firmware versions.
  • Strengthen security posture by improving system-level CVE response processes.
  • Debug and resolve systems-level performance and reliability issues across the fleet.
  • Work with vendors to debug and resolve BIOS/FW issues.
  • Support design and deployment of large GPU clusters and network fabric integrations.
  • Lead cross-functional collaboration with data center operations, networking, and platform teams to ensure infrastructure is integrated and production-ready.
  • Follow best practices and standards for infrastructure-as-code and configuration management using Git, Flux, Terraform, and related tools.
  • Set technical direction and maintain high-quality system documentation, operational runbooks, and internal tooling to improve resilience, repeatability, and observability of the infrastructure stack.

Requirements

  • Experience with Linux OS management in large virtualized environments.
  • Deep Linux kernel knowledge and experience working with the upstream community to resolve bugs.
  • Experience deploying large GPU clusters and working with network fabric.
  • Familiarity with infrastructure-as-code and Git-based workflows (e.g., Terraform, Flux, Kustomize).
  • Ability to write and maintain basic tooling in Go, Python, or Bash.
  • Understanding of networking fundamentals (IPAM, VLANs, DHCP, DNS).
  • Working knowledge of storage concepts (block vs object, NFS, RAID, etc.).
  • Strong sense of ownership and willingness to dive into hardware, firmware, or low-level provisioning issues.

Nice to Have

  • Exposure to Talos Linux.
  • Experience maintaining a production Kubernetes environment.
  • Hardware SKU definition and lifecycle management.

Compensation & Benefits

  • Base salary range (United States): $310,400 to $365,200, determined by location, skills, qualifications, experience, and internal benchmarks. Compensation outside the USA depends on the local market.
  • Competitive base salary plus equity and benefits.

About Groq & Hiring Notes

  • Groq is an Equal Opportunity Employer committed to inclusion. Reasonable accommodations are available for applicants with disabilities (contact: [email protected] for accommodation requests only).
  • All offers are contingent upon verification of identity and employment authorization.
  • Groq encourages applicants with criminal record histories to apply in accordance with applicable local fair chance hiring laws.