Used Tools & Technologies
Not specified
Required Skills & Competences
Automated Testing @ 4, CI/CD @ 4, Algorithms @ 4, PyTorch @ 3
Details
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. Headquartered in Silicon Valley, we are on a mission to make high performance AI compute more accessible and affordable. When real-time AI is within reach, anything is possible. Build fast.
Responsibilities
- Deliver end-to-end FPGA solutions, bridging the gap between the world and our accelerators.
- Build and operate real-time, distributed compute frameworks and runtimes to deliver planet-scale inference for LLMs and advanced AI applications at ultra-low latency, optimized for heterogeneous hardware and dynamic global workloads.
- Develop deterministic, low-overhead hardware abstractions for thousands of synchronously coordinated GroqChips across a software-scheduled interconnection network. Prioritize fault tolerance, real-time diagnostics, ultra-low-latency execution, and mission-critical reliability.
- Future-proof Groq's software stack for next-gen silicon, innovative multi-chip topologies, emerging form factors, and heterogeneous co-processors (e.g., FPGAs).
- Foster collaboration across cloud, compiler, infra, data centers, and hardware teams to align engineering efforts, enable seamless integrations, and drive progress toward shared goals.
Your code will run at the edge of physics—every clock cycle saved reduces latency for millions of users and extends Groq's lead in the AI compute race.
Requirements
- Deep curiosity about system internals—from kernel-level interactions to hardware dependencies—and fearless problem solving across abstraction layers down to the HDL for chips.
- Expertise in computer architecture, compiler backends, algorithms, and hardware-software interfaces.
- Mastery of system-level programming (Haskell, C++, or similar) with emphasis on low-level optimizations and hardware-aware design.
- Consistent delivery of high-impact, production-ready code while collaborating effectively with cross-functional teams.
- Excellence in profiling and optimizing systems for latency, throughput, and efficiency with zero tolerance for wasted cycles or resources.
- Commitment to automated testing and CI/CD pipelines, believing that "untested code is broken code."
- Pragmatic technical judgment, balancing short-term velocity with long-term system health.
- Writing empathetic, maintainable code with strong version control and modular design, prioritizing readability and usability.
- Nice to have: experience with FPGA development, VFIO drivers, and HDL languages; a track record of shipping complex projects in fast-paced environments; hands-on optimization of performance-critical applications using GPUs, FPGAs, or ASICs; and familiarity with ML frameworks (e.g., PyTorch) and compiler tooling (e.g., MLIR) for AI/ML integration.
Ideal Candidate Traits
- Initiates solutions without derailing team priorities.
- Builds and ships real, valuable code.
- Passionate about craftsmanship and continuous improvement.
- Collaborative team player who aligns goals with teammates and customers.
- Takes full ownership from design to deployment and maintenance.
This role is not typical corporate work; it is a mission, grounded in strong academic foundations, to redefine AI compute.
Compensation
Competitive base salary ranging from £81,800 to £107,300 GBP plus equity and benefits.
Location
London, United Kingdom (Remote)