Principal Performance Modeling Engineer

at Groq

📍 United States

USD 205,000-248,000 per year

SENIOR

✅ Remote

Required Skills & Competences

Python @ 6
Algorithms @ 4
Leadership @ 4
Mathematics @ 4
Networking @ 4
Performance Optimization @ 4
LLM @ 4

Details

Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high-performance AI compute more accessible and affordable.

This role focuses on modeling the performance of Groq systems on state-of-the-art AI/ML workloads, in order to identify bottlenecks early and guide future hardware development for Groq's AI accelerator.

Responsibilities

Develop and maintain performance models for multiple generations of Groq hardware on the latest AI/ML workloads (LLMs, CNNs, LSTMs, etc.)
Analyze AI/ML algorithms to understand their compute, networking and memory requirements, and map them effectively onto the underlying hardware architecture
Lead a matrixed team to enable software/hardware co-optimization across chip, system and software teams
Identify performance bottlenecks and help drive next-generation chip architecture through a solid understanding of Groq's software and hardware
Work with silicon and system integration engineers to evaluate the costs and benefits of new technologies on Groq systems
Provide what-if scenarios and continuous guidance directly to the CEO & senior leadership
Develop the Design Space Exploration (DSE) tool for performance analysis and exploration of both chip and system designs across various workloads
Define custom hardware solutions for high-profile customers

Requirements

Degree or equivalent experience in computer science, mathematics, electrical and computer engineering (ECE), or a related field
Strong fundamentals in computer architecture; deep knowledge of, and experience working on, domain-specific AI architectures is highly preferred
In-depth understanding of the latest AI/ML algorithms and their hardware implications
Ability to analyze complex hardware designs and reduce them to simple, abstracted timing models
Past experience modeling AI/ML workloads and creating tools for performance optimization; experience modeling LLM performance is beneficial but not required
Proficient in programming languages such as C/C++ and Python
Experience with cycle-accurate simulators for benchmarking analysis
Experience with ASIC microarchitecture design is a plus
Experience reading and simulating RTL (SystemVerilog) designs is a plus

Attributes / Culture

Groq values humility, collaboration, a growth and giver mindset, curiosity and innovation, and passion and grit. Team members are expected to work collaboratively, share knowledge generously, and take creative approaches to projects and design.

Compensation

Base salary range (United States): $205,000 to $248,000
Base salary is part of a comprehensive compensation package that includes equity and benefits. Compensation for candidates outside the USA will depend on the local market.

Equal Opportunity & Accommodations

Groq is an Equal Opportunity Employer and is committed to creating an inclusive environment. Reasonable accommodations for applicants with disabilities are available upon request (contact: [email protected]).