Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 6, Algorithms @ 4, Leadership @ 4, Mathematics @ 4, Networking @ 4, Performance Optimization @ 4, LLM @ 4
Details
Groq delivers fast, efficient AI inference. Our LPU-based system powers GroqCloud™, giving businesses and developers the speed and scale they need. From our Bay Area roots to our growing global presence, we are on a mission to make high-performance AI compute more accessible and affordable.
This role is focused on performance modeling of Groq systems on state-of-the-art AI/ML workloads to identify bottlenecks early and guide future hardware development for Groq's AI accelerator.
Responsibilities
- Develop and maintain performance models for multiple generations of Groq hardware on the latest AI/ML workloads (LLMs, CNNs, LSTMs, etc.)
- Analyze AI/ML algorithms to understand their compute, networking and memory requirements, and map them effectively onto the underlying hardware architecture
- Lead a matrixed team to enable software/hardware co-optimization across chip, system and software teams
- Identify performance bottlenecks and help drive next-generation chip architecture through a solid understanding of Groq's software and hardware
- Work with silicon and system integration engineers to evaluate the costs & benefits of new technologies on Groq systems
- Provide what-if scenarios and continuous guidance directly to the CEO & senior leadership
- Develop the Design Space Exploration (DSE) tool for performance analysis and exploration of both chip and system across various workloads
- Define custom hardware solutions for high-profile customers
Requirements
- Degree or equivalent experience in computer science, mathematics, electrical and computer engineering (ECE) or a related field
- Strong fundamentals in computer architecture, with deep knowledge of, and experience working on, domain-specific AI architectures (highly preferred)
- In-depth understanding of the latest AI/ML algorithms and their hardware implications
- Ability to analyze complex hardware designs and reduce them to simple, abstracted timing models
- Past experience modeling AI/ML workloads and creating tools for performance optimization; experience with modeling LLM performance is beneficial but not required
- Proficiency in programming languages such as C/C++ and Python
- Experience with cycle-accurate simulators for benchmarking analysis
- Experience with ASIC microarchitecture design is a plus
- Experience reading and simulating RTL (SystemVerilog) designs is a plus
Attributes / Culture
Groq values humility, collaboration, a growth and giver mindset, curiosity and innovation, and passion and grit. Team members are expected to work collaboratively, share knowledge generously, and take creative approaches to projects and design.
Compensation
- Base salary range (United States): $205,000 to $248,000
- Base salary is part of a comprehensive compensation package that includes equity and benefits. Compensation for candidates outside the USA will depend on the local market.
Equal Opportunity & Accommodations
Groq is an Equal Opportunity Employer and is committed to creating an inclusive environment. Reasonable accommodations for applicants with disabilities are available upon request (contact: [email protected]).