Senior Performance Modeling Architect, CPU Fabric and LLC

at NVIDIA

📍 Santa Clara, United States

USD 152,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Python @ 6 Performance Monitoring @ 4 AI @ 4

Details

We are looking for a highly skilled Performance Modeling Architect to lead the architectural definition and improvement of our next-generation CPU cache hierarchies and interconnects. This role connects the low-latency, high-reliability needs of Automotive with the efficiency and high-density demands of Data Center systems. You will build the source-of-truth models that govern data movement across silicon, ensuring next-level caches (L3/System Cache) and coherent fabrics meet ambitious performance goals.

Responsibilities

Develop and maintain high-fidelity, cycle-accurate performance models (C++/SystemC) for coherent interconnects and large-scale shared caches.
Model and analyze performance bottlenecks across scales, from small-cluster automotive SoCs to large, multi-mesh data center architectures.
Evaluate the performance impact of different coherency protocols (e.g., CHI, ACE, or proprietary) and snoop filters.
Run and analyze industry-standard benchmarks (SPEC, MLPerf, Automotive-specific suites) to drive architectural trade-offs.
Collaborate with build and verification teams to correlate performance models with silicon, and work with software teams to optimize drivers for the underlying hardware topology.

Requirements

Master’s or Ph.D. in Computer Engineering, Electrical Engineering, or Computer Science (or equivalent experience) with a focus on architecture, and 5+ years of experience.
Strong understanding of CPU microarchitecture, memory consistency models, and cache coherency protocols.
Proven experience in C++ or SystemC for cycle-accurate or functional modeling.
Proficiency in Python or similar scripting languages for processing large datasets, generating performance visualizations, and automating simulation sweeps.
Understanding of Network-on-Chip (NoC) topologies (Mesh, Ring, Torus), credit-based flow control, and arbitration logic.

Ways to stand out

Practical experience managing functional safety (ISO 26262) requirements for automotive chips alongside PPA considerations for data center hardware.
Experience defining or using PMU (Performance Monitoring Unit) events to debug performance on real silicon or emulators.
Background in formal methods or mathematical modeling for proving correctness of complex coherency state machines.
History of building custom internal tools or frameworks to accelerate architectural exploration.
Knowledge of emerging memory technologies like CXL (Compute Express Link) or HBM (High Bandwidth Memory) and how they interact with coherent fabrics.

Benefits

Base salary range (determined by location, experience, and comparable roles):
- Level 3: 152,000 USD - 241,500 USD
- Level 4: 184,000 USD - 287,500 USD
Eligibility for equity and company benefits.
Applications accepted at least until May 10, 2026.

NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity and inclusion.