Senior Performance Modeling Architect, CPU Fabric and LLC

at NVIDIA
USD 152,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

  • Python @ 6
  • Performance Monitoring @ 4
  • AI @ 4

Details

We are looking for a highly skilled Performance Modeling Architect to lead the architectural definition and improvement of our next-generation CPU cache hierarchies and interconnects. This role connects the low-latency, high-reliability needs of Automotive with the efficiency and high-density demands of Data Center systems. You will build the source-of-truth models that govern data movement across silicon, ensuring last-level caches (L3/System Cache) and coherent fabrics meet ambitious performance goals.

Responsibilities

  • Develop and maintain high-fidelity, cycle-accurate performance models (C++/SystemC) for coherent interconnects and large-scale shared caches.
  • Model and analyze performance bottlenecks across scales, from small-cluster automotive SoCs to large, multi-mesh data center architectures.
  • Evaluate the performance impact of different coherency protocols (e.g., CHI, ACE, or proprietary) and snoop filters.
  • Run and analyze industry-standard benchmarks (SPEC, MLPerf, Automotive-specific suites) to drive architectural trade-offs.
  • Collaborate with build and verification teams to correlate performance models with silicon, and work with software teams to optimize drivers for the underlying hardware topology.

Requirements

  • Master’s or Ph.D. in Computer Engineering, Electrical Engineering, or Computer Science (or equivalent experience) with a focus on architecture, and 5+ years of experience.
  • Strong understanding of CPU microarchitecture, memory consistency models, and cache coherency protocols.
  • Proven experience in C++ or SystemC for cycle-accurate or functional modeling.
  • Proficiency in Python or similar scripting languages for processing large datasets, generating performance visualizations, and automating simulation sweeps.
  • Understanding of Network-on-Chip (NoC) topologies (Mesh, Ring, Torus), credit-based flow control, and arbitration logic.

Ways to stand out

  • Practical experience managing functional safety (ISO 26262) requirements for automotive chips alongside PPA considerations for data center hardware.
  • Experience defining or using PMU (Performance Monitoring Unit) events to debug performance on real silicon or emulators.
  • Background in formal methods or mathematical modeling for proving correctness of complex coherency state machines.
  • History of building custom internal tools or frameworks to accelerate architectural exploration.
  • Knowledge of emerging memory technologies like CXL (Compute Express Link) or HBM (High Bandwidth Memory) and how they interact with coherent fabrics.

Benefits

  • Base salary range (determined by location, experience, and comparable roles):
    • Level 3: 152,000 USD - 241,500 USD
    • Level 4: 184,000 USD - 287,500 USD
  • Eligibility for equity and company benefits.
  • Applications accepted at least until May 10, 2026.

NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity and inclusion.