Senior System Software Engineer, Enterprise MODS

at NVIDIA
USD 224,000-425,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 7 Linux @ 7 Python @ 4 Leadership @ 4 Communication @ 7 Mentoring @ 4 Debugging @ 4 Technical Leadership @ 4 Cloud Computing @ 4

Details

At NVIDIA, we’re tapping into the unlimited potential of AI to define the next era of computing. NVIDIA data center platforms like GB200 NVL72 are redefining AI, HPC, and cloud computing. To accommodate leading workloads globally, our diagnostic systems need to evolve across diverse hardware technologies. This role is focused on engineering and driving innovation in diagnostics for NVIDIA's partner ecosystem: validating, debugging, and optimizing complex server platforms across ODM factories, Cloud Service Provider (CSP) deployments, and field operations.

Responsibilities

  • Develop diagnostic systems for NVIDIA data center platforms, including hardware and software tools to generate worst-case stress workloads for CPUs, GPUs, memory, storage, and interconnects.
  • Lead platform bring-up and integration, ensuring diagnostics are embedded early and effectively across the server lifecycle.
  • Drive hardware validation strategy in collaboration with architecture and hardware teams; craft robust validation plans for new server generations.
  • Analyze root causes of complex failures; act as a Level 2 engineering contact for critical issues and provide scalable solutions across the stack.
  • Develop diagnostics software to ensure quality and performance at scale across ODM and partner production lines.
  • Mentor and grow engineering teams; provide technical leadership and foster a culture of innovation and excellence.
  • Influence long-term strategy by developing diagnostic architecture and roadmaps for upcoming NVIDIA and partner products.

Requirements

  • Proven experience architecting diagnostics for complex server systems, especially at the SW/HW interface.
  • Deep systems knowledge: x86 and ARM architectures, Linux and Windows OS internals, firmware (UEFI/BIOS), BMC, and platform security.
  • Ability to weigh tradeoffs in system development and drive optimal solutions with customers and multidisciplinary teams.
  • Expertise in programming languages such as C, C++, and Python for tool development and automation.
  • Familiarity with high-speed interconnects such as PCIe, InfiniBand, NVLink, and Ethernet.
  • Strong communication skills for engagement with technical and executive teams.
  • BS/MS or equivalent experience in Computer Science, Electrical Engineering, or a related field.
  • 12+ years of engineering experience in diagnostics, embedded systems, or cloud platforms.

Ways to Stand Out

  • Experience driving diagnostics across rack-level or cluster-level deployments.
  • Background in cloud-scale infrastructure and partner engagement.
  • Demonstrated success influencing product direction and vendor roadmaps.
  • Passion for mentoring and building high-performing teams.

Benefits & Additional Information

  • Base salary ranges provided by level: Level 5: 224,000 USD to 356,500 USD; Level 6: 272,000 USD to 425,500 USD.
  • You will also be eligible for equity and benefits (see NVIDIA benefits pages).
  • Applications for this job will be accepted at least until September 13, 2025.
  • Location: Santa Clara, CA (US). Full-time role. NVIDIA is an equal opportunity employer committed to diversity and inclusion.