Required Skills & Competences
- Software Development @ 6
- Python @ 4
- Leadership @ 4
- Communication @ 7
- Stress Testing @ 4
- Debugging @ 4
- Engineering Management @ 6
- GPU @ 4
Details
NVIDIA’s Data Center MODS organization is looking for an Engineering Manager to help Cloud Service Providers (CSPs) and OEMs scale out current- and next-generation data center products. You will be responsible for validating and scaling NVIDIA’s GPU products at the system level, pushing hardware to its limits to ensure adaptability and reliability across diverse environments, from internal validation labs to hyperscale data centers. Our organization partners closely with architecture, ASIC, operations, and data center teams to build methodologies that stress every subsystem of the GPU and server platform. The team also supports diagnostics for customer deployments, tailoring stress workloads to specific configurations and use cases.
Responsibilities
- Lead and mentor a high-performing engineering team, fostering technical growth and leadership.
- Collaborate with architecture and hardware teams to drive development of stress and diagnostic software targeting GPUs, CPUs, memory, storage, and interconnects.
- Lead multiple concurrent projects, balancing long-term strategy with short-term execution.
- Work with Cloud Service Providers (CSPs), OEMs, and data center operators to support deployment and customization of diagnostics.
- Champion continuous improvement in product quality, debug efficiency, and operational scalability.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
- 10+ years of overall experience in system software development, including 4+ years in engineering management.
- Experience with C/C++ and Python.
- Deep understanding of operating systems, kernel drivers, and hardware–software interaction.
- Experience with PC/server architecture, including PCIe, NVLink, InfiniBand, or Ethernet.
- Consistent track record of leading feature development and multi-team debugging efforts.
Ways to Stand Out
- Experience with diagnostics or stress testing in large-scale data center environments.
- Familiarity with GPU compute, graphics, memory subsystems, or high-speed interfaces.
- Prior experience working with CSPs or OEMs on system-level validation and deployment.
- Strong communication and multi-functional leadership skills.
- Passion for building tools that ensure product excellence and customer success.
Compensation & Benefits
- Base salary ranges (determined by location, experience, and internal pay bands):
  - Level 4: 272,000 USD - 425,500 USD
  - Level 5: 308,000 USD - 471,500 USD
- Eligible for equity and additional benefits.
Additional Information
- Applications for this job will be accepted at least until October 10, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.