Manager, Engineering - Data Center Management

at Nvidia
USD 224,000-425,500 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Software Development @ 7
  • Python @ 6
  • Leadership @ 3
  • Communication @ 6
  • Git @ 3
  • Jira @ 3
  • Debugging @ 6
  • Technical Leadership @ 3
  • Project Management @ 3

Details

NVIDIA is seeking a strong technical architect to own end-to-end manageability architecture for next-generation, rack-level AI supercomputing platforms deployed in data centers. You will work with internal and external component leads, drive customer use cases, align architecture with customer requirements, and release high-quality products to market.

Responsibilities

  • Drive server management for large clusters and data centers deploying NVIDIA GPUs and Grace solutions.
  • Work with data center architects and cloud customers to define implementation requirements that enable rapid product development.
  • Collaborate closely with hardware teams to define low-level requirements and architecture for data center management products.
  • Own firmware for low-level management components and manage the team that delivers it with high quality.
  • Work with internal teams to ensure requirements are designed and implemented correctly across firmware and software modules.
  • Collaborate with other leads to design and build data center health management workflows.
  • Drive reliability and optimization in firmware architecture from a data center viewpoint.
  • Work closely with cluster bring-up teams to resolve issues quickly, and take ownership of the delivered firmware's quality, reliability, and telemetry performance.

Requirements

  • 10+ years of relevant experience working on server firmware (BMC) and platform software development.
  • BS, MS, or PhD in Electrical Engineering, Computer Science, or a related field, or equivalent experience.
  • Hands-on experience with data center health management workflows and a proven record of delivering server firmware for large data centers.
  • Strong knowledge of data center management, server architecture, and server manageability in data centers.
  • 4+ years of proven experience managing teams of engineers.
  • Strong and demonstrable skills in C/C++ and Python; experience programming and debugging server platforms.
  • Experience with SCM systems (e.g., Git, Perforce) and project management tools like Jira.
  • Excellent written and oral communication skills, strong work ethic, teamwork orientation, and commitment to delivering quality work.
  • Self-starter who enjoys finding creative solutions to complicated problems and is hands-on with coding.

Ways to Stand Out

  • Hands-on experience with data center health management and server manageability.
  • Proven technical leadership driving solutions to large, complex problems with teams of 25+ engineers.

Compensation & Benefits

  • The base salary range is 224,000 USD - 356,500 USD for Level 3, and 272,000 USD - 425,500 USD for Level 4. The final base salary will be determined based on location, experience, and comparable pay for similar positions.
  • You will also be eligible for equity and company benefits.

Other Information

  • Applications for this job will be accepted at least until August 13, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.