Software Engineering Manager, AI Infrastructure Services - DGX Cloud

at Nvidia
USD 200,000-322,000 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competencies

Docker @ 3, Kubernetes @ 3, DevOps @ 3, Mathematics @ 3, OpenStack @ 3, SRE @ 3, Planning @ 3

Details

We are seeking an experienced manager of Software Engineers to develop automation for running reliable AI infrastructure services at scale, both close to the bare metal and over VMaaS (virtual machines as a service). You will build and lead one or more teams to ensure that internal and external cloud services running on accelerated-computing hardware meet reliability and operational expectations.

Responsibilities

  • Recruit and retain talent; manage career development for your organization.
  • Be accountable for the deliverables of the team(s) in your scope.
  • Lead cross-team and cross-company communications.
  • Participate in KPI-driven strategic planning.
  • Foster a collaborative environment.

Requirements

  • 7+ years overall professional experience.
  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics) or equivalent experience.
  • 3+ years of management experience with prior hands-on experience as an individual contributor.
  • Proven track record of impactful project deliveries while managing Software Engineers focused on cloud infrastructure or cloud application services.
  • Experience with DevOps practices, Site Reliability Engineering (SRE) practices, and/or Platform Engineering.
  • Experience developing automation for infrastructure and operating reliable services at scale (close to bare metal and over VMaaS).
  • Systematic problem-solving approach, strong communication skills, and a sense of ownership and drive.

Nice to have

  • Experience developing ML/AI infrastructure or multi-cloud infrastructure services.
  • Experience with bare metal as a service (BMaaS) systems.
  • Experience teaching or running reliability practices (SRE/CRE) or cloud systems best practices.
  • Experience running private or public cloud systems based on Kubernetes, OpenStack, NVIDIA BCM, Docker or Slurm.

Compensation & Benefits

  • Base salary range: 200,000 USD - 322,000 USD (determined based on location, experience, and internal pay equity).
  • Eligible for equity and company benefits (see NVIDIA benefits).

Additional Information

  • Applications accepted at least until August 13, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.