Software Engineering Manager, AI Infrastructure Services - DGX Cloud
at NVIDIA
Santa Clara, United States
USD 200,000-322,000 per year
Required Skills & Competences
Docker @ 3, Kubernetes @ 3, DevOps @ 3, Mathematics @ 3, OpenStack @ 3, SRE @ 3, Planning @ 3
Details
We are seeking an experienced manager of Software Engineers to develop automation that runs reliable AI infrastructure services at scale, both close to bare metal and over VMaaS (virtual machines as a service). You will build and lead one or more teams to ensure that internal and external cloud services running on accelerated-computing hardware meet reliability and operational expectations.
Responsibilities
- Recruit and retain talent; manage career development for your organization.
- Be accountable for deliverables of team(s) in scope.
- Lead cross-team and cross-company communications.
- Participate in KPI-driven strategic planning.
- Foster a collaborative environment.
Requirements
- 7+ years overall professional experience.
- BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
- 3+ years of management experience with prior hands-on experience as an individual contributor.
- Proven track record of impactful project deliveries while managing Software Engineers focused on cloud infrastructure or cloud application services.
- Experience with DevOps practices, Site Reliability Engineering (SRE) practices, and/or Platform Engineering.
- Experience developing automation for infrastructure and operating reliable services at scale (close to bare metal and over VMaaS).
- Systematic problem-solving approach, strong communication skills, and a sense of ownership and drive.
Nice to have
- Experience developing ML/AI infrastructure or multi-cloud infrastructure services.
- Experience with bare metal as a service (BMaaS) systems.
- Experience teaching or running reliability practices (SRE/CRE) or cloud-systems best practices.
- Experience running private or public cloud systems based on Kubernetes, OpenStack, NVIDIA BCM, Docker or Slurm.
Compensation & Benefits
- Base salary range: 200,000 USD - 322,000 USD (determined based on location, experience, and internal pay equity).
- Eligible for equity and company benefits (see NVIDIA benefits).
Additional Information
- Applications accepted at least until August 13, 2025.
- NVIDIA is an equal opportunity employer committed to diversity and inclusion.