Lead Software Engineer, Fleet Management - DGX Cloud

at NVIDIA
USD 224,000-425,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Docker @ 4, Go @ 6, Kubernetes @ 4, Linux @ 3, Python @ 6, GCP @ 4, Java @ 6, Distributed Systems @ 4, AWS @ 4, Azure @ 4, Communication @ 4, JavaScript @ 3, PostgreSQL @ 4, Next.js @ 3, React @ 3, Angular @ 3, Debugging @ 7, API @ 4, GPU @ 4

Details

Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

We are looking for a Lead Software Engineer to join our DGX Cloud team and build the foundational systems that drive NVIDIA’s high-performance GPU infrastructure. You will play a technical lead role in designing scalable cloud services that integrate with diverse systems, including GPU telemetry in datacenters, and in enabling operational automation across global cloud operations.

Responsibilities

  • Act as technical lead for a team of software engineers designing cloud services backed by databases and data warehouses.
  • Design and develop RESTful APIs to ingest telemetry from AI datacenters.
  • Build scalable cloud services for high-volume ingestion, processing, and storage of large datasets.
  • Build and manage data pipelines for online and offline data storage.
  • Collaborate across teams to codify business processes into scalable, self-measuring systems.
  • Optimize the reliability and efficiency of cloud services and operations.
  • Lead and ship impactful technical projects, ensuring quality and scalability at every stage.

Requirements

  • At least 12 years of industry experience with a Bachelor’s or Master’s degree (or equivalent experience); PhD preferred.
  • Expertise in building scalable REST APIs backed by PostgreSQL-compatible data stores.
  • Proficiency in programming languages such as Go, Java, or Python.
  • Familiarity with modern JavaScript frameworks (for example React, Angular, Next.js).
  • Expertise in cloud infrastructure (AWS, GCP, Azure, etc.) and container technologies like Docker and Kubernetes.
  • Expertise with high-scale distributed systems, including architectural patterns for APIs and data pipelines.
  • Outstanding communication and collaboration skills, with a focus on solving complex operational challenges.
  • A passion for delivering scalable and efficient cloud services.
  • Familiarity with Linux operating systems.

Ways to stand out (desirable):

  • A track record of leading engineers to successful delivery and operations of high-performance cloud services at Internet scale.
  • Experience operating NVIDIA datacenter GPUs.
  • Strong debugging and problem-solving skills in distributed environments.

Compensation & Application

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

  • Level 5 base salary range: 224,000 USD - 356,500 USD
  • Level 6 base salary range: 272,000 USD - 425,500 USD

You will also be eligible for equity and benefits. Applications for this job will be accepted at least until October 5, 2025.

Other

NVIDIA is committed to creating an environment where diverse perspectives drive innovation. NVIDIA is an equal opportunity employer and does not discriminate on the basis of protected characteristics.