Senior Cloud Platform Software Engineer

at Nvidia
USD 224,000-356,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Kubernetes @ 4, IaC @ 4, CI/CD @ 4, Distributed Systems @ 4, Networking @ 4, API @ 4, GPU @ 4

Details

NVIDIA is building cloud offerings for AI workloads and managed services under the DGX Cloud umbrella. The team is creating scalable, fault-tolerant cloud services and self-service APIs to deliver NVIDIA GPU technology to customers. This role is for a founding member of a team designing and building foundational elements of high-performing cloud services for AI and HPC.

Responsibilities

  • Design and build platforms for DGX Cloud services as part of a service team.
  • Unify best practices from HPC and Kubernetes to create a single platform.
  • Collaborate with software engineers, product teams, and engineering teams across NVIDIA on DGX Cloud AI compute services.
  • Write Infrastructure as Code (IaC), work on Kubernetes, and help design and implement release pipelines.
  • Collaborate to adopt and optimize GitOps and CI/CD pipelines.

Requirements

  • BS in Computer Science, Information Systems, Computer Engineering, or equivalent experience.
  • Solid technical foundation in distributed computing and storage, including substantial experience with server systems, storage, I/O, networking, and system software.
  • 12+ years of platform engineering experience on large-scale production systems.
  • Hands-on engineering expertise in Kubernetes (including concepts such as PodDisruptionBudget) and Infrastructure as Code (IaC).
  • Ability to understand and communicate complex designs, distributed infrastructure, and requirements to peers, customers, and vendors.
  • General knowledge of shared storage systems such as NFS, LustreFS, and GlusterFS.
  • Familiarity with system-level architecture concepts such as interconnects, memory hierarchy, interrupts, and memory-mapped I/O.

Ways to Stand Out

  • Proven experience in high-performance computing (HPC), deep learning, and/or GPU-accelerated computing domains.
  • Experience with large-scale distributed systems, HPC, and ML training workflows using Slurm and Kubernetes.
  • Deep knowledge of both software and hardware aspects of HPC and ML infrastructure.

About NVIDIA

NVIDIA leads developments in Artificial Intelligence, High-Performance Computing, and Visualization. The company emphasizes innovation using GPUs to enable AI, autonomous vehicles, and advanced visualization.

Compensation & Benefits

  • Base salary range: 224,000 USD - 356,500 USD (determined by location, experience, and internal pay equity).
  • Eligibility for equity and company benefits.

Application & Other

  • Applications for this job will be accepted at least until September 22, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.