Senior Cloud Platform Software Engineer
at NVIDIA
Seattle, United States
USD 224,000-356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Kubernetes @ 4, IaC @ 4, Networking @ 4, GPU @ 4
Details
NVIDIA is building a cloud offering for AI workloads (DGX Cloud) and seeks a Cloud Platform Engineer to drive technical design and build foundational elements of high-performing cloud services for AI and high-performance computing. This is an opportunity to be a founding member of a team working at the intersection of scalable, fault-tolerant cloud services and AI.
Responsibilities
- Design and build platforms for DGX Cloud services as part of the service team.
- Combine HPC and Kubernetes best practices to help create a unified platform.
- Collaborate with software engineers, product teams, and engineering teams across NVIDIA on DGX Cloud AI Compute services.
- Write Infrastructure as Code (IaC), work with Kubernetes, and help design and implement release pipelines.
- Collaborate on the effective use of GitOps and pipelines.
Requirements
- BS in Computer Science, Information Systems, Computer Engineering, or equivalent experience.
- Solid technical foundation in distributed computing and storage, including substantial experience with server systems, storage, I/O, networking, and system software.
- 12+ years of platform engineering experience on large-scale production systems.
- Hands-on engineering expertise with Kubernetes and IaC, including Kubernetes concepts such as Pod Disruption Budgets.
- Ability to understand and communicate complex designs, distributed infrastructure, and requirements to peers, customers, and vendors.
- General knowledge of shared storage systems such as NFS, Lustre, and GlusterFS.
- Familiarity with system-level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped I/O.
Ways to stand out / Preferred
- Proven experience in high-performance computing (HPC), Deep Learning, and/or GPU-accelerated computing domains.
- Experience with large-scale distributed systems, HPC, and ML training using Slurm and Kubernetes.
- Deep knowledge of both software and hardware in HPC and ML infrastructure.
Compensation & Benefits
- Base salary range: 224,000 USD - 356,500 USD, determined by location, experience, and the pay of employees in similar positions.
- Eligible for equity and benefits.
Additional Information
- Applications accepted at least until September 22, 2025.
- NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.