Principal Cloud Engineer, HPC
at Nvidia
Seattle, United States
USD 272,000-425,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
- Kubernetes @ 4
- Distributed Systems @ 7
- Networking @ 7
- GPU @ 4
Details
NVIDIA is on a mission to create the best cloud platform for AI workloads by delivering its advanced GPU technology through managed services under the DGX Cloud umbrella. This role offers the opportunity to be a founding member of a team building scalable, fault-tolerant cloud services combined with AI.
Responsibilities
- Design and architect AI-oriented compute services for large training workloads.
- Build distributed computing infrastructure and training services for large-scale distributed model training.
- Plan and coordinate infrastructure build-outs with multifunctional teams, partners, and vendors.
- Collaborate across NVIDIA engineering teams to translate requirements into infrastructure needs.
Requirements
- Strong foundation in distributed computing and storage, including experience with server systems, storage, I/O, networking, and system software.
- Bachelor's degree or equivalent experience.
- 12+ years of system software engineering experience on large-scale production systems.
- 12+ years architecting high-performance computing infrastructure at scale.
- Proven experience in HPC, Deep Learning, and/or GPU accelerated computing domains.
- Ability to communicate complex designs and distributed infrastructure to peers, customers, and vendors.
- Knowledge of shared storage systems such as NFS, LustreFS, GlusterFS.
- Familiarity with system-level architecture, including interconnects, memory hierarchy, interrupts, and memory-mapped I/O.
Ways to Stand Out
- Experience with large-scale distributed systems, HPC, and ML training using Slurm and Kubernetes.
- Deep software and hardware knowledge in HPC and ML infrastructure.
About NVIDIA
NVIDIA leads in AI, High-Performance Computing, and Visualization technology, powering innovations from AI to autonomous vehicles. This role will contribute to advancing these frontiers.
Compensation & Benefits
The base salary ranges from 272,000 USD to 425,500 USD depending on location, experience, and comparable positions. Equity and benefits are also included. NVIDIA fosters a diverse, equal opportunity workplace.