AI K8s Infrastructure Generalist

at NVIDIA

📍 Santa Clara, United States

$180,000–$339,200 per year

SENIOR
✅ On-site

Required Skills & Competences

  • Security @ 4
  • Go @ 4
  • Kubernetes @ 4
  • Linux @ 6
  • Python @ 4
  • Distributed Systems @ 4
  • Networking @ 7
  • SRE @ 4
  • API @ 6

Details

We are looking for a highly motivated k8s AI infrastructure generalist to join our team, one of the most experienced Kubernetes organizations around. This is an excellent opportunity to architect and drive advancements in SRE automation on the largest NVIDIA GPU clusters in the cloud! Please apply if you are passionate about Kubernetes, building k8s infrastructure automation and deployment tools, and working with new technologies and Cloud Native applications.

Responsibilities

  • As part of the Maglev AI infrastructure and SRE team, you will propose and craft new ways to improve the availability of our Cloud Native AI Platform by automating critical processes across multiple distributed GPU clusters.
  • The solutions you propose and build will directly impact the efficiency of the NVIDIA Autonomous Vehicles Perception development team!

Requirements

  • BS or MS in CS, CE, or EE, or equivalent experience.
  • 6+ years of k8s experience, both on-prem and in the cloud.
  • At least 4 years of experience building automation software and APIs for large-scale computing clusters and data platforms.
  • Ability to help our team develop a better stack of Go and Python automation APIs.
  • Thorough understanding of Kubernetes and Cloud Native architecture, and hands-on experience with k8s clusters.
  • Expertise in problem-solving and complexity analysis of distributed systems.
  • Proficiency in Linux environments.
  • Excellent written, verbal, and interpersonal communication skills.
  • A fun and motivated teammate who enjoys a challenge and celebrates success.

Ways to Stand Out from the Crowd

  • Previous experience building sophisticated tooling and SRE automation on large GPU/CPU clusters of 100+ nodes.
  • DevSecOps experience with a good understanding of cloud security concepts.
  • Deep knowledge of networking layers and fundamentals.

For two decades, we have pioneered visual computing, the art and science of computer graphics. With our invention of the GPU, the engine of modern visual computing, the field has expanded to encompass video games, movie production, product design, medical diagnosis, and scientific research. Today, we stand at the beginning of a new AI computing era, ignited by a new computing model: GPU deep learning. This model, in which deep neural networks are trained to recognize patterns in vast amounts of data, has proven highly effective at solving some of the most challenging problems in everyday life.