Data Center System Software Architect, DGX Cloud

at Nvidia

πŸ“ Santa Clara, United States

$180,000-339,200 per year

SENIOR
✅ On-site

Required Skills & Competences

Docker @ 4, Go @ 4, Kubernetes @ 4, Linux @ 4, Terraform @ 4, Python @ 4, Distributed Systems @ 3, Machine Learning @ 4, Data Science @ 4, TensorFlow @ 4, Hiring @ 4, Communication @ 3, Parallel Programming @ 4, Rust @ 4, Microservices @ 4, Debugging @ 7, PyTorch @ 4

Details

NVIDIA is hiring engineers to scale up its AI infrastructure. We expect you to have a strong programming background, a deep understanding of distributed systems, familiarity with software testing and deployment, and excellent communication and planning abilities. We also welcome out-of-the-box thinkers who can contribute new ideas with a strong bias toward execution. Expect to be constantly challenged, improving, and evolving for the better.

You and other engineers on this team will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications that affect core data science. If you're creative, passionate about what you do, and love having fun, what are you waiting for? Apply today!

Responsibilities

  • Lead technical activities for data centers with a focus on hybrid deployments between cloud and on-prem.
  • Provide expertise in infrastructure workflows, including hardware, workload orchestration, and application tuning.
  • Provide fast and creative solutions for complex problems and write effective, clear, and reliable architecture specifications.
  • Translate requirements to vision, architecture, and roadmap.
  • Work with engineering teams across NVIDIA to ensure your software integrates seamlessly from the hardware all the way up to the AI training applications.

Requirements

  • Master's or PhD in Computer Science, Computer Engineering, Physics, or equivalent experience.
  • 10+ years of experience in this field.
  • Data Sciences, Deep Learning, or Machine Learning coursework.
  • Ability to shift seamlessly between Linux system environments and Python programming.
  • Programming skills in one or more high-level languages (C, C++, Go, Rust, etc.).
  • System-level experience with both hardware and software.
  • Motivated self-starter with an equal balance of strong problem-solving skills and customer-facing communication skills.
  • Strong design, coding, analytical, debugging, and problem-solving skills.
  • Passion for continuous learning and knowledge transfer. Ability to work concurrently with multiple groups locally and abroad in the organization.

Ways to stand out from the crowd

  • Experience with GPU deep learning and data sciences. Experience using TensorFlow, PyTorch, or other DL frameworks. Experience working with Docker containers, Slurm, Terraform, and Kubernetes.
  • CUDA programming and NCCL experience. HPC programming experience including MPI, OpenACC, or other parallel programming tools.
  • Hands-on experience with DGX Cloud, NVIDIA AI Enterprise software, Base Command Manager, NeMo, and NVIDIA Inference Microservices.

NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most forward-thinking and hardworking people on the planet working for us. If you are creative and autonomous, we want to hear from you!