Principal Infrastructure SW Engineer, AI Cloud Services

at NVIDIA

📍 Santa Clara, United States

$272,000-$419,800 per year

SENIOR
✅ On-site

Required Skills & Competences

Security @ 4 Go @ 6 DevOps @ 6 TypeScript @ 6 Python @ 6 GCP @ 4 AWS @ 4 Azure @ 4 Performance Optimization @ 7 API @ 4

Details

We are now looking for a Principal Software Engineer, AI Cloud Services Infrastructure:

NVIDIA's Deep Learning Libraries Group is seeking an experienced software engineering leader to accelerate our efforts to deliver our world-leading AI optimization technologies as cloud services. In this cross-functional role, your mission will be to empower developers across the world to create AI applications that easily use NVIDIA hardware to its fullest through cloud APIs like TensorRT Cloud and NVIDIA AI Foundation Model Endpoints. Your work will focus on the foundational layers needed to consistently deliver services that remain scalable, reliable, and secure while rapidly evolving, and your impact will span the full breadth of NVIDIA’s hardware products, from Drive AGX for autonomous vehicles to DGX servers for the datacenter. Join our technically diverse team of software engineers and infrastructure experts to expand the accessibility and reach of NVIDIA’s world-leading AI platforms.

Responsibilities

  • Guide development and operations of cloud services that enable external developers to easily access the latest AI models, optimizations, and serving techniques.
  • Lead and directly contribute to implementation of key infrastructure features to enable product goals and improve productivity of internal engineers.
  • Mentor engineers to develop their technical skills and ability to make an impact.
  • Collaborate with product and engineering leads on feature roadmaps and execution planning.
  • Promote and support methodologies that improve efficiency, product quality, security, and scalability.
  • Identify and seize opportunities to build common infrastructure that can be shared across various AI-related services.

Requirements

  • MS or PhD in Computer Science, Computer Engineering, or a closely related field (or a Bachelor's degree with additional equivalent experience).
  • 12+ years of relevant experience as a developer, technical lead, and/or engineering manager.
  • Proven technical skills in architecting, designing, implementing and delivering high-quality cloud services.
  • Proficiency in one or more programming languages (e.g., Python, TypeScript, Go).
  • Proficiency in SW development and DevOps best practices (SW development life cycle, developer workflows, continuous integration, infrastructure as code, etc.).
  • Experience building applications or services that incorporate AI.
  • Excellent interpersonal skills and a collaborative, pragmatic approach to solving problems.

Ways to stand out from the crowd:

  • Experience building and operating publicly accessible services that incorporate AI at scale.
  • Strong grasp of the latest trends in AI inference serving and performance optimization.
  • Deep knowledge of GPU infrastructure management and/or CUDA applications.
  • Experience with multiple major cloud platforms (AWS, Azure, GCP, OCI, etc.).

This is an opportunity to have a wide impact at NVIDIA by expanding our platform and improving development velocity for our unparalleled ecosystem of AI developers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative, driven, and love a challenge, come join our team!