Senior Full Stack Software Engineer - DGX Cloud

at NVIDIA
📍 World
📍 United States
USD 224,000-356,500 per year
SENIOR
✅ Remote

Used Tools & Technologies

Go, LLM

Required Skills & Competences

Kubernetes @ 4, TypeScript @ 4, SQL @ 6, Distributed Systems @ 4, Hiring @ 4, Communication @ 7, JavaScript @ 6, PostgreSQL @ 4, React @ 4, GPU @ 4, AI @ 4, Slurm @ 7

Details

NVIDIA is hiring experienced software engineers to help scale up its AI infrastructure. You will help advance NVIDIA's capacity to build and deploy leading infrastructure solutions for a broad range of AI-based applications. Expect to be challenged, to improve, and to evolve. If you are creative, passionate about GPUs, and enjoy working on large-scale systems, please apply.

Responsibilities

  • Be part of the DGX Cloud team responsible for production systems that enable large, scalable GPU clusters for a variety of AI workloads.
  • Design and develop a massively distributed, scalable platform used to identify, diagnose, and remediate non-performant GPU assets.
  • Work with teams across NVIDIA to ensure production AI clusters run reliably and consistently with maximum performance.
  • Evaluate system failures and improve services based on a defined incident management process.
  • Work across the product stack including frontend and backend technologies: React, Web Components, TypeScript, Golang, PostgreSQL, Temporal, Bazel, Kubernetes.

Requirements

  • Significant software engineering experience within a highly technical organization with demonstrable impact.
  • Strong communication skills and ability to work successfully with cross-functional teams, principals, and architects across organizational boundaries and geographies.
  • 12+ years in a similar role with experience on large-scale production systems.
  • BS in Computer Science or Engineering or equivalent experience.
  • 6+ years of full-stack engineering experience.
  • 3+ years building and shipping consumer-facing products.
  • Proficiency in React, TypeScript/JavaScript, and Golang.
  • Proficiency with a SQL database such as PostgreSQL.

Ways to stand out

  • Technical competency in managing and automating large-scale distributed systems independent of cloud providers. Advanced hands-on experience and deep understanding of cluster management systems (Kubernetes, Slurm, Base Command Manager).
  • Empathy for users, attention to detail, and passion for creating world-class user experiences.
  • Prior experience in asynchronous workflows and/or event-driven architecture.
  • Proven operational excellence in maintaining reliable and performant infrastructure.
  • A good understanding of how to use LLMs responsibly and the perils of blindly consuming their output.

Compensation & Benefits

  • Base salary range: 224,000 USD - 356,500 USD (determined based on location, experience, and pay of employees in similar positions).
  • Eligible for equity and company benefits (link to NVIDIA benefits in original posting).

Other details

  • Applications for this job will be accepted at least until May 18, 2026.
  • NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer.