Principal Full Stack Software Engineer

at Nvidia
USD 272,000-425,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 4, Docker @ 4, Kubernetes @ 4, Linux @ 4, Python @ 7, Airflow @ 4, Distributed Systems @ 8, Machine Learning @ 4, JavaScript @ 6, CSS @ 6, KubeFlow @ 4, Rust @ 7, API @ 6, GPU @ 4

Details

NVIDIA is at the forefront of innovations in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU functions as the visual cortex of modern computing and is central to applications from generative AI to autonomous vehicles. This Principal Full Stack Software Engineer role will help accelerate the next era of machine learning innovation by designing and implementing engineering solutions that deliver functional, reliable, secure, and performance-optimal GPU clusters to internal researchers. The work empowers scientists and engineers to train, fine-tune, and deploy advanced ML models on some of the world's most powerful GPU systems.

Responsibilities

  • Work across the Managed AI Research Supercluster organization to understand pain points in validating, monitoring, and operating GPU clusters at scale.
  • Design, develop, and maintain engineering solutions to systematically solve operational pain points.
  • Research and apply traditional AIOps and emerging Agentic AI techniques to reduce operational toil.
  • Participate in on-call support for systems and platforms built and owned by the team.
  • Enable self-service continuous improvement on reliability, operational excellence, and performance for internal researchers.

Requirements

  • BS/MS in Computer Science, Engineering, or equivalent experience.
  • 15+ years in software/platform engineering, including 3+ years in ML infrastructure or distributed systems.
  • Proficiency with full-stack development: relational data modeling, database optimization, REST API semantics, JavaScript, CSS, and providing APIs as a service.
  • Experience in software development lifecycle on Linux-based platforms.
  • Strong coding skills in languages such as Python, C++ or Rust.
  • Experience with AIOps or Agentic AI and applying it successfully in production environments.
  • Experience with Docker, Kubernetes, GitLab CI, and automated deployments.
  • Ability to participate in on-call rotation and support production systems.

Ways To Stand Out

  • Familiarity with GPU computing, Linux systems internals, and performance tuning at scale.
  • Experience running Slurm or custom scheduling frameworks in production ML environments.
  • Experience with ML orchestration tools such as Kubeflow, Flyte, Airflow, or Ray.

Benefits & Compensation

  • Base salary range (determined by location, experience, and pay of employees in similar positions): 272,000 USD - 425,500 USD.
  • Eligible for equity and benefits (see company benefits page).

Additional Information

  • Applications for this job will be accepted at least until October 25, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.