Required Skills & Competences
Go @ 4, Grafana @ 3, Kubernetes @ 4, Prometheus @ 3, DevOps @ 4, Python @ 4, CI/CD @ 3, Azure @ 4, Helm @ 4, Networking @ 4, Microservices @ 4, Debugging @ 4, GPU @ 4
Details
We’re building the infrastructure that powers GR00T, NVIDIA’s general-purpose humanoid robotics platform. This is not a typical DevOps job. You’ll help engineer the cloud-native backend that drives simulation, synthetic data generation, multi-stage model training, and robotic deployment—all at massive scale. Our orchestration system, NVIDIA OSMO, is built to handle real-time robotics workflows in cloud environments across thousands of GPUs. We’re looking for a pragmatic Kubernetes-native backend and infrastructure engineer who excels in solving complex orchestration problems in distributed AI/ML systems.
Responsibilities
- Architect, develop, and deploy backend services supporting NVIDIA GR00T using Kubernetes and cloud-native technologies.
- Collaborate with ML, simulation, and robotics engineers to deploy scalable, reproducible, and observable multi-node training and inference workflows.
- Extend and maintain OSMO’s orchestration layers to support heterogeneous compute backends and robotic data pipelines.
- Develop Helm charts, controllers, CRDs, and service mesh integrations to support secure and fault-tolerant system operation.
- Implement microservices written in Go or Python that power GR00T task execution, metadata tracking, and artifact delivery.
- Optimize job scheduling, storage access, and networking across hybrid and multi-cloud Kubernetes environments (e.g., OCI, Azure, on-prem).
- Build tooling that simplifies deployment, debugging, and scaling of robotics workloads.
Requirements
- BS, MS, or PhD degree in Computer Science, Electrical Engineering, Computer Engineering, or related field (or equivalent experience).
- 5+ years of work experience in DevOps, backend, or cloud infrastructure engineering.
- Hands-on experience building and deploying microservices in Kubernetes-native environments.
- Proficiency in Go or Python, especially for backend systems and operators.
- Experience with Helm or other Kubernetes templating and configuration management tools.
- Familiarity with GitOps workflows, observability stacks (e.g., Prometheus, Grafana), and container CI/CD pipelines.
- Strong understanding of container networking, storage (e.g., PVCs, ephemeral volumes), and scheduling.
Ways to stand out from the crowd
- Experience with ML training workflows and distributed job orchestration (e.g., MPI, Ray, Triton Inference Server).
- Knowledge of robotics frameworks (e.g., ROS2) or simulation tools (e.g., Isaac Sim, Omniverse).
- Background in GPU cluster management and scheduling across cloud providers.
- Contributions to open-source Kubernetes projects or custom operators/controllers.
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you are creative and autonomous, we want to hear from you!