Senior Software Engineer, AI Platform - Robotics

at NVIDIA
USD 148,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Go @ 4, Grafana @ 3, Kubernetes @ 4, Prometheus @ 3, DevOps @ 4, Python @ 4, CI/CD @ 3, Azure @ 4, Helm @ 4, Networking @ 4, Microservices @ 4, Debugging @ 4, GPU @ 4

Details

We’re building the infrastructure that powers GR00T, NVIDIA’s general-purpose humanoid robotics platform. This is not a typical DevOps job. You’ll help engineer the cloud-native backend that drives simulation, synthetic data generation, multi-stage model training, and robotic deployment, all at massive scale. Our orchestration system, NVIDIA OSMO, is built to handle real-time robotics workflows in cloud environments across thousands of GPUs. We’re looking for a pragmatic, Kubernetes-native backend and infrastructure engineer who excels at solving complex orchestration problems in distributed AI/ML systems.

Responsibilities

  • Architect, develop, and deploy backend services supporting NVIDIA GR00T using Kubernetes and cloud-native technologies.
  • Collaborate with ML, simulation, and robotics engineers to deploy scalable, reproducible, and observable multi-node training and inference workflows.
  • Extend and maintain OSMO’s orchestration layers to support heterogeneous compute backends and robotic data pipelines.
  • Develop Helm charts, controllers, CRDs, and service mesh integrations to support secure and fault-tolerant system operation.
  • Implement microservices written in Go or Python that power GR00T task execution, metadata tracking, and artifact delivery.
  • Optimize job scheduling, storage access, and networking across hybrid and multi-cloud Kubernetes environments (e.g., OCI, Azure, on-prem).
  • Build tooling that simplifies deployment, debugging, and scaling of robotics workloads.

Requirements

  • BS, MS, or PhD degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
  • 5+ years of work experience in DevOps, backend, or cloud infrastructure engineering.
  • Hands-on experience building and deploying microservices in Kubernetes-native environments.
  • Proficiency in Golang or Python, especially for backend systems and operators.
  • Experience with Helm or other Kubernetes templating and config management tools.
  • Familiarity with GitOps workflows, observability stacks (e.g., Prometheus, Grafana), and container CI/CD pipelines.
  • Strong understanding of container networking, storage (e.g., PVCs, ephemeral volumes), and scheduling.

Ways to stand out from the crowd

  • Experience with ML training workflows and distributed job orchestration (e.g., MPI, Ray, Triton Inference Server).
  • Knowledge of robotics frameworks (e.g., ROS 2) or simulation tools (e.g., Isaac Sim, Omniverse).
  • Background in GPU cluster management and scheduling across cloud providers.
  • Contributions to open-source Kubernetes projects or custom operators/controllers.

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you are creative and autonomous, we want to hear from you!