Senior Systems Software Engineer, Containers and Kubernetes

at NVIDIA

📍 Santa Clara, United States

USD 184,000-356,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Software Development @ 7, Docker @ 4, Go @ 4, Kubernetes @ 4, Prometheus @ 4, Terraform @ 4, GitHub @ 7, CI/CD @ 7, SRE @ 7, LLM @ 4, OpenAPI @ 4, CUDA @ 3, GPU @ 4

Details

NVIDIA is seeking experienced software and systems engineers to develop and operate enterprise GPU infrastructure management systems across clouds. The role focuses on designing, building, and operating infrastructure management systems, Kubernetes operators, and end-to-end HPC integration solutions that combine GPUs with datacenter software management ecosystems. Work spans GPU systems management including cloud provisioning, observability, operations, and incident response, supporting single-node developer systems through large clusters across multiple cloud providers.

Responsibilities

Enable GPU provisioning and lifecycle using cloud-native open-source ecosystem solutions, including Kubernetes, Docker, Prometheus, Terraform and Crossplane.
Develop, maintain, and operate robust, scalable Go programs in a Kubernetes environment.
Design and build next-generation multi-cloud infrastructure management systems to support GenAI.
Support internal and external users through bug fixes, documentation, and feature improvements.
Maintain high-quality products with robust test coverage and Day 2 operational capabilities.
Participate in cloud provisioning, observability, operations, and incident response for large-scale GPU deployments.

Requirements

BS or higher in Computer Science or equivalent experience.
8+ years of meaningful industry experience with a strong Kubernetes and SRE background.
Deep understanding and execution skills across the software development lifecycle (SDLC).
Experience with OpenAPI and Kubernetes Custom Resource Definitions (CRDs).
Experience developing Kubernetes operators and working with Kubernetes internals.
Experience developing and operating services written in Go.
Familiarity with container technologies and orchestration frameworks.
Business-level English with strong written and verbal interpersonal skills.
Strong motivation, commitment to continuous learning, and ability to manage time in a fast-paced, heavily multitasked environment.

Preferred / Ways to stand out

Open-source contributions to the cloud-native community and understanding of AI and LLM principles.
Strong experience with GitHub/GitLab CI/CD pipelines and application configuration.
Strong knowledge of container technologies, orchestration frameworks and observability systems (e.g., Prometheus).
Exposure to GPU programming with CUDA and familiarity with Kubernetes internals.
Experience developing Kubernetes operators.
Experience managing and operating HPC schedulers and/or working across multiple cloud providers.

Compensation & Benefits

Base salary ranges by level:
- Level 4: 184,000 USD - 287,500 USD
- Level 5: 224,000 USD - 356,500 USD
Eligible for equity and company benefits (see NVIDIA benefits page).

Additional information

Location: Santa Clara, California, United States (Full time).
Applications accepted at least until October 10, 2025.
NVIDIA is an equal opportunity employer committed to diversity and inclusion.