Used Tools & Technologies
Not specified
Required Skills & Competences
Software Development @ 4, Docker @ 4, Go @ 4, Grafana @ 4, Jenkins @ 4, Kubernetes @ 4, Linux @ 7, Prometheus @ 4, Python @ 4, GitHub @ 4, CI/CD @ 4, Helm @ 4, React @ 4, Node.js @ 4, Rust @ 4, Debugging @ 7, API @ 4, CUDA @ 3, GPU @ 4
Details
NVIDIA is seeking experienced software engineers to expand its enterprise GPU management and monitoring tools. The role focuses on designing and building cloud-native management agents, Kubernetes integrations, and end-to-end solutions that integrate GPUs into the datacenter software management ecosystem. The work spans telemetry and metrics, health checks, diagnostics, configuration, and system management, for deployments ranging from single-node developer systems to large clusters.
Responsibilities
- Develop and maintain distributed, robust, and scalable Go programs deployed to Kubernetes environments that manage large datacenters.
- Develop and maintain user-space applications, containers, Go bindings, and CLI tools.
- Enable GPU management integration with the open-source ecosystem, including Kubernetes and Docker.
- Support internal and external users through bug fixes, documentation, and feature improvements.
- Maintain high-quality products through robust test coverage.
Requirements
- BS or higher in Computer Science or equivalent experience.
- 5+ years of meaningful industry experience with a strong Go and Kubernetes development background.
- Strong Linux background with user-space development and debugging expertise.
- Experience with APIs and interface design.
- Outstanding written and verbal interpersonal skills; business-level English.
- Strong motivation and commitment to learn new skills.
- Ability to execute all aspects of the software development lifecycle and to manage time in a fast-paced, heavily multitasked environment.
- Development experience with one or more of: Rust, Python, C, C++.
- Experience developing and maintaining enterprise software; deploying, managing, and debugging applications in Kubernetes environments.
Nice to have / Ways to stand out
- Background with containers (e.g., Docker, OCI), orchestration frameworks, and logging/telemetry backends.
- Experience with Kubernetes monitoring stacks and tools such as Prometheus, Loki, and Grafana.
- Experience with modern UI development in React and Node.js or similar frameworks.
- Experience developing Kubernetes operators or Helm charts.
- Experience with HPC job schedulers and GPU workload orchestrators such as Slurm or Run:ai.
- Familiarity with Kubernetes internals and exposure to GPU programming with CUDA.
- Experience with Jenkins and GitHub/GitLab CI/CD pipelines.
Compensation & Benefits
- Base salary ranges by level:
  - Level 3: 148,000 USD - 235,750 USD
  - Level 4: 184,000 USD - 287,500 USD
- You will also be eligible for equity and benefits: https://www.nvidia.com/en-us/benefits/
- Applications will be accepted at least until August 11, 2025.
Equal Opportunity
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. NVIDIA does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.