Required Skills & Competences
Software Development @ 3, Ansible @ 4, Docker @ 4, Go @ 4, Jenkins @ 4, Kubernetes @ 4, Linux @ 4, Python @ 4, GitHub @ 4, GitHub Actions @ 4, CI/CD @ 4, Rust @ 4, Debugging @ 4, API @ 4, OSS @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA is looking for outstanding software engineers to help us expand our enterprise GPU management and monitoring tools. In this role, you will work closely with the broader NVIDIA team to design and build Linux-based management agents, CLI tools, and end-to-end integration solutions that connect GPUs to the rest of the data center software management ecosystem. You will also help maintain our containerized build environment, build process, CI/CD pipelines and infrastructure, and packaging.
We are focused on supporting NVIDIA products across HPC, cloud, and enterprise deployments, on both bare-metal and virtualized platforms, as the role of GPUs in all of these environments expands rapidly. Your contributions will span many aspects of GPU system integration, including telemetry and metrics, health checks, diagnostics, configuration, accounting, and policy. These tools serve both as passive background monitors and as active online management tools, with a core emphasis on operational transparency and seamless integration into customer environments. Your code will support everything from single-node developer systems to large clusters with thousands of nodes. To be successful, you will need a strong Linux C/C++ background, familiarity with distributed software development, and a proven work ethic. You will be expected to get up to speed quickly and make meaningful contributions from day one.
Responsibilities
- Develop robust, scalable C++ user-space data center management system software under Linux
- Build and maintain user-space libraries, agents, plugins, bindings and CLI tools
- Enable GPU management integration with the OSS ecosystem, including Kubernetes and Docker
- Maintain build and CI/CD processes to deliver our product on CUDA-supported OSes
- Support internal and external users through bug fixes, documentation and feature improvements
- Maintain high quality products through robust test coverage and smart design
Requirements
- BS or higher in Computer Science or equivalent experience
- 5+ years of meaningful industry experience with a strong C++ development background
- User-space development and debugging expertise in Linux environments
- Experience packaging software for Linux package managers (DEB and RPM)
- Experience using Kitware utilities to manage builds (CMake, CPack, CTest)
- Experience with APIs and interface design
- Outstanding written, verbal, and interpersonal communication skills; strong motivation and commitment to learning new skills
- Ability to execute all aspects of the software development lifecycle and to manage time in a fast-paced, heavily multitasked environment
Nice to have / Ways to stand out
- Development experience with Python, Go, and Rust
- Experience developing CI/CD pipelines using GitLab CI, GitHub Actions, or Jenkins
- Experience developing containerized environments using Docker (buildx, bake, BuildKit)
- Exposure to GPU programming with CUDA
- Experience developing playbooks, roles, and modules for Ansible
- Experience with RESTful web services and CLI tools
Compensation & Additional Info
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD for Level 3, and 184,000 USD - 287,500 USD for Level 4. You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until December 19, 2025.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.