Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
System Administration @ 4
Ansible @ 4
Docker @ 4
Jenkins @ 4
Kubernetes @ 4
Linux @ 4
DevOps @ 4
Python @ 7
GitHub @ 4
GitHub Actions @ 4
CI/CD @ 4
TensorFlow @ 3
Hiring @ 4
Communication @ 4
Jira @ 4
Debugging @ 7
LLM @ 4
PyTorch @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
GenAI @ 4
Slurm @ 4
Details
NVIDIA is hiring a build and continuous integration (CI/CD) engineer to join the GenAI Frameworks team (Megatron-LM and NeMo Framework). These are open-source, scalable, cloud-native frameworks for large language model (LLM), multimodal, and video generation workloads. The role focuses on enabling framework engineers, deep learning algorithm engineers, and research scientists to deliver high-quality, high-performance software by developing and maintaining CI/CD, build/release processes, automation, and cluster/infrastructure tooling.
Responsibilities
- Develop and maintain continuous integration pipelines and release processes for Megatron-LM and NeMo Framework.
- Implement efficient, scalable DevOps solutions to enable more frequent, high-quality releases while maintaining performance.
- Work with industry-standard tools in hybrid on-premise and cloud environments: Kubernetes, Docker, Slurm, Ansible, GitLab, GitHub Actions, Jenkins, Artifactory, Jira.
- Assist with cluster operations and system administration (managing servers, team accounts, clusters).
- Automate recurring tasks to accelerate R&D cycles, such as accuracy and performance regression detection.
- Develop quality control measures (code analysis, backwards compatibility, regression testing) and advance best practices.
- Collaborate closely with teams working on DL frameworks and libraries (CUDA, cuDNN, cuBLAS, PyTorch) and other NVIDIA engineering teams providing software, testing, and release infrastructure.
Requirements
- BS or MS in Computer Science, Computer Architecture, or a related technical field (or equivalent experience).
- 3+ years of industry experience in DevOps and infrastructure engineering.
- Strong system-level programming skills in Python and shell scripting.
- Experience with build/release systems and CI/CD (GitLab, GitHub, Jenkins, etc.).
- Experience with Linux system administration.
- Experience with containerization and cluster management (Docker, Kubernetes).
- Experience with build tools including Make and CMake.
- Strong background in source code management (GitLab, GitHub, Perforce, etc.).
- Strong problem-solving and debugging skills.
- Good collaboration, interpersonal, and written communication skills.
Ways to stand out
- Proven track record with GPU-accelerated systems at scale.
- Familiarity with deep learning frameworks such as PyTorch, JAX, or TensorFlow.
- Expertise in cluster and cloud compute technologies (e.g., Slurm, Lustre, Kubernetes).
- Experience in software and hardware benchmarking on high-performance computing systems.
Compensation and benefits
- Base salary range: 152,000 USD - 241,500 USD (determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and company benefits (link to NVIDIA benefits referenced in posting).
Additional information
- Applications will be accepted at least until February 23, 2026.
- This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and states that it does not discriminate on the basis of a range of protected characteristics.