Senior MLOps Engineer, GenAI Framework

at NVIDIA

📍 Santa Clara, United States

USD 152,000-241,500 per year

SENIOR

✅ On-site

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

System Administration @ 4, Ansible @ 4, Docker @ 4, Jenkins @ 4, Kubernetes @ 4, Linux @ 4, DevOps @ 4, Python @ 7, GitHub @ 4, GitHub Actions @ 4, CI/CD @ 4, TensorFlow @ 3, Hiring @ 4, Communication @ 4, Jira @ 4, Debugging @ 7, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 4, Deep Learning @ 4, AI @ 4, GenAI @ 4, Slurm @ 4

Details

NVIDIA is hiring a build and continuous integration (CI/CD) engineer to join the GenAI Frameworks team (Megatron-LM and NeMo Framework). These are open-source, scalable, cloud-native frameworks for large language model (LLM), multimodal, and video generation workloads. The role focuses on enabling framework engineers, deep learning algorithm engineers, and research scientists to deliver high-quality, high-performance software by developing and maintaining CI/CD, build/release processes, automation, and cluster/infrastructure tooling.

Responsibilities

Develop and maintain continuous integration pipelines and release processes for Megatron-LM and NeMo Framework.
Implement efficient, scalable DevOps solutions to enable more frequent, high-quality releases while maintaining performance.
Work with industry-standard tools in hybrid on-premises and cloud environments: Kubernetes, Docker, Slurm, Ansible, GitLab, GitHub Actions, Jenkins, Artifactory, Jira.
Assist with cluster operations and system administration (managing servers, team accounts, clusters).
Automate recurring tasks to accelerate R&D cycles, such as accuracy and performance regression detection.
Develop quality control measures (code analysis, backwards compatibility, regression testing) and advance best practices.
Collaborate closely with teams working on DL frameworks and libraries (CUDA, cuDNN, cuBLAS, PyTorch) and other NVIDIA engineering teams providing software, testing, and release infrastructure.

Requirements

BS or MS in Computer Science, Computer Architecture, or a related technical field (or equivalent experience).
3+ years of industry experience in DevOps and infrastructure engineering.
Strong system-level programming skills in Python and shell scripting.
Experience with build/release systems and CI/CD (GitLab, GitHub, Jenkins, etc.).
Experience with Linux system administration.
Experience with containerization and cluster management (Docker, Kubernetes).
Experience with build tools including Make and CMake.
Strong background in source code management (GitLab, GitHub, Perforce, etc.).
Strong problem-solving and debugging skills.
Good collaboration, interpersonal, and written communication skills.

Ways to stand out

Proven track record with GPU-accelerated systems at scale.
Familiarity with deep learning frameworks such as PyTorch, JAX, or TensorFlow.
Expertise in cluster and cloud compute technologies (e.g., Slurm, Lustre, Kubernetes).
Experience in software and hardware benchmarking on high-performance computing systems.

Compensation and benefits

Base salary range: 152,000 USD - 241,500 USD (determined based on location, experience, and pay of employees in similar positions).
Eligible for equity and company benefits (link to NVIDIA benefits referenced in posting).

Additional information

Applications will be accepted at least until February 23, 2026.
This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and states that it does not discriminate on the basis of a range of protected characteristics.