Senior DevOps and Build Systems Engineer

at NVIDIA

📍 Santa Clara, United States

USD 144,000-270,250 per year

SENIOR

✅ Hybrid

Required Skills & Competences

Security @ 4, Ansible @ 4, Docker @ 4, Kubernetes @ 4, DevOps @ 4, Terraform @ 4, Python @ 4, GCP @ 4, GitHub @ 4, GitHub Actions @ 4, CI/CD @ 4, AWS @ 4, Azure @ 4, Bash @ 4, Debugging @ 4, Design Patterns @ 4, OSS @ 4, LLM @ 4, Compliance @ 4, Agile @ 4

Details

We are seeking a Senior DevOps and Build Systems Engineer to join the NVIDIA AI TensorRT-LLM team. You will take ownership of systems that power engineering innovation, covering CI/CD pipelines, build systems, product security, observability, and compliance. The role involves designing and implementing solutions with autonomy and collaborating with cross-functional teams and external partners to drive efficiency and reliability.

Responsibilities

Build and maintain, from first principles, the infrastructure needed to deliver TensorRT-LLM.
Maintain and improve CI/CD pipelines to automate build, test, and deployment processes; identify and remove bottlenecks.
Manage tooling and automate repetitive manual workflows using GitHub Actions, GitLab, Terraform, and similar tools.
Perform and enable security scans and handle security CVEs for infrastructure components.
Improve modularity of build systems using CMake.
Use AI to help build automated triaging workflows.
Collaborate extensively with cross-functional teams to integrate pipelines from deep learning frameworks and components, ensuring seamless deployment and inference of deep learning models on the platform.

Requirements

Master’s degree or equivalent experience.
3+ years of experience in Computer Science, computer architecture, or a related field.
Ability to work in a fast-paced, agile team environment.
Excellent Bash, CI/CD, and Python programming and software design skills, including debugging, performance analysis, and test design.
Experience with CMake.
Background with security best practices for releasing libraries and handling CVEs.
Experience administering, monitoring, and deploying systems and services on GitHub and cloud platforms, and supporting other technical teams by monitoring operating efficiencies and responding to their needs.
Highly skilled in Kubernetes and Docker/containerd.
Automation expertise with hands-on skills in frameworks like Ansible and Terraform.
Experience with one or more cloud providers (AWS, Azure, or GCP).

Ways to stand out

Experience contributing to large open-source deep learning communities (GitHub, bug tracking, branching/merging, OSS licensing, patches).
Experience defining and leading DevOps strategy (design patterns, reliability, scaling) for a team or organization.
Experience driving efficiencies in software architecture, creating metrics, and implementing infrastructure-as-code and automation improvements.
Deep understanding of test automation infrastructure, frameworks, and test analysis.
Experience troubleshooting and debugging across storage systems, kernels, and containers; familiarity with Triton Inference Server is a plus.

Compensation & Logistics

Location: Santa Clara, California, United States. #LI-Hybrid (hybrid work policy).
Base salary ranges: USD 144,000 - 230,000 for Level 3; USD 168,000 - 270,250 for Level 4. Final base salary will be determined by location, experience, and internal pay equity. Eligible for equity and benefits.
Applications accepted at least until October 24, 2025.

Company

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of any characteristic protected by law.