Senior DevOps and Build Systems Engineer

at NVIDIA
USD 144,000-270,250 per year
SENIOR
Hybrid

Required Skills & Competences

Security @ 4, Ansible @ 4, Docker @ 4, Kubernetes @ 4, DevOps @ 4, Terraform @ 4, Python @ 4, GCP @ 4, GitHub @ 4, GitHub Actions @ 4, CI/CD @ 4, AWS @ 4, Azure @ 4, Bash @ 4, Debugging @ 4, Design Patterns @ 4, OSS @ 4, LLM @ 4, Compliance @ 4, Agile @ 4

Details

We are seeking a Senior DevOps and Build Systems Engineer to join the NVIDIA AI TensorRT-LLM team. You will take ownership of systems that power engineering innovation, covering CI/CD pipelines, build systems, product security, observability, and compliance. The role involves designing and implementing solutions with autonomy and collaborating with cross-functional teams and external partners to drive efficiency and reliability.

Responsibilities

  • Build and maintain, from first principles, the infrastructure needed to deliver TensorRT LLM.
  • Maintain and improve CI/CD pipelines to automate build, test, and deployment processes; identify and remove bottlenecks.
  • Manage tooling and automate repetitive manual workflows using tools such as GitHub Actions, GitLab, and Terraform.
  • Perform and enable security scans and handle CVEs affecting infrastructure components.
  • Improve modularity of build systems using CMake.
  • Use AI to help build automated triaging workflows.
  • Collaborate closely with cross-functional teams to integrate pipelines from deep learning frameworks and components, ensuring seamless deployment and inference of deep learning models on the platform.

Requirements

  • Master’s degree or equivalent experience.
  • 3+ years of experience in computer science, computer architecture, or a related field.
  • Ability to work in a fast-paced, agile team environment.
  • Excellent Bash and Python programming, CI/CD, and software design skills, including debugging, performance analysis, and test design.
  • Experience with CMake.
  • Background with security best practices for releasing libraries and handling CVEs.
  • Experience administering, monitoring, and deploying systems and services on GitHub and cloud platforms, and supporting other technical teams by monitoring operational efficiency and responding to their needs.
  • Highly skilled in Kubernetes and Docker/containerd.
  • Automation expertise with hands-on skills in tools such as Ansible and Terraform.
  • Experience with one or more cloud providers (AWS, Azure, or GCP).

Ways to stand out

  • Experience contributing to large open-source deep learning communities (GitHub, bug tracking, branching/merging, OSS licensing, patches).
  • Experience defining and leading DevOps strategy (design patterns, reliability, scaling) for a team or organization.
  • Experience driving efficiencies in software architecture, creating metrics, and implementing infrastructure-as-code and automation improvements.
  • Deep understanding of test automation infrastructure, frameworks, and test analysis.
  • Experience troubleshooting and debugging across storage systems, kernels, and containers; familiarity with Triton Inference Server is a plus.

Compensation & Logistics

  • Location: Santa Clara, California, United States. #LI-Hybrid (hybrid work policy).
  • Base salary ranges: USD 144,000 - 230,000 for Level 3; USD 168,000 - 270,250 for Level 4. Final base salary will be determined by location, experience, and internal pay equity. Eligible for equity and benefits.
  • Applications accepted at least until October 24, 2025.

Company

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of any characteristic protected by law.