Senior DevOps Engineer - Infrastructure

at NVIDIA

📍 Santa Clara, United States

USD 184,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

IaC

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and you know the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Security @ 7, Docker @ 4, Grafana @ 4, Kubernetes @ 4, Linux @ 7, Prometheus @ 4, DevOps @ 4, Python @ 6, GitHub @ 4, GitHub Actions @ 4, CI/CD @ 4, AWS @ 4, Git @ 4, Networking @ 4, SRE @ 7, GPU @ 4, AI @ 4, Robotics @ 4

Details

We are seeking a highly skilled and experienced Senior DevOps Engineer to join our dynamic NVIDIA Robotics DevOps team. The role focuses on owning CI/CD and lab infrastructure for robotics workloads, covering both open-source and proprietary applications and packages. You will collaborate with other teams to ensure the reliability, scalability, and efficiency of hardware and CI services.

Responsibilities

Manage CI runners/executors and capacity across on-prem and cloud environments.
Use infrastructure as code to provision and update CI environments and supporting services.
Deploy and extend monitoring, logging, and alerting for CI, GPU servers, Tegra boards, and lab services (examples: Prometheus, Grafana, ELK-style stacks).
Operate Tegra/Jetson testbeds used by CI and developers: provisioning, flashing, OS/JetPack updates, recovery, and reservation/scheduling.
Diagnose and resolve issues spanning power, networking, OS, containers, CI agents, and test infrastructure.

Requirements

Bachelor’s or Master’s in Computer Science, Computer Engineering, Electrical Engineering, or a related field, or equivalent experience.
8+ years in DevOps/SRE/infrastructure roles, including ownership of CI or lab environments; 3+ years in a senior capacity.
Practical experience with AWS or similar cloud platforms for CI or compute workloads.
Hands-on experience with CI/CD systems (e.g., GitLab CI, GitHub Actions) and Git-based workflows.
Strong Linux systems expertise (networking, storage, performance, security).
Proficiency in Python and shell for automation and tooling.
Hands-on work with servers, embedded boards, networking gear, and remote management.

Ways to stand out

Experience with NVIDIA Tegra infrastructure.
Solid knowledge of containers and orchestration (Docker, Kubernetes).
Proven track record of driving infrastructure reliability improvements and cross-team projects.

Compensation & Benefits

Base salary range: 184,000 USD - 287,500 USD (determined based on location, experience, and pay of employees in similar positions).
Eligible for equity and benefits (link to NVIDIA benefits provided in original posting).

Additional information

Applications for this job will be accepted at least until July 4, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and is committed to fostering an inclusive work environment.