Used Tools & Technologies
IaC
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 7
Docker @ 4
Grafana @ 4
Kubernetes @ 4
Linux @ 7
Prometheus @ 4
DevOps @ 4
Python @ 6
GitHub @ 4
GitHub Actions @ 4
CI/CD @ 4
AWS @ 4
Git @ 4
Networking @ 4
SRE @ 7
GPU @ 4
AI @ 4
Robotics @ 4
Details
We are seeking a highly skilled and experienced Senior DevOps Engineer to join our dynamic NVIDIA Robotics DevOps team. The role focuses on ownership of CI/CD and lab infrastructure for robotics workloads, working on open-source and proprietary applications and packages. You will collaborate with other teams to ensure reliability, scalability, and efficiency of hardware and CI services.
Responsibilities
- Manage CI runners/executors and capacity across on-prem and cloud environments.
- Use infrastructure as code to provision and update CI environments and supporting services.
- Deploy and extend monitoring, logging, and alerting for CI, GPU servers, Tegra boards, and lab services (examples: Prometheus, Grafana, ELK-style stacks).
- Operate Tegra/Jetson testbeds used by CI and developers: provisioning, flashing, OS/JetPack updates, recovery, and reservation/scheduling.
- Diagnose and resolve issues spanning power, networking, OS, containers, CI agents, and test infrastructure.
Requirements
- Bachelor’s or Master’s in Computer Science, Computer Engineering, Electrical Engineering, or related field — or equivalent experience.
- 8+ years in DevOps/SRE/infrastructure roles, including ownership of CI or lab environments; 3+ years in a senior capacity.
- Practical experience with AWS or similar cloud platforms for CI or compute workloads.
- Hands-on experience with CI/CD systems (e.g., GitLab CI, GitHub Actions) and Git-based workflows.
- Strong Linux systems expertise (networking, storage, performance, security).
- Proficiency in Python and shell for automation and tooling.
- Hands-on work with servers, embedded boards, networking gear, and remote management.
Ways to stand out
- Experience with NVIDIA Tegra infrastructure.
- Solid knowledge of containers and orchestration (Docker, Kubernetes).
- Proven track record of driving infrastructure reliability improvements and cross-team projects.
Compensation & Benefits
- Base salary range: 184,000 USD - 287,500 USD (determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits (link to NVIDIA benefits provided in original posting).
Additional information
- Applications for this job will be accepted at least until July 4, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and is committed to fostering an inclusive work environment.