Senior Software Engineer, DevOps - Server Infrastructure

at NVIDIA

📍 Santa Clara, United States

USD 184,000-287,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 4 Software Development @ 4 Ansible @ 4 Docker @ 4 ElasticSearch @ 4 Jenkins @ 4 Kafka @ 4 Kubernetes @ 4 Linux @ 7 DevOps @ 4 IaC @ 4 Kibana @ 4 Terraform @ 4 Python @ 4 GCP @ 4 CI/CD @ 4 Machine Learning @ 4 AWS @ 4 Azure @ 4 Helm @ 4 Debugging @ 4 GPU @ 4

Details

NVIDIA is searching for a DevOps and Infrastructure Software/Systems Engineer to bring up, develop and prototype a new class of server products and appliances for our Metropolis platforms. Metropolis applies AI to streaming video and data analytics for smart cities, public safety, traffic and parking management, and other city services. The role focuses on architecting the build and deployment process for GPU-based compute servers and automating modern code delivery and deployment pipelines across on-prem and cloud environments.

Responsibilities

Build, deploy and maintain GPU-based servers for Metropolis blueprints, platforms and machine learning applications across test, development and production environments.
Lead design and take responsibility for infrastructure components including network topologies, streaming servers and security.
Collaborate with software, IT, security and hardware teams across geographies to solve critical problems and performance issues.
Establish configuration environments by creating processes and tools for software development, debugging, testing, benchmarking and documentation.
Automate provisioning and management of bare-metal servers, internal cloud, Microsoft Azure and AWS.
Implement automated monitoring and operating procedures across on-premises and cloud environments.
Build and maintain infrastructure related to delivery of software artifacts (CI/CD pipelines, registries, package repos) produced by Metropolis application teams.
Create detailed documentation enabling customers, partners and system integrators to replicate the prototyped deployment architectures.

Requirements

BS or MS in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
8+ years of proven experience in configuration management and server administration (Linux) in an engineering hardware lab environment.
Good programming skills in Python and shell scripting.
Experience with configuration management and IaC: Ansible, Terraform.
Experience with containerization and packaging: Docker, Docker Compose, Dockerfiles, container registries.
Experience with Helm, including chart templates and package repositories.
Good understanding of configuring and managing Elasticsearch, Logstash, Kibana and Kafka ecosystems.
CI/CD and software delivery tools experience: Jenkins, pipeline scripting, Artifactory integration.
Experience with the Kubernetes ecosystem and Helm-based application deployment patterns.
Infrastructure provisioning automation experience with AWS, GCP and Azure.
Experience building configuration management, monitoring and automation tools.
Familiarity with managing edge servers at scale, deployed in indoor and outdoor environments.
Practical experience with server virtualization technologies.
Strong interpersonal and cross-functional collaboration skills.

Benefits

Competitive base salary (range specified below), eligibility for equity and comprehensive benefits. See https://www.nvidiabenefits.com/ for details.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.
Application deadline: Applications for this job will be accepted at least until January 10, 2026.

Compensation

Base salary range: 184,000 USD - 287,500 USD. Final base salary is determined by location, experience, and pay of employees in similar positions.
You will also be eligible for equity and benefits.