Senior System Software Engineer - Infrastructure

at NVIDIA

📍 Santa Clara, United States

USD 224,000-356,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Security @ 4, Go @ 7, Kubernetes @ 4, Prometheus @ 4, DevOps @ 4, Terraform @ 4, Python @ 7, GCP @ 4, CI/CD @ 4, Datadog @ 4, ArgoCD @ 4, AWS @ 4, Azure @ 4, Bash @ 7, Communication @ 4, Helm @ 4, Networking @ 3, Splunk @ 4, Compliance @ 4, Cloud Computing @ 4, GPU @ 4

Details

Today, we’re tapping into the unlimited potential of AI to define the next era of computing, an era in which our GPUs act as the brains of computers, robots, and self-driving cars that can understand the world. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

Join NVIDIA’s Platform team to help reshape the future of GPU cloud computing and contribute to projects in Deep Learning and AI. You will design and implement scalable infrastructure solutions and collaborate with cross-functional teams to improve developer productivity and platform reliability.

Responsibilities

Design, deploy, and maintain scalable AWS infrastructure using services such as EKS, EC2, S3, VPC, IAM, Lambda, and CloudWatch.
Manage and optimize Kubernetes clusters for high availability, resilience, and performance; work with Kubernetes internals and Helm charts.
Create and maintain GitLab CI/CD pipelines to automate build, test, and deployment workflows.
Develop automation scripts and Infrastructure as Code templates using Terraform.
Monitor system performance and implement logging, metrics, and alerting using tools like Prometheus, Datadog, Splunk, or the LGTM stack (Loki, Grafana, Tempo, Mimir).
Implement DevSecOps best practices, embedding security scans, compliance checks, and secret management into CI/CD lifecycles.
Support platform observability, diagnose production incidents, and enhance self-service capabilities for developer teams.
Collaborate with cross-functional teams to streamline delivery and improve developer productivity.

Requirements

BS/MS in Computer Science or equivalent experience.
12+ years of hands-on experience building/supporting complex services.
Strong hands-on experience with AWS services (VPC, IAM, EC2, EKS, Lambda, CloudWatch).
Deep knowledge of Kubernetes internals, Helm charts, and container orchestration principles.
Proficiency with GitLab CI/CD or equivalent pipeline automation tools.
Experience implementing GitOps workflows (ArgoCD, FluxCD).
Strong foundation in scripting languages such as Python, Bash, or Go.
Familiarity with networking, load balancing, and security in cloud-native environments.
Experience enforcing cloud and container security standards and compliance practices.
Excellent documentation, problem-solving, and communication skills for cross-team alignment.

Preferred / Ways to stand out

Managed multi-cloud and hybrid Kubernetes clusters across AWS, GCP, and Azure.
Contributions to open-source DevOps projects (including Kubernetes and GitLab initiatives).
Certifications such as CKA, AWS DevOps Engineer, or GitLab Certified Specialist.
Applied AI/ML tools and AIOps platforms for predictive monitoring and automation.
Led DevOps/platform engineering teams in chaos testing, disaster recovery, and process optimization.

Compensation & Benefits

Base salary range: 224,000 USD - 356,500 USD per year (base salary is determined by location, experience, and the pay of employees in similar positions).
Eligible for equity and company benefits (see NVIDIA benefits).

Applications for this job will be accepted at least until December 13, 2025.

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We value diversity and do not discriminate on the basis of protected characteristics.