Senior Systems Software Engineer, AV Infrastructure - Validation And Distributed Systems

at NVIDIA

📍 Santa Clara, United States

USD 184,000-287,500 per year

SENIOR

✅ On-site

Required Skills & Competences

Security @ 4, Docker @ 4, Go @ 4, Grafana @ 7, Kubernetes @ 4, Linux @ 4, Prometheus @ 7, Terraform @ 4, Python @ 4, CI/CD @ 4, Distributed Systems @ 4, Hiring @ 4, AWS @ 7, Bash @ 4, Communication @ 7, Parquet @ 4, Protobuf @ 4, Debugging @ 7, Reporting @ 4, Compliance @ 4, GPU @ 7, Claude Code @ 4

Details

NVIDIA's Autonomous Vehicle (AV) Infrastructure organization is seeking a Senior Systems Software Engineer focused on building, deploying, and operating validation platforms at scale. The role centers on integrating distributed systems, managing large-scale data pipelines, and operationalizing validation workflows for autonomous driving. You will work with internal teams and external vendors to stand up vendor-provided platforms, validate integration paths, and ensure infrastructure is reliable, secure, and production-ready.

Responsibilities

Deploy and operationalize vendor-provided platforms in a cloud-based service platform, beginning with test environments to validate dependencies, workflows, and performance.
Build and maintain distributed infrastructure supporting large-scale log ingestion, data processing, and scenario validation.
Automate workflows and pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution.
Integrate simulation and drive logs (for example, world model data and road geometries) in formats such as Protobuf and Parquet with validation platforms to provide end-to-end coverage analysis.
Provide visualization and reporting capabilities to surface validation results, coverage metrics, and actionable insights for developers and stakeholders.
Define and manage access controls, monitoring, and security policies to ensure compliance while enabling collaboration across internal and vendor teams.
Partner closely with internal teams and external vendors to troubleshoot issues, refine SLAs, and continuously improve operational reliability and scalability.

Requirements

BS/MS in Computer Science, Computer Engineering, or a relevant field (or equivalent experience).
5+ years of professional experience in infrastructure, distributed systems, or platform engineering.
Hands-on experience with Linux systems, Kubernetes, Docker, Terraform, and CI/CD pipelines.
Strong scripting/development skills in Python and Bash, with exposure to C++ and/or Go.
Familiarity with the Bazel build and test automation framework.
Experience in data/log ingestion workflows and distributed compute/storage systems.
Strong debugging, problem-solving, and communication skills to collaborate across internal and vendor teams.
Proven comfort leveraging AI-based development tools, such as Claude Code and Cursor.

Ways To Stand Out (Preferred)

Strong experience in large-scale distributed systems or GPU/CPU cluster deployments, infrastructure automation, data pipelines, and AWS.
Prior experience with scenario-based validation platforms or AV simulation ecosystems.
Strong knowledge of logging/monitoring/alerting frameworks (Prometheus, Grafana, ELK stack).
Experience working directly with external vendors to integrate platforms and operationalize SLAs.
Proactive use of AI/ML techniques to accelerate log analysis, coverage metrics, or integration workflows.

Compensation & Additional Information

Base salary range, determined by location and level: 184,000 USD - 287,500 USD for Level 4. A separate, lower range of 148,000 USD - 235,750 USD is listed for Level 3.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until September 21, 2025.

Company & Culture

NVIDIA emphasizes innovation in Deep Learning, AI, and Autonomous Vehicles. The company is an equal opportunity employer and values diversity in hiring and promotion practices.