Senior Systems Software Engineer, AV Infrastructure - Validation And Distributed Systems
at Nvidia
Santa Clara, United States
USD 184,000-287,500 per year
Required Skills & Competences
Security @ 4, Docker @ 4, Go @ 4, Grafana @ 7, Kubernetes @ 4, Linux @ 4, Prometheus @ 7, Terraform @ 4, Python @ 4, CI/CD @ 4, Distributed Systems @ 4, Hiring @ 4, AWS @ 7, Bash @ 4, Communication @ 7, Parquet @ 4, Protobuf @ 4, Debugging @ 7, Reporting @ 4, Compliance @ 4, GPU @ 7, Claude Code @ 4
Details
NVIDIA's Autonomous Vehicle (AV) Infrastructure organization is seeking a Senior Systems Software Engineer focused on building, deploying, and operating validation platforms at scale. The role centers on integrating distributed systems, managing large-scale data pipelines, and operationalizing validation workflows for autonomous driving. You will work with internal teams and external vendors to stand up vendor-provided platforms, validate integration paths, and ensure infrastructure is reliable, secure, and production-ready.
Responsibilities
- Deploy and operationalize vendor-provided platforms on a cloud-based service platform, beginning with test environments to validate dependencies, workflows, and performance.
- Build and maintain distributed infrastructure supporting large-scale log ingestion, data processing, and scenario validation.
- Automate workflows and pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution.
- Integrate simulation and drive logs (for example, world model data and road geometries), stored in formats such as Protobuf and Parquet, with validation platforms to provide end-to-end coverage analysis.
- Provide visualization and reporting capabilities to surface validation results, coverage metrics, and actionable insights for developers and stakeholders.
- Define and manage access controls, monitoring, and security policies to ensure compliance while enabling collaboration across internal and vendor teams.
- Partner closely with internal teams and external vendors to troubleshoot issues, refine SLAs, and continuously improve operational reliability and scalability.
Requirements
- BS/MS in Computer Science, Computer Engineering, or a relevant field (or equivalent experience).
- 5+ years of professional experience in infrastructure, distributed systems, or platform engineering.
- Hands-on experience with Linux systems, Kubernetes, Docker, Terraform, and CI/CD pipelines.
- Strong scripting/development skills in Python and Bash, with exposure to C++ and/or Go.
- Familiarity with Bazel build/test automation frameworks.
- Experience in data/log ingestion workflows and distributed compute/storage systems.
- Strong debugging, problem-solving, and communication skills to collaborate across internal and vendor teams.
- Demonstrated comfort using AI-based development tools, such as Claude Code and Cursor.
Ways To Stand Out (Preferred)
- Strong experience in large-scale distributed systems or GPU/CPU cluster deployments, infrastructure automation, data pipelines, and AWS.
- Prior experience with scenario-based validation platforms or AV simulation ecosystems.
- Strong knowledge of logging/monitoring/alerting frameworks (Prometheus, Grafana, ELK stack).
- Experience working directly with external vendors to integrate platforms and operationalize SLAs.
- Proactive use of AI/ML techniques to accelerate log analysis, coverage metrics, or integration workflows.
Compensation & Additional Information
- Base salary is determined by location and level: 184,000 USD - 287,500 USD for Level 4. The Level 3 range (148,000 USD - 235,750 USD) is listed separately.
- You will also be eligible for equity and benefits.
- Applications for this job will be accepted at least until September 21, 2025.
Company & Culture
NVIDIA emphasizes innovation in Deep Learning, AI, and Autonomous Vehicles. The company is an equal opportunity employer and values diversity in hiring and promotion practices.