Senior Systems Software Engineer, AV Infrastructure - Validation and Distributed Systems

at NVIDIA
USD 184,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 4, Docker @ 4, Go @ 4, Grafana @ 7, Kubernetes @ 4, Linux @ 4, Prometheus @ 7, Terraform @ 4, Python @ 4, CI/CD @ 4, Distributed Systems @ 4, Hiring @ 4, AWS @ 7, Bash @ 4, Communication @ 7, Parquet @ 4, Protobuf @ 4, Debugging @ 7, Reporting @ 4, Compliance @ 4, GPU @ 7, Claude Code @ 4

Details

NVIDIA's Autonomous Vehicle (AV) Infrastructure organization is seeking a Senior Systems Software Engineer focused on building, deploying, and operating validation platforms at scale. The role centers on integrating distributed systems, managing large-scale data pipelines, and operationalizing validation workflows for autonomous driving. You will work with internal teams and external vendors to stand up vendor-provided platforms, validate integration paths, and ensure infrastructure is reliable, secure, and production-ready.

Responsibilities

  • Deploy and operationalize vendor-provided platforms on a cloud-based service platform, beginning with test environments to validate dependencies, workflows, and performance.
  • Build and maintain distributed infrastructure supporting large-scale log ingestion, data processing, and scenario validation.
  • Automate workflows and pipelines using Go, Python, Bash, and Bazel to ensure reproducibility, efficiency, and reliable distributed execution.
  • Integrate simulation and drive logs (for example, world model data and road geometries), in formats such as Protobuf and Parquet, with validation platforms to provide end-to-end coverage analysis.
  • Provide visualization and reporting capabilities to surface validation results, coverage metrics, and actionable insights for developers and stakeholders.
  • Define and manage access controls, monitoring, and security policies to ensure compliance while enabling collaboration across internal and vendor teams.
  • Partner closely with internal teams and external vendors to troubleshoot issues, refine SLAs, and continuously improve operational reliability and scalability.

Requirements

  • BS/MS in Computer Science, Computer Engineering, or a relevant field (or equivalent experience).
  • 5+ years of professional experience in infrastructure, distributed systems, or platform engineering.
  • Hands-on experience with Linux systems, Kubernetes, Docker, Terraform, and CI/CD pipelines.
  • Strong scripting/development skills in Python and Bash, with exposure to C++ and/or Go.
  • Familiarity with Bazel for build and test automation.
  • Experience in data/log ingestion workflows and distributed compute/storage systems.
  • Strong debugging, problem-solving, and communication skills to collaborate across internal and vendor teams.
  • Proven comfort leveraging AI-based development tools, such as Claude Code and Cursor.

Ways To Stand Out (Preferred)

  • Strong experience in large-scale distributed systems or GPU/CPU cluster deployments, infrastructure automation, data pipelines, and AWS.
  • Prior experience with scenario-based validation platforms or AV simulation ecosystems.
  • Strong knowledge of logging/monitoring/alerting frameworks (Prometheus, Grafana, ELK stack).
  • Experience working directly with external vendors to integrate platforms and operationalize SLAs.
  • Proactive use of AI/ML techniques to accelerate log analysis, coverage metrics, or integration workflows.

Compensation & Additional Information

  • Base salary range, determined by location and level: 184,000 USD - 287,500 USD for Level 4. The range for the lower level (Level 3) is listed separately: 148,000 USD - 235,750 USD.
  • You will also be eligible for equity and benefits.
  • Applications for this job will be accepted at least until September 21, 2025.

Company & Culture

NVIDIA emphasizes innovation in Deep Learning, AI, and Autonomous Vehicles. The company is an equal opportunity employer and values diversity in hiring and promotion practices.