Senior System Software Engineer - Data Platform Observability

at Nvidia
USD 184,000-287,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Software Development @ 7, Ansible @ 4, Go @ 7, Grafana @ 4, Kubernetes @ 4, Prometheus @ 4, Terraform @ 4, Python @ 7, Spark @ 4, Java @ 7, Helm @ 4, JavaScript @ 7, React @ 7, Rust @ 7, Microservices @ 4, API @ 4, Audit @ 4, Compliance @ 4, OpenTelemetry @ 4, Observability @ 4, AI @ 4

Details

NVIDIA’s Hardware Infrastructure organization is seeking a Senior System Software Engineer to lead the evolution of its next-generation Data & Observability Platform. The team serves and collaborates with NVIDIA’s AI, hardware, and software engineering and research teams. The role is a full-stack technical lead position, responsible for the Observability stack and for building a centralized platform that thousands of NVIDIA engineers use to visualize chip telemetry, debug distributed pipelines, and ensure platform reliability.

Responsibilities

  • Architect high-performance ingestion: design and build centralized telemetry pipelines that handle massive scale, and solve global latency challenges by replacing legacy proxy models with modern, push-based edge collection architectures.
  • Build policy enforcement systems: design and implement infrastructure for data governance, policy engines, access control enforcement points, secure credential management, and audit logging (building governance controls into a platform, not just administering them).
  • Focus on user experience: develop a modern web interface and APIs that unify distinct observability signals into a consolidated user experience.
  • Optimize storage and cost: implement cost-effective tiered storage architectures and define strategies for routing high-volume data to cold storage, reducing costs while maintaining multi-year data retention.
  • Drive platform automation: architect workflow orchestration systems to automate platform maintenance, data lifecycle management, and complex pipeline operations.
  • Empower engineers and researchers: provide operational and strategic data that helps them continuously improve quality, workloads, and processes through better observability.

Requirements

  • BS or MS in Computer Science, Electrical Engineering, or related field (or equivalent experience).
  • 8+ years of full-stack software development experience with a focus on Data Platforms or Infrastructure Tools.
  • Strong full-stack fluency: proficiency in high-performance backend systems programming and in modern frontend web frameworks for building responsive user interfaces (for example Python, JavaScript, Java, Rust, Go, React, or similar).
  • Observability expertise: experience with platforms such as Apache Spark, Elastic/OpenSearch, Grafana, Prometheus, and similar open-source tools. Hands-on experience operating and extending the Grafana ecosystem or ELK stack at scale. Understanding of the internals of time-series databases and inverted indexes.
  • Infrastructure-as-code experience: deploying complex stateful services on Kubernetes using Helm, Terraform, or Ansible.
  • Familiarity with streaming and storage technologies and modern data lake formats.

Ways to Stand Out

  • Experience writing custom Grafana data source plugins or backend plugins in Go.
  • Background in migrating legacy monoliths to microservices or Vector-based pipelines.
  • Experience with OpenTelemetry (OTEL) collector configuration, writing custom processors, or instrumentation SDKs.
  • Background in data governance, including implementation of Policy-as-Code or compliance frameworks in regulated environments.

Compensation & Benefits

  • Base salary range: 184,000 USD - 287,500 USD (determined based on location, experience, and pay of employees in similar positions).
  • Eligible for equity and company benefits (link to benefits referenced in original posting).

Other Information

  • Location listed: Santa Clara, CA, United States.
  • Employment type: Full time.
  • Applications accepted at least until March 1, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.