Senior Engineer, Performance - Cloud Software

at NVIDIA
USD 144,000-270,250 per year
SENIOR
✅ On-site

Required Skills & Competences

  • Security @ 4
  • Docker @ 6
  • Kubernetes @ 4
  • Distributed Systems @ 7
  • Data Science @ 4
  • Helm @ 6
  • Networking @ 4
  • GPU @ 4

Details

NVIDIA DGX Cloud provides a serverless generative AI infrastructure enabling NVIDIA's AI supercomputer technologies to be used by anyone. The DGX Cloud engineering team ensures customers receive timely and quality-assured releases. This role focuses on performance and scalability testing, identifying limitations across the Kubernetes and application stack using industry-standard tools and telemetry, and driving improvements across infrastructure and application layers.

Responsibilities

  • Analyze and optimize performance across application, middleware, runtime, and infrastructure layers — networking, storage, GPU utilization, and beyond.
  • Develop tooling and metrics that provide deep observability into system performance.
  • Collaborate closely with infra, platform, runtime, and product teams to identify key performance goals and drive systemic improvements.
  • Lead investigations into high-impact performance regressions or scalability issues in production.
  • Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale.
  • Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field — or equivalent experience.
  • 5+ years in software engineering with a strong track record in performance or scalability of high-scale distributed systems.
  • Deeply comfortable with performance profiling tools and tracing systems.
  • Ability to identify performance issues, perform root-cause analysis, and propose solutions.
  • Experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Golang internals, GPU utilization).
  • Contributions to observability, benchmarking, or performance-focused infrastructure at scale.
  • Strong understanding of OS internals, scheduling, memory management, and I/O patterns.
  • Demonstrated success navigating ambiguity and aligning stakeholders around performance goals.
  • Proficient in container-based infrastructure (Docker, Kubernetes, Helm).

Ways to stand out

  • Demonstrated ability to handle sophisticated technical environments while meeting security, reliability, scalability, and availability metrics.
  • Strong, demonstrated knowledge of modern architectures at scale.

Compensation & Benefits

  • Base salary ranges (location- and level-dependent):
    • Level 3: 144,000 USD - 230,000 USD
    • Level 4: 168,000 USD - 270,250 USD
  • Eligible for equity and company benefits (see NVIDIA benefits link).

Other details

  • Location: Santa Clara, CA, United States.
  • Employment type: Full time.
  • Applications accepted at least until September 14, 2025.
  • NVIDIA is an equal opportunity employer committed to diversity and inclusion.