Senior Engineer, Performance - Cloud Software

at NVIDIA
USD 144,000-270,250 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 4, Docker @ 6, Kubernetes @ 6, Distributed Systems @ 7, Data Science @ 4, Helm @ 6, Networking @ 4, GPU @ 4

Details

NVIDIA leads developments in Artificial Intelligence, High-Performance Computing (HPC) and Visualization. DGX Cloud provides a serverless generative AI infrastructure enabling NVIDIA's AI supercomputer technologies to be used by anyone. The DGX Cloud engineering team ensures customers receive timely and quality-assured releases. This role is for a Performance Engineer proficient in performance and scalability testing, identifying limitations across the Kubernetes (K8s) and application stack using industry-standard tools and telemetry. The role involves problem-solving in a distributed team setting and driving performance and scalability improvements across the stack.

Responsibilities

  • Analyze and optimize performance across application, middleware, runtime, and infrastructure layers, including networking, storage, and GPU utilization.
  • Develop tooling and metrics that provide deep observability into system performance.
  • Collaborate with infrastructure, platform, runtime, and product teams to identify key performance goals and drive systemic improvements.
  • Lead investigations into high-impact performance regressions or scalability issues in production.
  • Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale.
  • Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field (or equivalent experience).
  • 5+ years in software engineering with a strong track record in performance or scalability of high-scale distributed systems.
  • Deep familiarity with performance profiling tools and tracing systems.
  • Ability to identify performance issues, perform root cause analysis, and propose solutions.
  • Experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Golang internals, GPU utilization).
  • Contributions to observability, benchmarking, or performance-focused infrastructure at scale.
  • Strong understanding of OS internals, scheduling, memory management, and IO patterns.
  • Demonstrated ability to navigate ambiguity and align stakeholders around performance goals.
  • Proficiency in container-based infrastructure (Docker, Kubernetes, Helm).

Ways to Stand Out

  • Demonstrated ability to handle sophisticated technical environments while meeting or exceeding security, reliability, scalability, and availability metrics.
  • Strong, proven knowledge of modern architectures at scale.

Benefits

  • Competitive base salary (ranges provided below by level), eligibility for equity, and access to NVIDIA benefits.

Compensation Details

  • Base salary range for Level 3: 144,000 USD - 230,000 USD.
  • Base salary range for Level 4: 168,000 USD - 270,250 USD.

Additional Information

  • Applications accepted at least until September 14, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.