Senior Engineer, Performance - Cloud Software

at NVIDIA

📍 Santa Clara, United States

USD 144,000-270,250 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 4, Docker @ 6, Kubernetes @ 6, Distributed Systems @ 7, Data Science @ 4, Helm @ 6, Networking @ 4, GPU @ 4

Details

NVIDIA leads developments in Artificial Intelligence, High-Performance Computing (HPC) and Visualization. DGX Cloud provides a serverless generative AI infrastructure enabling NVIDIA's AI supercomputer technologies to be used by anyone. The DGX Cloud engineering team ensures customers receive timely and quality-assured releases. This role is for a Performance Engineer proficient in performance and scalability testing, identifying limitations across the Kubernetes (K8s) and application stack using industry-standard tools and telemetry. The role involves problem-solving in a distributed team setting and driving performance and scalability improvements across the stack.

Responsibilities

Analyze and optimize performance across application, middleware, runtime, and infrastructure layers — including networking, storage, GPU utilization, and more.
Develop tooling and metrics that provide deep observability into system performance.
Collaborate with infrastructure, platform, runtime, and product teams to identify key performance goals and drive systemic improvements.
Lead investigations into high-impact performance regressions or scalability issues in production.
Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale.
Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems.

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field (or equivalent experience).
5+ years in software engineering with a strong track record in performance or scalability of high-scale distributed systems.
Deep familiarity with performance profiling tools and tracing systems.
Ability to identify performance issues, perform root cause analysis, and propose solutions.
Experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Golang internals, GPU utilization).
Contributions to observability, benchmarking, or performance-focused infrastructure at scale.
Strong understanding of OS internals, scheduling, memory management, and IO patterns.
Demonstrated ability to navigate ambiguity and align stakeholders around performance goals.
Proficiency in container-based infrastructure (Docker, Kubernetes, Helm).

Ways to Stand Out

Demonstrated ability to handle sophisticated technical environments while meeting or exceeding security, reliability, scalability, and availability metrics.
Strong, proven knowledge of modern architectures at scale.

Benefits

Competitive base salary (ranges provided below by level), eligibility for equity, and access to NVIDIA benefits.

Compensation Details

Base salary range for Level 3: 144,000 USD - 230,000 USD.
Base salary range for Level 4: 168,000 USD - 270,250 USD.

Additional Information

Applications accepted at least until September 14, 2025.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.