Used Tools & Technologies
Not specified
Required Skills & Competences
- Security @ 4
- Docker @ 6
- Kubernetes @ 4
- Distributed Systems @ 7
- Data Science @ 4
- Helm @ 6
- Networking @ 4
- GPU @ 4
Details
NVIDIA DGX Cloud provides serverless generative AI infrastructure that makes NVIDIA's AI supercomputing technology available to anyone. The DGX Cloud engineering team ensures customers receive timely, quality-assured releases. This role focuses on performance and scalability testing: identifying limitations across the Kubernetes and application stack using industry-standard tools and telemetry, and driving improvements across both infrastructure and application layers.
Responsibilities
- Analyze and optimize performance across application, middleware, runtime, and infrastructure layers — networking, storage, GPU utilization, and beyond.
- Develop tooling and metrics that provide deep observability into system performance.
- Collaborate closely with infra, platform, runtime, and product teams to identify key performance goals and drive systemic improvements.
- Lead investigations into high-impact performance regressions or scalability issues in production.
- Influence architecture and design decisions to prioritize latency, throughput, and efficiency at scale.
- Drive performance testing strategies and help define SLAs/SLOs around latency and throughput for critical systems.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field — or equivalent experience.
- 5+ years of software engineering experience with a strong track record in the performance or scalability of large-scale distributed systems.
- Deeply comfortable with performance profiling tools and tracing systems.
- Ability to identify performance issues, perform root-cause analysis, and propose solutions.
- Experience optimizing performance across one or more layers of the stack (e.g., database, networking, storage, application runtime, GC tuning, Golang internals, GPU utilization).
- Contributions to observability, benchmarking, or performance-focused infrastructure at scale.
- Strong understanding of OS internals, scheduling, memory management, and I/O patterns.
- Demonstrated success navigating ambiguity and aligning stakeholders around performance goals.
- Proficient in container-based infrastructure (Docker, Kubernetes, Helm).
Ways to stand out
- Demonstrated ability to handle sophisticated technical environments while meeting security, reliability, scalability, and availability targets.
- Strong, proven knowledge of modern architectures at scale.
Compensation & Benefits
- Base salary ranges (location- and level-dependent):
  - Level 3: 144,000 USD - 230,000 USD
  - Level 4: 168,000 USD - 270,250 USD
- Eligible for equity and company benefits (see NVIDIA benefits link).
Other details
- Location: Santa Clara, CA, United States.
- Employment type: Full time.
- Applications accepted at least until September 14, 2025.
- NVIDIA is an equal opportunity employer committed to diversity and inclusion.