Required Skills & Competences
Grafana @ 4, Kubernetes @ 3, Prometheus @ 4, Python @ 6, TensorFlow @ 4, Bash @ 6, Communication @ 4, PyTorch @ 4, GPU @ 4
Details
NVIDIA is seeking a Senior Storage Performance Engineer to join the team in Santa Clara, CA. This role focuses on creating, implementing, and analyzing complex benchmarks to optimize performance across NVIDIA's infrastructure stack. The work directly impacts AI inference and training, NVIDIA NIMs, RAG pipelines, HPC codes, and storage platforms.
Responsibilities
- Craft and deliver performance benchmarks across AI, HPC, and enterprise storage platforms.
- Test and benchmark storage appliances (block, file, object) against NVIDIA data center solutions.
- Operate and adjust AI inference and training workloads using frameworks such as PyTorch, TensorFlow, and NVIDIA NIMs.
- Benchmark and analyze retrieval-augmented generation (RAG) pipelines, including ingestion, retrieval, and inference performance with vector databases.
- Profile and optimize MPI-based and multi-node distributed applications.
- Collaborate with product managers, system architects, and partners to fine-tune hardware/software stack performance.
Requirements
- 12+ years of experience in performance engineering, benchmarking, or HPC/AI systems.
- Deep expertise in AI/ML and deep learning frameworks: PyTorch, TensorFlow, Triton.
- Strong background in storage systems and filesystems.
- Proven experience with MPI, OpenMP, and Slurm in large-scale compute environments.
- Proficiency in Python, Bash, and automation frameworks for job orchestration and results parsing.
- Excellent communication skills; ability to switch between deep technical work and discussions of high-level business impact.
- BS, MS, or PhD in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
Preferred / Ways to stand out
- Experience with RAG pipelines and vector databases (FAISS, Milvus, Qdrant).
- Familiarity with Kubernetes and CSI-based persistent storage systems.
- Knowledge of GPU profiling tools (Nsight Systems, PyTorch Profiler).
- Experience with telemetry/monitoring frameworks (Prometheus, Grafana).
- Enthusiasm for exploring the boundaries of AI, HPC, and storage capabilities.
Compensation & Benefits
- Base salary range: 200,000 USD - 322,000 USD (determined by location, experience, and the pay of employees in similar positions).
- Eligible for equity and company benefits (see NVIDIA benefits page).
Additional Information
- Location: Santa Clara, CA, United States.
- Applications accepted at least until September 29, 2025.
- NVIDIA is an equal opportunity employer committed to a diverse work environment.