Senior Storage and Networking Product Engineer

at NVIDIA

📍 Santa Clara, United States

USD 168,000-264,500 per year

SENIOR

✅ On-site

Used Tools & Technologies

Required Skills & Competences

Ansible @ 4, Ceph @ 4, Chef @ 4, Grafana @ 3, Kubernetes @ 4, Linux @ 4, Prometheus @ 3, Terraform @ 4, Python @ 4, Bash @ 4, Networking @ 4, Debugging @ 4, Puppet @ 4, Compliance @ 4, GPU @ 4

Details

At NVIDIA, we are pioneers in making the impossible achievable, particularly within AI, ML, and HPC. Joining this team as a Storage & Networking Product Engineer means contributing to the development and operation of highly available, high-performance infrastructure focused on storage, networking, low latency, and scalability.

Responsibilities

Architect, deploy, and maintain distributed storage clusters with a focus on scalable performance and data durability.
Develop and improve high-performance networking architectures for storage environments, ensuring low-latency data paths for AI/ML and HPC workloads.
Configure and tune RDMA, NVMe-over-Fabrics, RoCE, InfiniBand, and Ethernet-based fabrics for maximum performance.
Partner with GPU, networking, and systems teams to ensure seamless end-to-end performance across the full stack.
Develop automated monitoring, logging, and alerting systems for storage and networking environments.
Build and maintain capacity planning models for network efficiency and storage growth.
Troubleshoot complex network-storage interactions, including bottlenecks in distributed filesystems, parallel storage, and interconnects.
Implement data protection and compliance controls such as encryption in transit, access control, and auditing.
Drive automation in storage and networking operations using infrastructure-as-code and AI/ML-guided orchestration.

Requirements

BS/MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
12+ years of experience in storage systems engineering, production infrastructure, or large-scale data center operations.
Deep knowledge of networking protocols and technologies: TCP/IP, Ethernet, InfiniBand, RDMA, RoCE, NVMe-oF, Fibre Channel.
Hands-on experience with high-performance storage systems: Lustre, GPFS, Ceph, distributed object storage, enterprise SAN/NAS.
Expertise in Linux systems engineering, including tuning, performance analysis, and debugging.
Skilled in coding/scripting using Python, Bash, Go, or C/C++ to automate, monitor, and optimize performance.
Experience with configuration management and orchestration tools (Ansible, Terraform, Puppet, Chef, Kubernetes).
Familiarity with observability stacks (Prometheus, Grafana, Elastic, InfluxDB) to monitor and optimize storage and network performance.
Proven ability to identify and resolve complex system bottlenecks within storage and networking layers.

Preferred / Ways to Stand Out

Experience designing and operating RDMA-accelerated HPC/AI clusters at scale, including network topologies and large-scale switch/router deployments.
Familiarity with network telemetry and packet capture tools such as sFlow, NetFlow, and Wireshark.
A track record of capacity planning and optimizing distributed storage systems over high-speed networks.
Background in developing storage networks for AI/ML training pipelines, large-scale inference, and retrieval-augmented generation (RAG) workflows.
Proficiency in hybrid cloud storage and networking solutions (Kubernetes CSI, cloud-native fabrics, hybrid on-prem/cloud setups).
Contributions to open-source networking or storage projects.

Benefits

Base salary range: USD 168,000-264,500 per year (determined by location, experience, and pay for comparable roles).
Eligible for equity and company benefits.
NVIDIA is committed to diversity and is an equal opportunity employer.

Additional Information

Application window: applications accepted at least until September 29, 2025.
Location: Santa Clara, CA, United States.