Senior Storage and Networking Product Engineer

at Nvidia
USD 168,000-264,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Ansible @ 4 Ceph @ 4 Chef @ 4 Go @ 4 Grafana @ 3 Kubernetes @ 4 Linux @ 4 Prometheus @ 3 Terraform @ 4 Python @ 4 Bash @ 4 Networking @ 4 Debugging @ 4 Puppet @ 4 Compliance @ 4 GPU @ 4

Details

At NVIDIA, we make the impossible possible, particularly in AI, ML, and HPC. As a Storage & Networking Product Engineer on this team, you will help build and operate highly available, high-performance infrastructure with a focus on storage, networking, low latency, and scalability.

Responsibilities

  • Architect, deploy, and maintain distributed storage clusters with a focus on scalable performance and data durability.
  • Develop and improve high-performance networking architectures for storage environments, ensuring low-latency data paths for AI/ML and HPC workloads.
  • Configure and tune RDMA, NVMe-over-Fabrics, RoCE, InfiniBand, and Ethernet-based fabrics for maximum performance.
  • Partner with GPU, networking, and systems teams to ensure seamless end-to-end performance across the full stack.
  • Develop automated monitoring, logging, and alerting systems for storage and networking environments (a minimal monitoring sketch follows this list).
  • Build and maintain capacity planning models for network efficiency and storage growth.
  • Troubleshoot complex network-storage interactions, including bottlenecks in distributed filesystems, parallel storage, and interconnects.
  • Implement data protection and compliance controls such as encryption in transit, access control, and auditing.
  • Drive automation of storage and networking operations using infrastructure-as-code and AI/ML-guided orchestration.
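
The monitoring and alerting work above is typically automated with small scripts against an observability stack. Below is a minimal Python sketch, assuming a Prometheus server at an internal URL and a hypothetical NVMe-oF read-latency histogram exported by the storage nodes; the endpoint, metric name, and SLO value are illustrative and not taken from this posting.

    """Minimal sketch: poll Prometheus for storage-path latency and flag SLO breaches.

    Assumptions (not from the posting): a Prometheus server at PROM_URL and a
    hypothetical histogram metric 'nvmeof_read_latency_seconds_bucket' exported
    by the storage nodes. Notification is stubbed out as a print statement.
    """
    import requests

    PROM_URL = "http://prometheus.internal:9090"  # assumed internal endpoint
    LATENCY_SLO_SECONDS = 0.002                   # example 2 ms p99 target

    # PromQL: p99 over the hypothetical NVMe-oF read-latency histogram, per target.
    QUERY = (
        "histogram_quantile(0.99, "
        "sum(rate(nvmeof_read_latency_seconds_bucket[5m])) by (le, target))"
    )

    def check_latency() -> None:
        resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=10)
        resp.raise_for_status()
        for series in resp.json()["data"]["result"]:
            target = series["metric"].get("target", "unknown")
            p99 = float(series["value"][1])       # value is [timestamp, "string"]
            if p99 > LATENCY_SLO_SECONDS:
                # A real system would notify via Alertmanager or a pager; stubbed here.
                print(f"ALERT: {target} p99 read latency {p99 * 1000:.2f} ms exceeds SLO")

    if __name__ == "__main__":
        check_latency()

In practice the notification stub would hand off to Alertmanager or a paging service rather than printing.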

Requirements

  • BS/MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • 12+ years of experience in storage systems engineering, production infrastructure, or large-scale data center operations.
  • Deep knowledge of networking protocols and technologies: TCP/IP, Ethernet, InfiniBand, RDMA, RoCE, NVMe-oF, Fibre Channel.
  • Hands-on experience with high-performance storage systems: Lustre, GPFS, Ceph, distributed object storage, enterprise SAN/NAS.
  • Expertise in Linux systems engineering, including tuning, performance analysis, and debugging.
  • Skilled in coding/scripting with Python, Bash, Go, or C/C++ to automate operations, monitor systems, and optimize performance (a minimal sketch follows this list).
  • Experience with configuration management and orchestration tools (Ansible, Terraform, Puppet, Chef, Kubernetes).
  • Familiarity with observability stacks (Prometheus, Grafana, Elastic, InfluxDB) to monitor and optimize storage and network performance.
  • Proven ability to recognize and resolve complex system bottlenecks within storage and networking layers.
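
As an illustration of the scripting expectation above, here is a minimal Python sketch that samples RDMA port counters from Linux sysfs and reports per-port throughput. It assumes a host exposing an InfiniBand or RoCE HCA under /sys/class/infiniband; the device and port names are examples only.

    """Minimal sketch: sample InfiniBand/RoCE port counters from sysfs and report
    throughput. Assumes a Linux host exposing an RDMA HCA under
    /sys/class/infiniband; the device and port names below are examples."""
    import time
    from pathlib import Path

    DEV, PORT = "mlx5_0", "1"  # example device/port; check /sys/class/infiniband
    COUNTER_DIR = Path(f"/sys/class/infiniband/{DEV}/ports/{PORT}/counters")

    def read_counter(name: str) -> int:
        return int((COUNTER_DIR / name).read_text())

    def sample_throughput(interval: float = 5.0) -> None:
        # port_rcv_data / port_xmit_data are reported in 4-byte words (IB spec).
        before = (read_counter("port_rcv_data"), read_counter("port_xmit_data"))
        time.sleep(interval)
        after = (read_counter("port_rcv_data"), read_counter("port_xmit_data"))
        rx_gbps = (after[0] - before[0]) * 4 * 8 / interval / 1e9
        tx_gbps = (after[1] - before[1]) * 4 * 8 / interval / 1e9
        print(f"{DEV} port {PORT}: rx {rx_gbps:.2f} Gb/s, tx {tx_gbps:.2f} Gb/s")

    if __name__ == "__main__":
        sample_throughput()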

Preferred / Ways to Stand Out

  • Experience crafting and operating RDMA-accelerated HPC/AI clusters at scale, including network topologies and large-scale switch/router deployments.
  • Familiarity with network telemetry and packet-capture tools such as sFlow, NetFlow, and Wireshark.
  • Track record of capacity planning and optimizing distributed storage systems over high-speed networks (a simple projection sketch follows this list).
  • Background in developing storage networks for AI/ML training pipelines, large-scale inference, and RAG workflows.
  • Proficiency in hybrid cloud storage and networking solutions (Kubernetes CSI, cloud-native fabrics, hybrid on-prem/cloud setups).
  • Contributions to open-source networking or storage projects.
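
Capacity planning, called out both in the responsibilities and in the stand-out items above, often starts from a simple growth projection. The Python sketch below uses illustrative numbers and assumes compound monthly growth; a production model would be fitted to observed telemetry rather than a fixed rate.

    """Minimal capacity-planning sketch (illustrative numbers only): given current
    usage and an observed monthly growth rate, estimate months until a cluster
    crosses a utilization threshold, assuming compound growth."""
    import math

    def months_until_threshold(used_tb: float, total_tb: float,
                               monthly_growth: float, threshold: float = 0.85) -> float:
        """Months until used/total exceeds `threshold` under compound growth."""
        target_tb = total_tb * threshold
        if used_tb >= target_tb:
            return 0.0
        return math.log(target_tb / used_tb) / math.log(1.0 + monthly_growth)

    if __name__ == "__main__":
        # Example: 6.2 PB used of 10 PB usable, growing 4% per month, plan at 85% full.
        print(f"{months_until_threshold(6200, 10000, 0.04):.1f} months of headroom")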

Benefits

  • Base salary range: 168,000 USD - 264,500 USD, determined by location, experience, and the pay of employees in similar positions.
  • Eligible for equity and company benefits.
  • NVIDIA is committed to diversity and is an equal opportunity employer.

Additional Information

  • Application window: applications accepted at least until September 29, 2025.
  • Location: Santa Clara, CA, United States.