Principal Network Engineer — DC and AI Clusters

at Nvidia

📍 Santa Clara, United States

USD 248,000-391,000 per year

SENIOR

✅ Hybrid

Required Skills & Competences

Security @ 7, Ansible @ 4, Chef @ 4, Go @ 8, Ruby @ 8, Terraform @ 4, Python @ 8, Statistics @ 4, Data Science @ 4, Mathematics @ 4, Networking @ 4, Puppet @ 4, Salt @ 4, GPU @ 4

Details

We are seeking a highly skilled Principal Network Engineer to join our dynamic team and build the next generation of IT AI clusters. You will help lead the team through a major technology transformation toward running AI on-prem, building infrastructure that integrates enterprise-ready platforms on a solid automation foundation. As a hands-on engineer, you will solve networking problems for scalable AI clusters, focusing on the architecture, design, development, and deployment of ultra-high-speed, resilient, and scalable DC AI clusters and interconnects for GPU-accelerated data centers and compute clusters.

Responsibilities

Lead the architecture, design, and deployment of global-scale data center interconnects and fabric for HPC, AI, and GPU computing clusters.
Develop high-performance data center fabric using InfiniBand, Ultra Ethernet and related technologies.
Optimize carrier interconnects, intra- and inter-DC routing, and dark fiber deployments to ensure low latency and high reliability.
Partner with system, OS, GPU, and HPC teams to deliver scalable, highly available networks for extreme-performance workloads.
Implement network monitoring, telemetry, troubleshooting, and continuous performance improvement processes.
Drive technology selection, vendor engagement, and lifecycle management for data center hardware and software.
Collaborate with internal product managers to develop NVIDIA-on-NVIDIA solutions.

Requirements

MS or PhD in Electrical Engineering, Computer Science, Computer Engineering, Artificial Intelligence, Data Science, Mathematics, Statistics, or equivalent experience.
12+ years of experience building, managing, and supporting large-scale hybrid networks; experience developing automation pipelines with Python, Ruby, Go, or other infrastructure automation languages.
Expertise in networking technologies: InfiniBand, Ultra Ethernet, RoCEv2, DCQCN, TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, MPLS.
Experience automating network infrastructure and using automated configuration/management systems (Python, Terraform, Chef, Puppet, Ansible, Salt, etc.).
Strong understanding of network security protocols and standards, routing, switching, automation, and fundamental network theory.

Benefits & Compensation

Base salary range: 248,000 USD - 391,000 USD (determined by location, experience, and comparable pay).
Eligibility for equity and additional employee benefits (see NVIDIA benefits page).
NVIDIA is an equal opportunity employer committed to diversity and inclusion.

Other Information

Location: Santa Clara, CA, United States (hybrid role).
Applications accepted until at least October 5, 2025.