Principal Network Engineer - DC and AI Clusters
at NVIDIA

Santa Clara, United States
USD 248,000-391,000 per year

Used Tools & Technologies
Not specified
Required Skills & Competences
Security @ 7, Ansible @ 4, Chef @ 4, Go @ 8, Ruby @ 8, Terraform @ 4, Python @ 8, Statistics @ 4, Data Science @ 4, Mathematics @ 4, Networking @ 4, Puppet @ 4, Salt @ 4, GPU @ 4
Details
We are seeking a highly skilled Principal Network Engineer to join our team and build the next generation of IT AI clusters. You will help lead the team through a major technology transformation: running AI on-premises and building infrastructure that integrates enterprise-ready platforms on a solid foundation of automation. As a hands-on engineer, you will solve networking problems for scalable AI clusters, focusing on the architecture, design, development, and deployment of ultra-high-speed, resilient, and scalable DC AI clusters and interconnects for GPU-accelerated data centers and compute clusters.
Responsibilities
- Lead the architecture, design, and deployment of global-scale data center interconnects and fabric for HPC, AI, and GPU computing clusters.
- Develop high-performance data center fabric using InfiniBand, Ultra Ethernet, and related technologies.
- Optimize carrier interconnects, intra- and inter-DC routing, and dark fiber deployments to ensure low latency and high reliability.
- Partner with system, OS, GPU, and HPC teams to deliver scalable, highly available networks for extreme-performance workloads.
- Implement network monitoring, telemetry, troubleshooting, and continuous performance improvement processes.
- Drive technology selection, vendor engagement, and lifecycle management for data center hardware and software.
- Collaborate with internal product managers to develop NVIDIA-on-NVIDIA solutions.

Requirements
- MS or PhD in Electrical Engineering, Computer Science, Computer Engineering, Artificial Intelligence, Data Science, Mathematics, Statistics, or equivalent experience.
- 12+ years of experience building, managing, and supporting large-scale hybrid networks, plus experience developing automation pipelines with Python, Ruby, Go, or other infrastructure automation languages.
- Expertise in networking technologies: InfiniBand, Ultra Ethernet, RoCEv2, DCQCN, TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, and MPLS.
- Experience automating network infrastructure and using automated configuration/management systems (Python, Terraform, Chef, Puppet, Ansible, Salt, etc.).
- Strong understanding of network security protocols and standards, routing, switching, automation, and fundamental network theory.

Benefits & Compensation
- Base salary range: 248,000 USD - 391,000 USD (determined by location, experience, and the pay of employees in similar positions).
- Eligibility for equity and additional employee benefits (see the NVIDIA benefits page).
- NVIDIA is an equal opportunity employer committed to diversity and inclusion.

Other Information
- Location: Santa Clara, CA, United States (hybrid role).
- Applications accepted until at least October 5, 2025.