Principal Network Engineer - DC and AI Clusters

at NVIDIA
USD 248,000-391,000 per year
SENIOR
Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 7, Ansible @ 4, Chef @ 4, Go @ 8, Ruby @ 8, Terraform @ 4, Python @ 8, Statistics @ 4, Data Science @ 4, Mathematics @ 4, Networking @ 4, Puppet @ 4, Salt @ 4, GPU @ 4

Details

We are seeking a highly skilled Principal Network Engineer to build the next generation of data center AI clusters and lead a major technology transformation to run AI on-prem. This is a hands-on engineering role focused on the architecture, design, development, and deployment of ultra-high-speed, resilient, and scalable DC AI clusters and interconnects for GPU-accelerated data centers and compute clusters. The role requires strong problem-solving skills and deep knowledge of network theory, network security protocols and standards, routing, switching, and automation.

Responsibilities

  • Lead the architecture, design, and deployment of global-scale data center interconnects and fabric for HPC, AI, and GPU computing clusters.
  • Develop high-performance data center fabric using InfiniBand, Ultra Ethernet and related technologies.
  • Optimize carrier interconnects, intra- and inter-DC routing, and dark fiber deployments to ensure low latency and high reliability.
  • Partner with system, OS, GPU, and HPC teams to deliver scalable, highly available networks for extreme-performance workloads.
  • Implement network monitoring, telemetry, troubleshooting, and continuous performance improvement processes.
  • Drive technology selection, vendor engagement, and lifecycle management for data center hardware and software.
  • Collaborate with internal product managers to develop NVIDIA-on-NVIDIA solutions.

Requirements

  • MS or PhD in Electrical Engineering, Computer Science, Computer Engineering, Artificial Intelligence, Data Science, Mathematics, Statistics, or equivalent experience.
  • 12+ years of experience building, managing, and supporting large-scale hybrid networks and developing automation pipelines using Python, Ruby, Go, or other infrastructure automation languages.
  • Expertise in networking technologies: InfiniBand, Ultra Ethernet, RoCEv2, DCQCN, TCP/UDP, IPv4/IPv6, BGP/MP-BGP, VPN, L2 switching, EVPN, VXLAN, Segment Routing, MPLS.
  • Experience automating network infrastructure and using automated configuration/management tools (examples listed: Python, Terraform, Chef, Puppet, Ansible, Salt).
  • Strong understanding of network security protocols and standards, routing, switching, and fundamental network theory.

Benefits & Compensation

  • Base salary range: 248,000 USD - 391,000 USD (determined by location, experience, and internal pay parity).
  • Eligible for equity and company benefits (link to benefits referenced in original posting).
  • NVIDIA emphasizes diversity and is an equal opportunity employer.

Additional Details

  • Role type: Full time
  • Location note: #LI-Hybrid (hybrid work model indicated)
  • Applications accepted at least until October 5, 2025.