Director, Global Network Reliability Engineering

at Nvidia

📍 Santa Clara, United States

USD 268,000-408,250 per year

SENIOR

✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Leadership @ 4 Communication @ 7 Networking @ 4 SRE @ 4

Details

NVIDIA is seeking a Director of Network Reliability Engineering within the Enterprise Networking organization in IT. As the #1 AI company in the world, NVIDIA is building and delivering infrastructure at scale, and this role leads the Network Reliability Engineering team that supports it. From data centers to contract manufacturing sites, the Enterprise Networking organization builds them all.

In this role, you will be responsible for NVIDIA’s global network operations, ensuring that reliability, scalability, and efficiency goals are defined and met. You will lead a team of network reliability engineers and bring a data-driven approach to operations, with a focus on observability, well-defined success metrics, and continuous improvement. You will lead the design and automation of operations, provide architectural input based on outage patterns and observability trends, and be the keeper of excellence in networking. The successful candidate will translate strategic plans into incremental delivery of business impact and will build strong teams that partner with engineering and operations across NVIDIA. You will lead global operations across several countries, covering all of NVIDIA’s data centers, labs, super labs, offices, and contract manufacturing sites.

Responsibilities

Mature the current support model and processes toward a data-driven, automated SRE model.
Build and grow an in-house team of reliability experts for networking support and operations from existing outsourced SMEs; provide leadership, direction, and strategy for a growing team.
Set the technical vision, strategy, and roadmap for network operations in partnership with key infrastructure and partner teams.
Establish runbooks, conduct regular training sessions, and work across Network Architecture, Network Engineering, and partner teams to ensure networks are built to be self-healing.
Analyze RCAs from events and incidents, and work with AI operations to enrich observability tooling for a better full-stack view from the network to applications.
Influence the architecture of NVIDIA networks both on-premises and in cloud environments.

Requirements

Bachelor’s degree in Computer Science, a related technical field, or equivalent experience.
12+ years of overall experience in system design, network architecture, network engineering, and network operations.
7+ years of leadership experience building and growing geographically distributed teams while aligning to global standards.
Ability to perform technical deep-dives into code, networking, operating systems, and storage.
Strong structured thinking, problem-solving, and exceptional communication skills to engage with executive leadership and peer SMEs.
Ability to identify trends and promote cross-product solutions that address challenges efficiently.

Ways to stand out

Experience transforming network operations using software-driven methods.
Experience in a hyperscale cloud service provider (public-facing or not).
Knowledge of SRE principles (observability, SLOs, SLIs, logging, etc.).
Experience with software interface design and documentation for less technical end-users.

Compensation & Benefits

Base salary range: 268,000 USD - 408,250 USD (final base will be determined by location, experience, and comparable pay).
Eligible for equity and benefits (link to NVIDIA benefits provided in original posting).

Other details

Applications for this job will be accepted at least until November 9, 2025.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment. The company does not discriminate on the basis of characteristics protected by law.