Network Site Reliability Engineer

at NVIDIA
USD 168,000-264,500 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

System Administration @ 3, Ansible @ 3, Cumulus Linux @ 3, Go @ 5, Grafana @ 2, Linux @ 3, Prometheus @ 2, Python @ 5, Technical Proficiency @ 5, SRE @ 3, Jira @ 3, ServiceNow @ 3, Debugging @ 6, Salt @ 3, GPU @ 3

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

We are seeking a highly skilled and experienced Network Site Reliability Engineer (SRE) to join our Enterprise Network Operations and SRE team. In this role you will be pivotal in implementing our vision for a reliable and efficient network infrastructure. The role focuses on network automation, observability, documentation, and operational excellence to minimize manual operational tasks and maintain Service Level Objectives (SLOs). You will be expected to proactively identify and mitigate network risks, author knowledge base articles for automation and bots, and conduct blameless postmortems and Root Cause Analyses (RCAs).

Responsibilities

  • Own the operational aspect of the network infrastructure to ensure high availability and reliability; actively work on network incidents and service requests.
  • Partner with architecture and deployment teams to ensure new implementations are supportable and align with production standards.
  • Advocate for and implement automation to reduce toil and improve operational efficiency.
  • Monitor network performance, identify areas for improvement, and collaborate with relevant teams to implement refinements.
  • Collaborate with domain experts across functions to resolve production issues swiftly and effectively, ensuring customer satisfaction.
  • Discover opportunities for operational improvements and work with colleagues to devise solutions that enhance sustainability and excellence in network operations.

Requirements

  • BS degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience.
  • At least 10 years of industry experience in network operations or related fields, with a focus on automation and site reliability engineering. Familiarity with both enterprise and data center networks is critical.
  • Strong network fundamentals and experience debugging complex network issues; expertise in technologies such as TCP/UDP, IPv4/IPv6, Wireless, BGP, IS-IS, VPN, L2 switching, Firewalls, Load Balancers, and Data Center Network technologies.
  • Monitoring tools familiarity: Prometheus, Grafana, Alertmanager, Nautobot/NetBox, BigPanda.
  • Network automation expertise using frameworks such as Salt, Ansible, or similar.
  • Process & service tooling experience: ServiceNow, Jira, and foundational knowledge of the ITIL framework.
  • System administration: knowledge of Linux system fundamentals.
  • Strong problem-solving, critical-thinking, and interpersonal skills, with a solid sense of ownership and drive.

Ways to stand out from the crowd

  • Experience taking operational signals (SNMP, Syslog, Streaming Telemetry) and using them to solve operational challenges.
  • Exposure to platforms such as Mellanox/Cumulus Linux, Palo Alto firewalls, NetScaler, and F5 load balancers.
  • Technical proficiency in programming and scripting (Python, Go), and experience building complex systems to monitor and control network operations beyond basic scripting.
  • Proficiency in advanced network technologies such as VXLAN/EVPN at scale, MPLS, RSVP, Segment Routing, SD-WAN, or SASE platforms.

Compensation & Benefits

  • Base salary range: 168,000 USD - 264,500 USD (determined based on location, experience, and internal pay equity).
  • Eligible for equity and NVIDIA benefits (link provided in original posting).

Additional information

  • Applications accepted at least until September 27, 2025.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.
