Network Site Reliability Engineer

at Nvidia
USD 168,000-264,500 per year
SENIOR
✅ Hybrid

Required Skills & Competences

  • Ansible @ 4
  • Cumulus Linux @ 4
  • Go @ 6
  • Grafana @ 3
  • Linux @ 4
  • Prometheus @ 3
  • Python @ 6
  • Communication @ 7
  • SRE @ 4
  • Jira @ 4
  • ServiceNow @ 4
  • Debugging @ 4
  • Salt @ 4

Details

The Enterprise Network Support and SRE team is looking to add an experienced technical SRE lead to help realize the SRE vision for our network infrastructure. The role focuses on making network operations seamless, with an emphasis on user experience, and covers network automation, observability, documentation, and operational excellence.

Responsibilities

  • Owning the operational aspects of the network infrastructure to ensure high availability and reliability.
  • Partnering with architecture and deployment teams to ensure new implementations are supportable and align with production standards.
  • Advocating for and implementing automation to reduce toil and enhance operational efficiency.
  • Monitoring network performance, identifying improvement areas, and coordinating enhancements.
  • Collaborating with SMEs to resolve production issues swiftly and maintain customer satisfaction.
  • Identifying and driving operational improvements for sustainable network operations.

Requirements

  • BS degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience.
  • A minimum of 8 years of industry experience in network site reliability engineering, network automation, network operations, or related fields.
  • Experience with campus and data center networks.
  • Familiarity with network management tools such as Prometheus, Grafana, Alertmanager, Nautobot/NetBox, and BigPanda.
  • Expertise in network automation using Salt, Ansible, or similar frameworks.
  • Proficiency in Python and/or Go.
  • Knowledge of network technologies: TCP/UDP, IPv4/IPv6, wireless, BGP, VPN, L2 switching, firewalls, load balancers, EVPN, VXLAN, and Segment Routing.
  • Experience with ServiceNow and Jira.
  • Knowledge of Linux system fundamentals is a plus.
  • Strong problem-solving and communication skills, a sense of ownership, and drive.

Preferred Qualifications

  • Experience using SNMP, syslog, and streaming telemetry to solve operational challenges.
  • Experience debugging and optimizing code and automating routine tasks.
  • Experience with Mellanox/Cumulus Linux, Palo Alto firewalls, and NetScaler and F5 load balancers.
  • Previous SRE experience.

Benefits

  • Competitive base salary within the stated range.
  • Eligibility for equity and other benefits.
  • Inclusive and diverse work environment.

This role is offered with a hybrid work model.