Network Site Reliability Engineer
at Nvidia
Santa Clara, United States
USD 168,000-264,500 per year
Required Skills & Competences
Ansible @ 4, Cumulus Linux @ 4, Go @ 6, Grafana @ 3, Linux @ 4, Prometheus @ 3, Python @ 6, Communication @ 7, SRE @ 4, Jira @ 4, ServiceNow @ 4, Debugging @ 4, Salt @ 4
Details
The Enterprise Network Support and SRE team is looking for a seasoned technical SRE lead to help realize the SRE vision for our network infrastructure. The role centers on making network operations seamless for users through network automation, observability, documentation, and operational excellence.
Responsibilities
- Owning the operational aspect of the network infrastructure to ensure high availability and reliability.
- Partnering with architecture and deployment teams to ensure new implementations are supportable and align with production standards.
- Advocating for and implementing automation to reduce toil and enhance operational efficiency.
- Monitoring network performance, identifying improvement areas, and coordinating enhancements.
- Collaborating with SMEs to resolve production issues swiftly and maintain customer satisfaction.
- Identifying and driving operational improvements for sustainable network operations.
Requirements
- BS degree in Computer Science, Electrical Engineering, or a related technical field, or equivalent experience.
- Minimum of 8 years industry experience in network site reliability engineering, network automation, operations, or related fields.
- Experience with campus and data center networks.
- Familiarity with network management tools: Prometheus, Grafana, Alertmanager, Nautobot/NetBox, BigPanda.
- Expertise automating networks using Salt, Ansible, or similar frameworks.
- Proficiency in Python and/or Go.
- Knowledge of network technologies: TCP/UDP, IPv4/IPv6, wireless, BGP, VPN, L2 switching, firewalls, load balancers, EVPN, VXLAN, Segment Routing.
- Experience with ServiceNow and Jira.
- Knowledge of Linux system fundamentals is a plus.
- Strong problem-solving and communication skills, ownership, and drive.
Preferred Qualifications
- Experience using SNMP, syslog, and streaming telemetry to solve operational challenges.
- Debugging and optimizing code; automating routine tasks.
- Experience with Mellanox/Cumulus Linux, Palo Alto firewalls, NetScaler, and F5 load balancers.
- Previous SRE experience.
Benefits
- Competitive base salary within the range of USD 168,000-264,500 per year.
- Eligibility for equity and other benefits.
- Inclusive and diverse work environment.
This role is offered with a hybrid work model.