Used Tools & Technologies
Not specified
Required Skills & Competences
System Administration @ 4, Grafana @ 3, Linux @ 4, Prometheus @ 3, Python @ 4, GCP @ 4, AWS @ 4, Azure @ 4, Communication @ 4
Details
NVIDIA is looking for a Senior Network Operations Engineer to support and maintain the cloud and datacenter network infrastructures that serve the entire NVIDIA software stack, from Graphics Drivers to Autonomous Vehicles and Artificial Intelligence. The role focuses on remediating critical alerts within defined SLAs, triaging production-impacting network incidents, collaborating with internal customers and external vendors, and participating in network projects such as device upgrades and capacity augmentations.
Responsibilities
- Remediate critical alerts and incidents within defined SLAs and triage production-impacting network events.
- Participate in 24/7 global shift rotations to provide remote support for network repairs and changes; collaborate across teams and update customers on status and ticket information.
- Engage with external vendors to remediate hardware and software issues.
- Drive operational improvements in change management and daily operations by following and improving procedures.
- Manage and operate large-scale IP network technologies and infrastructures, including L3 underlay networks.
- Apply expertise in peering and datacenter interconnect technologies (PNI, Transit, Exchange, Passive DWDM, Wave circuits).
- Monitor and support the network health of on-premises and cloud infrastructures (CSP environments).
- Collaborate on and develop workflow enhancements while documenting best practices; contribute to tooling and automation for provisioning, monitoring, and managing complex network infrastructures.
Requirements
- Deep knowledge of and experience with network protocols and technologies: TCP/IP, BGP, OSPF, MPLS, IS-IS, VXLAN, EVPN, QoS, GRE, IPsec, DNS, MACsec.
- 5+ years of experience in network operations.
- Strong network troubleshooting skills and creative problem-solving abilities.
- Proven track record of alert response within defined SLAs and incident management.
- Experience with one or more cloud service provider environments: AWS, Azure, GCP, OCI.
- Familiarity with network equipment/vendors such as Arista, Fortinet, and Juniper.
- Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing complex network infrastructures.
- Bachelor’s degree in Computer Science, a related technical field, or equivalent experience.
- Excellent verbal and written communication skills.
Ways to Stand Out
- Solid understanding of Mellanox/Cumulus OS and InfiniBand technology.
- Skill in Unix/Linux system administration and the ability to write and understand Python and shell scripts that improve efficiency in hyperscale environments.
- Familiarity with network and observability tooling such as NetBox/Nautobot, Prometheus, Grafana, and Panoptes.
- Passion for innovating and investing in cutting-edge technologies.
Compensation & Benefits
- Base salary ranges by level:
  - Level 3: 136,000 USD - 212,750 USD per year
  - Level 4: 168,000 USD - 264,500 USD per year
- Eligible for equity and other NVIDIA benefits (see NVIDIA benefits link referenced in original posting).
Additional Information
- Location: Santa Clara, CA, United States.
- Employment type: Full-time. Weekly hours are not specified; the role includes 24/7 global shift rotations.
- Application window open at least until October 20, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.