Senior Network Operations Engineer - DGX Cloud

at NVIDIA
USD 136,000-264,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

  • System Administration @ 4
  • Grafana @ 3
  • Linux @ 4
  • Prometheus @ 3
  • Python @ 4
  • GCP @ 4
  • AWS @ 4
  • Azure @ 4
  • Communication @ 4

Details

NVIDIA is seeking a Senior Network Operations Engineer to support and maintain cloud and datacenter network infrastructures that serve NVIDIA's software stack, including Graphics Drivers, Autonomous Vehicles, and AI platforms. The role focuses on alert remediation, incident triage, vendor engagement, project work (device upgrades, capacity augmentation), and operational improvements in large-scale, multi-vendor network environments and CSP deployments.

Responsibilities

  • Remediate critical alerts within defined SLAs and triage production-impacting network incidents.
  • Participate in 24/7 global shift rotations to provide remote support for network repairs and changes, collaborate across teams, and update customers on status and ticket information.
  • Engage with external vendors to remediate hardware and software issues.
  • Drive operational improvements in change management and daily operations by following and improving procedures.
  • Manage and operate large-scale IP network technologies and infrastructures.
  • Monitor and support the health of on-premises and cloud networks.
  • Use and support peering and datacenter interconnect technologies: PNI, transit, exchange, passive DWDM, and wave circuits.
  • Collaborate on workflow enhancements and document best practices.
  • Contribute to tooling and automation for provisioning, monitoring, and managing complex network infrastructures.

Requirements

  • Deep knowledge of and experience with TCP/IP, BGP, OSPF, MPLS, IS-IS, VXLAN, EVPN, QoS, GRE, IPsec, DNS, and MACsec.
  • 5+ years of experience in network operations.
  • Strong network troubleshooting skills and creative problem-solving abilities.
  • Proven track record of alert response within defined SLAs and incident management.
  • Experience with one or more cloud provider environments: AWS, Azure, GCP, OCI.
  • Familiarity with network hardware vendors such as Arista, Fortinet, and Juniper.
  • Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing complex network infrastructures.
  • Bachelor's degree in Computer Science, related technical field, or equivalent experience.
  • Excellent verbal and written communication skills.

Ways to Stand Out

  • Understanding of Mellanox/Cumulus OS and InfiniBand technology.
  • Skill in Unix/Linux system administration and the ability to write and understand Python and shell scripts that improve operational efficiency in hyperscale environments.
  • Familiarity with monitoring and network management tools such as NetBox/Nautobot, Prometheus, Grafana, and Panoptes.
  • Passion for innovating and investing in cutting-edge technologies.

Compensation & Benefits

  • Base salary ranges (determined by location, experience, and internal pay):
    • Level 3: 136,000 USD - 212,750 USD
    • Level 4: 168,000 USD - 264,500 USD
  • Eligible for equity and benefits (link to NVIDIA benefits referenced).

Additional Information

  • Applications will be accepted at least until August 24, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.