Senior Manager, Network Site Reliability - GeForce NOW

at NVIDIA
USD 248,000-396,750 per year
SENIOR
✅ On-site

Required Skills & Competences

Security @ 4, Ansible @ 4, Cumulus Linux @ 4, Grafana @ 4, Linux @ 4, Prometheus @ 4, IaC @ 4, Terraform @ 4, GCP @ 4, Distributed Systems @ 3, Hiring @ 4, Leadership @ 7, AWS @ 4, Azure @ 4, Networking @ 8, SRE @ 4, Debugging @ 4, Compliance @ 4

Details

GeForce NOW is seeking a Manager of Network Site Reliability Engineering (SRE) to enhance and operate scalable network infrastructure that delivers a smooth user experience. The role focuses on leading Network SRE efforts to streamline operations, reduce manual work, meet service level objectives (SLOs), and improve observability, automation, and documentation across data centers, cloud environments, and edge locations.

Responsibilities

  • Cultivate and lead a top-performing team of Network Site Reliability Engineers through mentorship, collaboration, accountability, and technical excellence.
  • Manage design, implementation, and maintenance of robust, scalable network infrastructure across data centers, cloud platforms, and edge locations to ensure consistent connectivity and performance.
  • Apply proactive reliability engineering techniques to reduce network disruptions and decrease Mean Time to Recovery (MTTR), improving overall service reliability and user satisfaction.
  • Work closely with Security and Compliance teams to ensure network infrastructure meets regulatory standards and internal policies.
  • Lead initiatives to improve network observability by integrating advanced monitoring and alerting systems; collaborate with cross-functional teams to implement network solutions that support business objectives and enhance user experiences.

Requirements

  • Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent experience.
  • 12+ years of proven experience in host and infrastructure networking.
  • 6+ years in leadership roles managing teams focused on high-performance Software Defined Networking (SDN) solutions.
  • Strong understanding of networking protocols and hands-on experience in kernel development.
  • Experience with routing, switching, load balancers, firewalls, VPNs.
  • Experience with cloud platforms such as AWS, GCP, and Azure.
  • Skilled in Infrastructure as Code (IaC) using tools like Ansible and Terraform.
  • Experience with monitoring and observability tools such as Prometheus, Grafana, and NetBox.
  • Practical experience designing network architectures for cloud and distributed systems, including large-scale configurations, and familiarity with SR-IOV, Xen virtualization, and Open vSwitch (OVS) or similar SDN technologies.

Ways to Stand Out

  • Extensive experience managing hybrid cloud environments and large-scale distributed systems.
  • Strong understanding of Site Reliability Engineering concepts, including SLAs, SLOs, and incident management best practices.
  • Proven ability to use operational signals such as SNMP, Syslog, and Streaming Telemetry for issue identification and resolution.
  • Comprehensive knowledge of Open vSwitch (OVS) and SR-IOV RDMA.
  • Experience debugging and improving code, automating repetitive tasks, and working with Mellanox/Cumulus Linux, Palo Alto firewalls, and NetScaler load balancers.

Benefits

  • Competitive base salary (see range below), eligibility for equity, and comprehensive benefits.
  • NVIDIA emphasizes autonomy, innovation, and career growth in fields such as deep learning and AI.
  • NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.
  • Applications accepted at least until August 8, 2025.

Salary

Base salary range: 248,000 USD - 396,750 USD (determined by location, experience, and internal equity).