Senior Manager, GPU Cloud Infrastructure - GeForce NOW

at NVIDIA
USD 256,000-414,000 per year
SENIOR
✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Ansible @ 4, Cumulus Linux @ 4, Grafana @ 4, Linux @ 4, Prometheus @ 4, IaC @ 4, Terraform @ 4, Distributed Systems @ 8, Networking @ 4, SRE @ 6, GDPR @ 4, Debugging @ 4, GPU @ 4, Observability @ 4, AI @ 4, InfiniBand @ 4

Details

GeForce NOW is the global leader in cloud gaming, dedicated to making high-end play accessible on any device. The team leverages NVIDIA’s premier data centers to stream thousands of games at high resolution and high frame rates. This role leads the design, scaling, and operations of high-performance networking for GPU-based cloud infrastructure. The goal is to support cloud gaming workloads, AI/ML training, and real-time inference by delivering ultra-low-latency, high-throughput, and highly reliable interconnects across data centers and cloud environments.

Responsibilities

  • Build and mentor a specialized team of network architects focused on high-performance GPU infrastructure.
  • Oversee the design of intra-cluster and inter-cluster connectivity, utilizing RoCE, Ethernet-based AI fabrics, and high-bandwidth data center interconnects.
  • Drive technical tuning to reduce latency and jitter, increase throughput, and implement congestion control and packet-loss mitigation strategies.
  • Define the roadmap for networking strategies that support gaming, AI/ML training, and real-time inference at scale.
  • Engage with ISPs to optimize low-latency edge networks and ensure seamless connections from data centers to end clients.
  • Implement Infrastructure as Code (IaC) and observability frameworks to automate provisioning, scaling, and real-time cluster health monitoring.
  • Work directly with AI platform teams, hardware vendors, and SRE groups to influence technology direction and vendor selection.
  • Establish protocols for fault tolerance and lead incident response and root cause analysis for complex network issues.

Requirements

  • 12+ years of experience in networking, cloud infrastructure, or distributed systems, with 5+ years managing technical teams.
  • Mastery of data center networking, including Clos/spine-leaf architectures and high-performance fabrics like RDMA, RoCE, or InfiniBand.
  • Hands-on experience with BGP, EVPN/VXLAN, and kernel-level development for routing and switching.
  • Skilled in using Ansible or Terraform for infrastructure automation; experience with monitoring tools such as Prometheus and Grafana.
  • Practical experience designing for large-scale configurations using SR-IOV, Xen virtualization, or Open vSwitch.
  • Bachelor’s or Master’s degree in Computer Science or a related engineering field (or equivalent experience).
  • Ability to ensure infrastructure meets internal policies and regulatory standards such as GDPR.

Ways to stand out

  • Proven success managing networking for large-scale GPU clusters or hyperscale cloud environments.
  • Familiarity with optical networking and high-speed interconnects (e.g., 400G or 800G).
  • Experience debugging and improving code for Mellanox/Cumulus Linux, or managing Palo Alto and NetScaler appliances.
  • Strong grasp of streaming telemetry and operational signals (SNMP, Syslog) to proactively resolve complex architectural bottlenecks.
  • Relevant top-tier certifications, such as CCIE or specialized cloud networking designations.

Compensation & Benefits

  • Base salary range: USD 256,000-414,000 per year.
  • Eligible for equity and company benefits (link to benefits referenced in the posting).

Other details

  • Applications accepted at least until April 11, 2026.
  • NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.