Required Skills & Competences
Security @ 4 Ansible @ 4 Grafana @ 4 Jenkins @ 4 Kubernetes @ 4 Linux @ 4 Prometheus @ 4 Python @ 7 CI/CD @ 4 Networking @ 4 Perl @ 7 Debugging @ 7 API @ 4 Reporting @ 4 QA @ 6 Robot Framework @ 4 GPU @ 4
Details
Our technology has no boundaries. NVIDIA is building modern compute platforms used by scientists, researchers and engineers. At its core, our visual computing technology enables high-performance and energy-efficient computing. We seek a Senior Network Validation Engineer to lead and contribute hands-on to network validation activities within the Datacenter Systems Engineering team. You will work with solutions, network & storage architects, HW system engineers, validation engineers, OEM/ODMs, and AE teams to ensure product validation and test coverage are optimal for Data Center scale AI products.
Responsibilities
- Design validation plans from bare metal to at-scale data center integration tests.
- Debug, triage issues, perform root cause analysis, verify fixes, define new tests, and improve product test plans.
- Configure, administer, troubleshoot, and oversee qualification of Ethernet and InfiniBand networks in large-scale datacenter environments.
- Perform server function & network validations including Ethernet & InfiniBand protocol & system-level reliability testing and end-to-end application tests.
- Design, develop, and maintain automation frameworks and test automation suites, including automated reporting, and increase end-to-end automation coverage with each release cycle.
- Track and coordinate all validation activities from bring-up to production release.
- Collaborate with cross-functional teams (application teams, HW designers, networking team, firmware, security, etc.) to debug HW/SW product issues.
- Provide inputs to architecture teams for next-generation Data Center networking design.
Requirements
- M.S. degree in Engineering/Computer Science or related field (or equivalent experience).
- 10+ years of experience.
- Over 5 years of proven experience in Software Quality Engineering and Network Testing, including contributions to QA strategies and test documentation.
- Strong skills in Python (preferred) or other scripting languages such as Perl and Shell.
- Hands-on experience with Jenkins or similar CI/CD pipelines.
- Strong technical abilities in problem solving, design, coding and debugging.
- Extensive hands-on experience configuring and troubleshooting data center networking, including Layer 2/Layer 3 protocols (VLAN, BGP, EVPN) and spine-leaf topologies; InfiniBand experience is desired.
- Experience with test tools from Ixia or Spirent and working experience in test management.
- Hands-on experience with Unix/Linux operating systems.
- Strong interpersonal, documentation, and multi-tasking abilities.
- Solid foundation and understanding of software engineering practices.
- Excellent design, debugging and problem-solving skills with a strong bias for action, quality and engineering excellence.
Ways to stand out
- CCIE certification (Routing & Switching / Service Provider / Data Center).
- Demonstrated experience with RDMA technologies and related protocols such as InfiniBand or RoCE.
- Knowledge of, or experience with, AI Data Center validation on GPU clusters.
- Experience with REST APIs, Kubernetes, and network automation tools such as Ansible, Jenkins & Robot Framework.
- Experience with IPv6 & telemetry at data center scale and observability tools like Grafana & Prometheus.
Compensation & Benefits
- Base salary range: 160,000 USD - 253,000 USD (determined based on location, experience, and comparable pay).
- Eligible for equity and company benefits (see NVIDIA benefits).
Additional information
- Role is full-time. Applications accepted at least until October 18, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.