Data Center Power Test Architect

at NVIDIA
USD 168,000-322,000 per year
SENIOR
✅ On-site

Required Skills & Competences

Security @ 4, Jenkins @ 3, Kubernetes @ 4, DevOps @ 4, Python @ 4, CI/CD @ 4, Communication @ 1, Debugging @ 4, Reporting @ 4, QA @ 4, System Architecture @ 7, Compliance @ 4, GPU @ 4, AI @ 4

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, NVIDIA is focused on AI and next-generation computing. This role is for a Senior Test Architect on the Enterprise Software QA team, working on the design, construction, optimization, and testing of flagship supercomputers and data center offerings.

Responsibilities

  • Define the end-to-end test strategy and own the overall test architecture for power features across multiple NVIDIA platforms, from pre-silicon simulation and emulation through post-silicon bring-up and production readiness.
  • Develop test plans aligned with product deliverables and customer use cases; influence early design decisions to optimize testability and automation readiness.
  • Architect scalable test infrastructure: design and implement modular, reusable test frameworks and automation harnesses supporting functional, integration, stress, regression, power, security, and performance testing at scale across hundreds of systems.
  • Own data center power quality metrics: define KPIs (code coverage, system uptime, bug escape rate, validation completeness) and establish dashboards/reporting for data-driven decisions.
  • Lead root cause analysis and debugging across firmware, software, and hardware layers; develop and document debug methodologies and tools.
  • Innovate in lab automation and CI/CD: partner with DevOps and infrastructure teams to enhance test automation pipelines and integrate continuous testing into nightly and pre-merge workflows.
  • Enable productization and customer readiness: validate real-world use cases, customer configurations, and production scenarios; contribute to release gates and sign-off criteria.
  • Mentor and lead software QA engineers and junior test developers; promote quality, innovation, and continuous learning.
  • Use AI-powered tools and copilots to accelerate test development, automate repetitive validation workflows, and streamline debug and root cause analysis.

Requirements

  • B.S./M.S./Ph.D. in Electrical Engineering, Computer Engineering, Computer Science, or related field (or equivalent experience).
  • 10+ years of experience in software/firmware testing for data center power enablement, with a focus on telemetry and power efficiency across systems.
  • Strong knowledge of system architecture, power shelf, baseboard management, hardware and software power features, industry power standards, system interfaces, and embedded controllers.
  • Proven experience designing test frameworks and infrastructure in Python, C/C++ or similar languages.
  • Expertise with platform standards for security, telemetry, and manageability (NIST, DMTF, OCP). Hands-on experience with server platform, network, storage, and cluster configuration and debugging.
  • Background in platform telemetry and datacenter node lifecycle management/support, including CPU/GPU workloads.
  • Proficiency in scripting languages such as Python.
  • Expertise in administering, operating, and configuring Kubernetes and Envoy.
  • Proven experience with CI/CD tools such as GitLab and Jenkins, and familiarity with the GitOps model.
  • Experience with lab automation, simulation, HW-in-the-loop testing, and CI/CD pipelines.
  • Strong debugging, problem-solving, and analytical skills.
  • Excellent communication and collaboration skills; experience working in a globally distributed team is a plus.

Ways to Stand Out

  • Experience with NVIDIA platforms (e.g., DGX, HGX, Grace Hopper systems).
  • Exposure to security validation and compliance (e.g., FIPS, BMC security), or thermal/power validation.
  • Prior role as a test architect or technical lead for large-scale datacenter enablement or firmware validation programs.
  • Contributions to open-source testing tools or frameworks; strong knowledge of cloud-scale validation, infrastructure automation, or virtualization.
  • Prior experience using AI tools to create agents, design test plans, identify test gaps, and support automation and failure analysis.

Compensation & Benefits

  • Base salary ranges: 168,000 USD - 270,250 USD (Level 4) and 200,000 USD - 322,000 USD (Level 5).
  • Eligible for equity and benefits (link provided in original posting).

Additional Information

  • Applications accepted at least until June 26, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer committed to an inclusive work environment.