Senior Software QA Test Development Engineer

at NVIDIA
USD 136,000-264,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Software Development @ 4, Ansible @ 4, CentOS @ 7, Docker @ 4, Jenkins @ 4, Kubernetes @ 4, Linux @ 3, DevOps @ 3, Python @ 4, Java @ 4, GitHub @ 4, CI/CD @ 3, TensorFlow @ 4, JavaScript @ 4, Parallel Programming @ 4, Debugging @ 4, QA @ 4, NLP @ 4, LLM @ 3, PyTorch @ 4, Agile @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA is looking for a Senior Software QA Test Development Engineer to join the platform SWQA team. The role focuses on developing and executing platform test plans for NVIDIA HGX/DGX/MGX platforms, validating servers, OS, firmware, and CUDA software stacks, and building automation and debugging frameworks. The position requires strong Linux and server-level experience, reliability testing, CI/CD and DevOps experience, and familiarity with AI tools and model/LLM benchmarking.

Responsibilities

  • Develop and execute NVIDIA HGX/DGX/MGX platform test plans for servers, OS, firmware and CUDA software stack from design documents.
  • Install and test various system OSes, server firmware, and software stacks.
  • Drive root cause analysis on reliability and validation test failures and identify mitigations.
  • Build, develop, and debug front-end and back-end automation frameworks and tests at the server and OS level.
  • Review partner and supplier test results and prescribe additional reliability testing on components, servers, and packaging.
  • Work in an agile software development team with high production quality standards.
  • Manage bug lifecycle and collaborate with cross-functional groups to drive solutions.

Requirements

  • Bachelor’s degree in a STEM field (or equivalent experience); Master’s degree preferred.
  • 5+ years of relevant experience (or Master’s degree + experience).
  • Proven experience with OS and server-level automation, CI/CD processes and DevOps.
  • Hands-on experience with Python, shell scripting, Ansible, Jenkins, C/C++, Java, and JavaScript.
  • Strong server and Linux (Ubuntu, Red Hat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.
  • Experience with reliability testing, telemetry, scale-out clusters, and test plan development.
  • Knowledge and hands-on experience in model testing and AI tools/frameworks (TensorFlow, PyTorch, Cursor, etc.), NLP and LLM benchmarking.
  • Experience using AI development tools to create test plans and to develop and automate test cases.
  • Experience with firmware (FW), BMC/OpenBMC, network protocols, enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, UEFI, Redfish is a huge plus.
  • Experience with GitHub/GitLab/Gerrit, PXE, SLURM, container/orchestration (Docker, Kubernetes, etc.) is a strong plus.

Ways to stand out

  • Prior experience with AI-related tools, LLMs and NLP.
  • Experience working with NVIDIA GPU hardware.
  • Solid understanding of Linux virtualization (KVM, Docker with Kubernetes orchestration).
  • Background in parallel programming, ideally CUDA/OpenCL.

Compensation & Benefits

  • Base salary will be determined by location, experience, and internal pay equity.
  • Base salary ranges provided: Level 3: 136,000 USD - 212,750 USD; Level 4: 168,000 USD - 264,500 USD.
  • Eligible for equity and benefits (see NVIDIA benefits page).

Other details

  • Applications accepted at least until September 28, 2025.
  • NVIDIA is an equal opportunity employer and fosters a diverse work environment.