Senior Software QA Test Development Engineer - Diagnostics
at NVIDIA
USD 140,000-270,250 per year
Required Skills & Competences
Software Development @ 4, Ansible @ 4, CentOS @ 7, Docker @ 1, Jenkins @ 4, Kubernetes @ 1, Linux @ 7, DevOps @ 4, Python @ 4, Java @ 4, GitHub @ 1, CI/CD @ 7, TensorFlow @ 4, JavaScript @ 4, Parallel Programming @ 4, Debugging @ 4, QA @ 4, NLP @ 4, LLM @ 4, PyTorch @ 4, Agile @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA is the world leader in GPU computing and positions itself as the 'AI Computing Company.' This role is for a Senior Software QA Test Development Engineer on the platform SWQA team focusing on HGX/DGX/MGX platform diagnostics and reliability testing across servers, OS, firmware and the CUDA software stack. The role requires strong Linux and server-level troubleshooting, experience with automation and CI/CD, and hands-on knowledge of AI frameworks and model testing.
Responsibilities
- Develop and execute NVIDIA HGX/DGX/MGX platform test plans for servers, OS, firmware and CUDA software stack based on design documents.
- Install and test various system OS versions, server firmware and software stacks.
- Drive root cause analysis of reliability and validation test failures and implement mitigations.
- Build and debug front-end and back-end automation frameworks and tests at the server and OS level.
- Review partner and supplier test results and recommend additional reliability testing for components, servers, and packaging as needed.
- Work within an agile software development team with high production quality standards.
- Manage bug lifecycle and collaborate cross-functionally to drive solutions.
Requirements
- Bachelor’s degree (or equivalent experience) in a STEM field (Science, Technology, Engineering, Math or Physics).
- 5+ years of proven experience (or a Master’s degree).
- Proven experience in OS- and server-level automation, CI/CD processes and DevOps using Python, shell scripting, Ansible, Jenkins, C/C++, Java and JavaScript.
- Strong server and Linux (Ubuntu, RedHat, CentOS, SUSE, Fedora, etc.) troubleshooting and debugging experience in bare-metal and KVM/VMware/Hyper-V environments.
- Good knowledge of and hands-on experience with model testing, AI tools and frameworks (TensorFlow, PyTorch, Cursor, etc.), and NLP and LLM benchmarking.
- Experience using AI development tools for test plan creation, test case development and test case automation.
- Strong experience with firmware (FW), BMC/OpenBMC, network protocols, internal/external enterprise storage devices, PCIe buses and devices, IO sub-devices, CPU and memory, ACPI, the UEFI specification, and Redfish is a huge plus.
- Proven experience with GitHub/GitLab/Gerrit, PXE, SLURM, and container/orchestration tooling (Kubernetes/Docker) is a huge plus.
Ways to stand out
- Experience with AI-related tools, LLMs and NLP.
- Experience working with NVIDIA GPU hardware.
- Solid understanding of virtualization in Linux (KVM) and container orchestration (Docker with Kubernetes).
- Background in parallel programming, ideally CUDA/OpenCL.
Compensation & Benefits
- Base salary ranges by level: Level 3: 140,000 USD - 224,250 USD; Level 4: 168,000 USD - 270,250 USD. Your base salary will be determined based on location, experience, and pay of employees in similar positions.
- Eligible for equity and company benefits (see NVIDIA benefits).
Additional information
- Applications will be accepted at least until January 14, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and values diversity; it does not discriminate on legally protected bases.