Senior Software Development Engineer in Test

at NVIDIA

📍 Santa Clara, United States

USD 168,000-270,250 per year

SENIOR

✅ On-site

Used Tools & Technologies

HPC

Required Skills & Competences
Tag name is followed by "@" symbol and proficiency level value. About proficiency levels:

1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;

3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;

7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;

10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.

Security @ 4, Ansible @ 7, Docker @ 7, Grafana @ 6, Jenkins @ 3, Kubernetes @ 7, Linux @ 6, Prometheus @ 6, Terraform @ 4, Playwright @ 4, CI/CD @ 4, Distributed Systems @ 7, AWS @ 4, Azure @ 4, Selenium @ 4, Thanos @ 6, Debugging @ 4, HTTP @ 6, Reporting @ 4, QA @ 4, AI @ 7, Slurm @ 7

Details

We are seeking a highly skilled and hard-working Senior Test Developer / Test Engineer to join our Enterprise Software QA team. This role offers an outstanding opportunity to leave your mark on the design, construction, optimization, and testing of large-scale infrastructure for foundational NVIDIA unified cloud services and data center offerings. If you are a dedicated engineer with strong expertise in cloud infrastructure and distributed systems and want to apply your skills with AI tools, this role could fit you perfectly.

Responsibilities

Work with development teams on test plans for all layers of the cloud infrastructure software stack: test planning, execution, reviews, failure analysis, and assessment of overall quality and risk.
Work with customer PMs on software issues, including technical feedback from OEMs and CSPs.
Develop key benchmarks to track execution, and deploy process improvements to increase efficiency.
Use AI tools to speed up test scoping, test planning, execution, and automation workflows.
Lead NVIDIA Cloud and Data Center bring-up activities: validation, reporting, working with engineering to debug issues, providing design input, and adding coverage in different areas.
Design, develop and maintain CI/CD pipelines for continuous testing in cloud environments when needed.
Perform performance, scalability, and reliability testing of cloud services.
Implement and maintain test environments in cloud platforms such as AWS, Azure, Google Cloud, or OCI.
Monitor infrastructure and alert on significant events to ensure system performance and reliability.
Coordinate with partner teams to ensure availability of clusters to test on and take the lead in resolving issues.
Ensure the quality of cloud products, focusing on security, storage, workloads, and performance on the latest software and firmware components.

Requirements

Master’s or Ph.D. in Computer Science or a related field, or equivalent experience.
Experience with AI development tools used for creating and automating test cases, code coverage, and triaging.
8+ years of hands-on experience in cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible.
2+ years of strong experience with cloud infrastructure platforms such as AWS, Azure, Google Cloud, and OCI.
Hands-on experience with network, storage, security, and cluster configuration and debugging, and with cloud infrastructure management tools such as Terraform and Ansible.
Expertise in administering, operating, and configuring Kubernetes.
Experience with CI/CD tools such as GitLab and Jenkins and familiarity with the GitOps model.
Proficiency with monitoring tools: Prometheus, Grafana, CloudWatch, and Thanos.
Proficiency in debugging issues involving networks (DHCP, DNS, HTTP), Linux, and containers.

Ways to Stand Out

Familiarity with "Base Command Manager" for managing and monitoring high-performance computing.
Experience writing automation for web applications using tools like Selenium or Playwright.

Compensation & Other Information

Base salary range: 168,000 USD - 270,250 USD (determined based on location, experience, and pay of employees in similar positions).
Eligible for equity and benefits (link to NVIDIA benefits provided in original posting).
Applications for this job will be accepted at least until July 6, 2026. This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is an equal opportunity employer and is committed to fostering an inclusive work environment.