Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Security @ 4
Software Development @ 4
Python @ 4
Machine Learning @ 4
Debugging @ 7
System Architecture @ 3
Compliance @ 4
CUDA @ 4
GPU @ 4
Deep Learning @ 4
AI @ 4
Details
NVIDIA is looking for a Resiliency and Safety Architect to support the development of GPU and Tegra SoC hardware and software resiliency and safety features. You will be a key member of a team of innovators working on product lines ranging from consumer graphics to self-driving cars and AI, impacting the industry's leading GPUs and SoCs.
Responsibilities
- Collaborate with Software and Hardware teams to architect new safety and resiliency features and guide future development.
- Optimize hardware and software features to improve system robustness, performance, and security.
- Model and analyze RAS metrics such as Failures in Time and Availability, and safety metrics such as Diagnostic Coverage and PMHF.
- Run simulations to analyze Architectural Vulnerability Factor and liveness of on-die memory.
- Develop diagnostics software components for resiliency and safety to run on NVIDIA GPUs.
- Participate in testing new and existing resiliency and safety hardware and software features.
- Work on compliance of products with functional safety standards (ISO 26262 and ASPICE). Define requirements, architecture, and design with end-to-end traceability; perform safety analyses (FMEA/DFA/FTA); ensure software compliance with the MISRA and CERT C standards.
Requirements
- Master's or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a closely related field, or equivalent experience.
- At least 5 years of relevant experience.
- Familiarity with computer system architecture, microprocessors, and microcontroller fundamentals (caches, buses, DMA, etc.).
- Proficiency in C/C++.
- Scripting and automation experience with Python or similar.
- Understanding of the software development process, from requirements to testing closure and maintenance.
- Experience with resiliency and/or functional safety.
- Strong debugging and analytical skills; excellent interpersonal skills and ability to collaborate with on-site and remote teams.
- Self-driven and results-oriented.
Ways to stand out
- Familiarity with general hardware concepts, Verilog RTL coding, and simulation/debug.
- Experience with GPU and SoC architectures and Machine Learning / Deep Learning concepts.
- Programming with CUDA.
- Experience in embedded software development.
Compensation & Other Details
- Base salary range (determined by location, experience, and peer pay):
  - Level 4: 184,000 USD - 287,500 USD per year
  - Level 5: 224,000 USD - 356,500 USD per year
- You will also be eligible for equity and benefits.
- Applications for this job will be accepted at least until May 11, 2026.
- Employment type: Full time
- Location: Santa Clara, CA, United States
Equal Opportunity
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. NVIDIA uses AI tools in its recruiting processes.