Senior System Reliability Engineer

at NVIDIA

📍 Santa Clara, United States

$108,000-218,500 per year

SENIOR
✅ On-site


Used Tools & Technologies

Not specified

Required Skills & Competences

Project Management @ 4

Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing — with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We’re looking to grow our company and build our teams with the most thoughtful people in the world. Join us at the forefront of technological advancement.

GPU servers are one of the fastest-growing segments for NVIDIA and the artificial intelligence industry. As computational power increases with every GPU generation, developing efficient and reliable systems is imperative. We are looking for a System Reliability Engineer to join NVIDIA's existing Reliability Engineering team, which is involved in NVIDIA's diverse system product range, specifically graphics and high-performance computing printed circuit boards and data center servers.

Responsibilities

  • Provide expertise in Hardware Reliability Engineering for electronics and server systems (graphics cards, servers, racks, clusters) from the concept phase through end of life.
  • Establish, deliver, and maintain product reliability standards and metrics for NVIDIA's new system technologies, using existing tools and processes or developing new ones as required.
  • Participate in product and engineering design reviews, assess the reliability budget of products/designs, and inspire changes that enhance product reliability.
  • Work with all pertinent engineering groups, suppliers, and partners to ensure the desired reliability is achieved using Design for Reliability (DfR) methods, including FMEA and DoE approaches.
  • Define and implement Reliability Plans & Specifications.
  • Provide reliability predictions, along with test plans and methods to assess and drive product reliability to the desired levels.
  • Perform and lead appropriate testing with associated failure analysis and recommendations for improving designs and manufacturing.
  • Develop and present methods of correlating reliability test results with actual field performance.

Requirements

  • BS (or equivalent experience) in Engineering, Materials Science, Physics, or a related field; MS or PhD preferred.
  • 5+ years in a hardware validation/reliability environment related to PCIe peripherals, graphics cards, and servers.
  • Understanding of power supplies, memory, high-speed I/O, PCI Express, Ethernet, and I2C.
  • Hands-on experience with theoretical and practical reliability concepts as they relate to high-tech electronic enterprise and consumer products.
  • Have a strong command and understanding of statistical concepts/models/analysis and how they relate to product reliability & life analysis.
  • Good verbal and written communication skills, as well as the ability to communicate at a high level.
  • Self-motivated, independent, and committed to getting things done.
  • Good project management skills and ability to balance multiple simultaneous projects during development and production stages.

With competitive salaries and a generous benefits package, we are widely considered to be one of the technology world’s most desirable employers. If you're a creative and autonomous engineer with a real passion for technology, we want to hear from you. Come build the future with us!

#LI-Hybrid