Vacancy is archived. Applications are no longer accepted.

Senior AI Cluster Tools Developer

at NVIDIA
✅ Hybrid

Required Skills & Competences

Software Development @ 6, Go @ 7, Linux @ 4, Python @ 7, TensorFlow @ 4, Networking @ 4, Debugging @ 4, Project Management @ 4, PyTorch @ 4

Details

A key part of NVIDIA's strength is our sophisticated analysis and debugging tools, which empower NVIDIA engineers to improve the performance and power efficiency of our products and the applications running on them. We are looking for forward-thinking, hard-working, and creative people to join a multifaceted software team with high standards! This software engineering role involves developing tools for GPU cluster users and admins.

Responsibilities

  • Build internal perf/power profiling and analysis tools and platforms for AI workloads at cluster scale
  • Build debugging tools for commonly encountered problems in GPU clusters
  • Work with our users to build and calibrate perf/power models for next-generation hardware or systems
  • Partner with architects to propose new hardware features or improve existing features based on real-world use cases

Requirements

  • BS+ in Computer Science or a related field (or equivalent experience) and 5+ years of software development
  • Strong software design and implementation ability with Python/Go/C++
  • Good understanding of Deep Learning and AI frameworks like PyTorch, TensorFlow, etc.
  • Knowledge of AI cluster job scheduling, storage management, and networking management
  • Knowledge of the Linux kernel
  • Excellent problem-solving and project management skills
  • Flexibility to work in an evolving environment with changing requirements

Ways to stand out from the crowd:

  • Proven experience with continuous profiling and analysis tools/platforms at GPU cluster scale
  • Solid experience in troubleshooting large AI jobs and in failure detection/recovery
  • Skilled in Deep Learning application performance analysis and optimization
  • Knowledgeable about GPU / CPU architecture and application performance or power efficiency analysis

NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most brilliant and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.
