Distinguished Software Architect - Deep Learning And HPC Communications

at NVIDIA
USD 308,000-471,500 per year
SENIOR
✅ On-site

Required Skills & Competences

  • Software Development @ 6
  • Algorithms @ 7
  • TensorFlow @ 7
  • Communication @ 4
  • Networking @ 4
  • Parallel Programming @ 4
  • PyTorch @ 7
  • GPU @ 4

Details

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing, and Visualization. The GPU, NVIDIA's invention, serves as the visual cortex of modern computers and powers innovations from artificial intelligence to autonomous cars.

Responsibilities

  • Research new communication technologies (e.g., expand the GPUDirect technology portfolio) and design new features for communication libraries
  • Propose innovative hardware and software solutions for next-gen platforms, co-designing with GPU, Networking, and Software architects
  • Inspire changes based on quantitative data from proofs of concept or technical analysis/modeling
  • Drive adoption of new communication technologies across application verticals
  • Collaborate with internal and external teams, including deep learning researchers and customers

Requirements

  • PhD in Computer Science, Computer Engineering, or related field, or strong equivalent experience
  • 15+ years of relevant experience in academia or industry
  • Expertise in HPC, parallel programming models (MPI, SHMEM), and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC)
  • Deep understanding of high-performance networking: network technologies (InfiniBand, Ethernet), network design, topologies, debugging, and performance analysis
  • Strong knowledge of several of the following: ML/DL fundamentals as related to communications, parallel algorithms, fault tolerance, resiliency, competitive assessments, performance optimizations for large clusters, DL frameworks (PyTorch, TensorFlow)
  • Programming fluency in C or C++ for systems software development
  • Flexibility to work and communicate effectively across HW/SW teams and time zones

Ways To Stand Out

  • Recognized leader in HPC/DL communications with patents, publications, conference talks, and keynotes
  • Influential role in industry standards (MPI, OpenSHMEM) and open source software (PyTorch, UCX, Open MPI)

NVIDIA offers equity and benefits and is committed to fostering a diverse work environment.