Senior Solutions Architect, HPC and AI

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Marketing @ 7, Docker @ 4, Kubernetes @ 4, Linux @ 4, DevOps @ 4, Machine Learning @ 4, MLOps @ 4, Data Science @ 4, TensorFlow @ 4, Communication @ 7, Mathematics @ 4, Networking @ 4, IaaS @ 4, Debugging @ 4, API @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 4

Details

NVIDIA is looking for a Field Escalation Solution Architect with experience in validation and debugging of large-scale GPU clusters focused on performance. As part of the Solution Architecture organization, you will work with sophisticated computing hardware and software, driving deep learning and machine learning breakthroughs with NVIDIA’s enterprise customers. Primary responsibilities include validating and debugging customer cluster performance issues and functional bottlenecks, and driving customer technical engagements around NVIDIA products and technologies.

Responsibilities

  • Stay up to date on High Performance Computing (HPC), Deep Learning and Machine Learning ecosystems.
  • Architect and scale high-performance, distributed AI infrastructure on-prem or in the cloud built with NVIDIA GPU supercomputers for new and existing customers.
  • Address and resolve problems from the bare metal level through operating system, software stack, and application level.
  • Deliver demos, assist with proof-of-concepts (POCs), and write papers and developer blogs to share knowledge across teams.
  • Collaborate with executives and engineering to address sophisticated problems and enable NVIDIA technologies in cloud and datacenter environments.
  • Work directly with developers and hardware architects to debug cluster performance issues, identify new requirements, and improve workflows.
  • Support account teams when extra analysis is required to debug customer issues and provide expertise to make account and product engineering teams more effective.
  • Build custom product demonstrations and POCs addressing customers' critical business needs.

Requirements

  • BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or other engineering fields, or equivalent experience.
  • 8+ years of work experience with NVIDIA and/or accelerated computing technologies.
  • Platform-level understanding of server architecture, PCIe topology, GPUs, NICs, Linux OS and kernel drivers.
  • Networking experience, including knowledge of Ethernet, InfiniBand or other networking protocols.
  • Experience working with DevOps on-prem or in cloud environments, including Docker/containers, cloud APIs, IaaS and data center deployments.
  • Experience using, deploying, and debugging SLURM, Kubernetes, and/or other job schedulers.
  • Deep understanding of dense data center design, including computing, storage, networking, cloud APIs, and IaaS.
  • Effective time management and the ability to balance multiple tasks.
  • Strong analytical and problem-solving skills.
  • Strong written and verbal communication skills; ability to collaborate across engineering, sales, marketing, product, and program management.

Ways to Stand Out

  • Demonstrated NCCL (NVIDIA Collective Communications Library) experience.
  • Excellent customer-facing skills and background.
  • Platform design engineering, coding, and strong debugging skills, including experience with C/C++, the Linux kernel, virtualization and drivers, and profilers/performance analysis tools (NSys).
  • Hands-on familiarity with NVIDIA systems/SDKs (e.g., CUDA), NVIDIA networking technologies (e.g., RoCE, InfiniBand), switch interconnects, and/or ARM CPU solutions.
  • Understanding of Deep Learning and Machine Learning frameworks (TensorFlow or PyTorch), LLMs, MLOps, DevOps, and workflows applying cloud technologies, using Docker/containers, Kubernetes, cloud APIs, and data center deployments.

Additional Information

  • Occasional travel required (about 25%) for local on-site visits to customers and data science conferences.
  • Applications are accepted at least until September 2, 2025.
  • NVIDIA is an equal opportunity employer and values diversity.

Compensation & Benefits

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

  • Level 4 base salary range: 184,000 USD - 287,500 USD
  • Level 5 base salary range: 224,000 USD - 356,500 USD

You will also be eligible for equity and benefits.