Senior Solutions Architect, HPC and AI

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Linux @ 4, DevOps @ 4, Machine Learning @ 4, MLOps @ 4, TensorFlow @ 4, Communication @ 7, Mathematics @ 4, Networking @ 4, IaaS @ 4, Debugging @ 7, API @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 7

Details

NVIDIA is seeking a Field Escalation Solution Architect experienced in validation and debugging of large-scale GPU clusters with a strong focus on performance. As part of the Solution Architecture organization you will work with advanced computing hardware and software, enabling deep learning and machine learning breakthroughs for enterprise customers. The role involves validating and debugging customer cluster performance issues and functional bottlenecks, and leading technical customer engagements around NVIDIA products and technologies. Occasional travel (about 25%) is required for on-site customer visits and conferences.

Responsibilities

  • Stay current with High Performance Computing (HPC), Deep Learning (DL) and Machine Learning (ML) ecosystems.
  • Architect and scale high-performance, distributed AI infrastructure on-premises or in the cloud using NVIDIA GPU systems.
  • Diagnose and resolve problems from bare metal up through the OS, software stack, and application layers.
  • Deliver demos, assist with proofs-of-concept (POCs), and produce technical writing (papers, developer blogs) to share knowledge.
  • Collaborate with executives, engineering, and account teams to address complex customer problems and bring NVIDIA technologies to production in datacenters and cloud environments.
  • Work directly with developers and hardware architects to debug cluster performance issues, identify requirements, and improve workflows.
  • Provide expertise to enable account teams and product engineering to obtain actionable data rapidly.
  • Build custom product demonstrations and POCs that address critical customer business needs.

Requirements

  • BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related engineering field, or equivalent experience.
  • 8+ years of experience with NVIDIA and/or accelerated computing technologies.
  • Platform-level understanding of server architecture, PCIe topology, GPUs, NICs, Linux OS, and kernel drivers.
  • Networking knowledge, including Ethernet, InfiniBand, or other networking protocols.
  • Experience with DevOps on-prem or in cloud environments, including Docker/containers, cloud APIs, IaaS, and data center deployments.
  • Experience deploying, using, and debugging job schedulers such as SLURM and Kubernetes.
  • Deep understanding of dense data center design including compute, storage, networking, cloud APIs, and IaaS.
  • Strong analytical and problem-solving abilities; effective time management and ability to balance multiple tasks.
  • Strong written and verbal communication skills; ability to collaborate across engineering, sales, product, and program management.

Ways to Stand Out

  • Demonstrated experience with NVIDIA Collective Communications Library (NCCL).
  • Strong customer-facing experience.
  • Platform design engineering, coding, and strong debugging skills, including C/C++, the Linux kernel, virtualization, device drivers, and profilers/performance tools (e.g., Nsight Systems).
  • Hands-on familiarity with NVIDIA systems/SDKs (e.g., CUDA), NVIDIA networking technologies (RoCE, InfiniBand), switch interconnects, and ARM CPU solutions.
  • Understanding of ML/DL frameworks (TensorFlow, PyTorch), LLMs, MLOps, DevOps, and workflows using cloud technologies, Docker/containers, Kubernetes, cloud APIs, and data center deployments.

Compensation & Benefits

  • Base salary ranges (the final figure is determined by location, experience, and the pay of employees in similar positions):
    • Level 4: 184,000 USD - 287,500 USD per year
    • Level 5: 224,000 USD - 356,500 USD per year
  • Eligible for equity and comprehensive benefits.

Additional Information

  • Travel: approximately 25% for customer and conference visits.
  • Applications for this role will be accepted until at least September 2, 2025.
  • NVIDIA is an equal opportunity employer and values diversity across its workforce.