Senior Solutions Architect, Generative AI Inference and Deployment
at NVIDIA
USD 184,000-287,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Kubernetes @ 4, Python @ 7, MLOps @ 4, TensorFlow @ 7, Communication @ 4, Mathematics @ 4, Parallel Programming @ 3, Debugging @ 6, LLM @ 8, PyTorch @ 7, GPU @ 6
Details
NVIDIA is seeking outstanding AI Solutions Architects to assist and support customers building solutions with NVIDIA's newest AI technology. You will become a trusted technical advisor, working on projects and proof-of-concepts focused on inference for Generative AI and Large Language Models (LLMs). You will collaborate with internal teams on performance analysis and modeling of inference software and help customers adopt and build solutions using NVIDIA technology and MLOps solutions.
Responsibilities
- Partner with other solution architects and with engineering, product, and business teams to understand strategies and technical needs, and help define high-value solutions.
- Dynamically engage with developers, scientific researchers, and data scientists across a range of technical areas.
- Strategically partner with lighthouse customers and industry-specific solution partners targeting NVIDIA's computing platform.
- Work closely with customers to help them adopt and build creative solutions using NVIDIA technology and MLOps solutions.
- Analyze performance and power efficiency of AI inference workloads on Kubernetes (see the sketch after this list).
- Some travel to conferences and customers may be required (about 20%).
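
The performance and power analysis mentioned above can be pictured with a minimal sketch, assuming the pynvml bindings and an NVIDIA driver on the node; sampling only the first GPU at a fixed interval is an illustrative choice, not something the posting prescribes.

```python
# Minimal sketch (assumptions: pynvml installed, NVIDIA driver present on the node).
# Samples GPU power draw and utilization, e.g. while an inference workload is under load.
import time
import pynvml

def sample_gpu_metrics(interval_s: float = 1.0, samples: int = 10) -> None:
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU; adjust per node layout
        for _ in range(samples):
            power_w = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # NVML reports milliwatts
            util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu    # percent
            print(f"power={power_w:.1f} W  gpu_util={util}%")
            time.sleep(interval_s)
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    sample_gpu_metrics()
```

In a Kubernetes setting this kind of telemetry is more commonly scraped cluster-wide (for example via NVIDIA's DCGM exporter and Prometheus); the hand-rolled loop above only illustrates what is being measured.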
Requirements
- BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering or related fields, or equivalent experience.
- 8+ years of hands-on experience with deep learning frameworks such as PyTorch and TensorFlow.
- Strong fundamentals in programming, optimizations, and software design, especially in Python.
- Strong problem-solving and debugging skills in GPU orchestration and Multi-Instance GPU (MIG) management within Kubernetes environments (see the sketch after this list).
- Experience with containerization and orchestration technologies, monitoring, and observability solutions for AI deployments.
- Excellent knowledge of the theory and practice of LLM and deep learning inference.
- Excellent presentation, communication and collaboration skills.
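
The MIG-within-Kubernetes requirement above can be illustrated with a minimal sketch using the official kubernetes Python client, assuming the NVIDIA device plugin runs with the mixed MIG strategy (which exposes per-profile resources such as nvidia.com/mig-1g.5gb); the pod name, namespace, and container image are illustrative.

```python
# Minimal sketch (assumptions: kubernetes Python client installed, cluster running the
# NVIDIA device plugin with the "mixed" MIG strategy that exposes per-profile resources).
from kubernetes import client, config

def make_mig_inference_pod() -> client.V1Pod:
    container = client.V1Container(
        name="llm-inference",
        image="nvcr.io/nvidia/tritonserver:24.05-py3",  # illustrative image tag
        resources=client.V1ResourceRequirements(
            # Request one 1g.5gb MIG slice instead of a whole GPU.
            limits={"nvidia.com/mig-1g.5gb": "1"},
        ),
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name="mig-inference-demo"),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )

if __name__ == "__main__":
    config.load_kube_config()  # assumes a local kubeconfig
    pod = make_mig_inference_pod()
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Requesting a MIG slice rather than a whole nvidia.com/gpu lets several isolated inference replicas share one physical GPU, which is typically the point of MIG for serving workloads.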
Ways to Stand Out
- Prior experience with deep learning training at scale, or with deploying and optimizing DL inference in production.
- Experience with NVIDIA GPUs and software libraries such as NVIDIA NIM, Dynamo, TensorRT, and TensorRT-LLM (see the sketch after this list).
- Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design.
- Familiarity with parallel programming and distributed computing platforms.
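
As a sketch of the TensorRT-LLM experience listed above, the snippet below follows the shape of TensorRT-LLM's high-level LLM API quickstart; the model name and sampling values are illustrative, and a recent tensorrt_llm release on a CUDA-capable machine is assumed.

```python
# Minimal sketch following the shape of TensorRT-LLM's high-level LLM API quickstart
# (assumption: a recent tensorrt_llm release installed on a CUDA-capable machine).
from tensorrt_llm import LLM, SamplingParams

prompts = ["What is Multi-Instance GPU (MIG) on NVIDIA GPUs?"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Build (or load) an engine for an illustrative Hugging Face model and run generation.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```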
Compensation and Benefits
- Base salary range: 184,000 USD - 287,500 USD (final base salary determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits (see NVIDIA benefits page).
Additional Information
- Applications accepted at least until September 6, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.