Required Skills & Competences
Marketing, Docker, Kubernetes, Python, GCP, MLOps, Leadership, AWS, Azure, Communication, Technical Leadership, LLM, ChatGPT, CUDA, GPU
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation fueled by phenomenal technology and amazing people. Today, NVIDIA is tapping into the unlimited potential of AI to define the next era of computing and transform industries.
Responsibilities
- Architect end-to-end generative AI applications with a focus on Large Language Model (LLM) deployment and Retrieval-Augmented Generation (RAG) workflows.
- Use advanced Python programming skills at both application and infrastructure levels.
- Provide technical leadership and guidance on training LLMs and implementing RAG-based solutions.
- Collaborate with NVIDIA’s Marketing Team and AI Development Team to deliver tailored AI solutions.
- Work closely with globally dispersed development, MLOps, product, engineering, and business teams.
- Implement strategies for efficient AI workflows using NVIDIA hardware and software platforms.
- Lead workshops and design sessions focused on generative AI solutions.
- Design and implement RAG workflows to enhance content generation and information retrieval.
- Partner with engineering and product teams to evolve generative AI software.
- Integrate RAG workflows into web and platform applications.
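The RAG workflows named above boil down to a retrieve-then-augment step before the model call. A minimal, dependency-free sketch (a toy bag-of-words retriever stands in for a real embedding model such as one served via NeMo or Triton; the final LLM call itself is omitted):

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a production system would use a dense encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Augment the user query with retrieved context before calling the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "NVIDIA NeMo is a framework for training and fine-tuning LLMs.",
    "TensorRT optimizes models for production inference on GPUs.",
    "RAPIDS accelerates data science pipelines on NVIDIA GPUs.",
]
prompt = build_prompt("How do I fine-tune an LLM?", docs)
```

In a real deployment the retrieved passages would come from a vector index and the assembled prompt would be sent to a served model; the shape of the pipeline (embed, rank, augment, generate) is the same.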
Requirements
- Master's or Ph.D. in Computer Science, AI, or equivalent experience in building scalable AI solutions.
- 8+ years of hands-on technical experience, including work with generative AI.
- Advanced proficiency in Python programming.
- Knowledge of agentic frameworks and multi-agent applications (e.g., LangChain, LangGraph).
- Experience or understanding of NVIDIA hardware/software: CUDA, Triton, TensorRT, NeMo, RAPIDS.
- Proven success deploying and optimizing LLMs for production inference.
- Deep understanding of state-of-the-art language models (Llama, Mistral, ChatGPT, Claude, Gemini).
- Expertise in training and fine-tuning LLMs using NVIDIA NeMo and other frameworks.
- Strong knowledge of cloud and datacenter GPU systems.
- Excellent communication and collaboration skills.
- Experience leading workshops and training sessions.
Ways To Stand Out
- Experience deploying LLMs on cloud platforms (AWS, Azure, GCP) and on-premises.
- Familiarity with agentic models/frameworks.
- Experience with observability and evaluation tools.
- Working knowledge of containerization (Docker) and orchestration tools (ECS, Kubernetes).
- Hands-on experience with NVIDIA GPU technologies and cluster management.
- Ability to design scalable workflows for LLM training and inference on GPU clusters.
Benefits
- Competitive salary range: 168,000 - 322,000 USD depending on location and experience.
- Eligibility for equity and benefits.
- Commitment to diversity and equal opportunity employment.