Used Tools & Technologies
Not specified
Required Skills & Competences
Docker @ 2, Grafana @ 3, Kubernetes @ 3, Prometheus @ 3, Python @ 6, Statistics @ 3, GCP @ 5, Machine Learning @ 3, TensorFlow @ 5, AWS @ 5, Azure @ 5, Communication @ 3, Debugging @ 5, LLM @ 3, PyTorch @ 5, GPU @ 3
Details
NVIDIA is building the world’s leading AI company and is seeking an experienced Cloud Solution Architect to help customers adopt GPU hardware and software, and to build and deploy machine learning (ML), deep learning (DL), and data analytics solutions on cloud platforms. As part of the Solutions Architecture team, you will engage directly with developers, researchers, and data scientists at strategic customers, and work with internal product and engineering teams to drive end-to-end technology solutions built on NVIDIA hardware and software.
Responsibilities
- Help cloud customers design, deploy, and maintain scalable, GPU-accelerated inference pipelines for large language models (LLMs) and generative AI workloads on cloud ML services and Kubernetes.
- Tune inference performance with TensorRT / TensorRT-LLM, vLLM, Dynamo, and Triton Inference Server to improve GPU utilization and model efficiency (a minimal serving sketch follows this list).
- Collaborate with cross-functional teams (engineering, product) and provide technical mentorship to cloud customers implementing AI inference at scale.
- Build custom proofs of concept (PoCs) that address customers' critical business needs using NVIDIA hardware and software technology.
- Partner with Sales Account Managers or Developer Relations to identify and secure new business opportunities for NVIDIA ML/DL solutions.
- Prepare and deliver technical content to customers, including presentations, workshops, and demos of purpose-built solutions.
- Conduct regular technical meetings with customers to discuss roadmaps and features, and build close technical relationships that enable rapid resolution of issues.
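To illustrate the serving work above, the following is a minimal sketch of batched LLM generation with vLLM on a single GPU. It assumes vLLM is installed and that the example model (meta-llama/Llama-3.1-8B-Instruct) is available locally or via Hugging Face; the model name, prompt, and sampling settings are illustrative only, not specific to this role.

```python
# Minimal vLLM sketch: batched generation on one GPU.
# Assumptions: `pip install vllm`, one visible GPU, and access to the example
# model weights below (any locally available model works the same way).
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model, not role-specific
    tensor_parallel_size=1,                    # >1 shards the model across GPUs
)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)
prompts = ["Summarize why batching improves GPU utilization for LLM inference."]

for request_output in llm.generate(prompts, params):
    print(request_output.outputs[0].text)
```

In production the same stack is typically exposed behind an HTTP serving endpoint and deployed on Kubernetes rather than run offline as shown here.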
Requirements
- BS / MS / PhD in Electrical / Computer Engineering, Computer Science, Statistics, Physics, or related engineering fields, or equivalent experience.
- 3+ years in Solutions Architecture with a proven track record of moving AI inference from PoC to production in cloud environments (AWS, GCP, or Azure).
- 3+ years hands-on experience with deep learning frameworks such as PyTorch and TensorFlow.
- Excellent knowledge of the theory and practice of LLM and DL inference.
- Strong fundamentals in programming, optimizations, and software design, especially in Python.
- Experience with containerization and orchestration technologies like Docker and Kubernetes, and familiarity with monitoring and observability solutions for AI deployments.
- Knowledge of inference technologies such as NVIDIA NIM, TensorRT-LLM, Dynamo, Triton Inference Server, and vLLM (a minimal Triton client sketch follows this list).
- Proficiency in problem-solving and debugging in GPU environments.
- Excellent presentation, communication, and collaboration skills.
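As a concrete example of the Triton and GPU-debugging items above, here is a minimal Python client sketch that checks server health and sends one HTTP inference request. It assumes a Triton Inference Server is already running on localhost:8000 and serving a model named my_model with a float32 input INPUT0 of shape [1, 16] and an output OUTPUT0; the model name, tensor names, shape, and address are hypothetical placeholders.

```python
# Minimal Triton HTTP client sketch.
# Assumptions: `pip install tritonclient[http] numpy`, a Triton server on
# localhost:8000, and an example model "my_model" with tensors INPUT0/OUTPUT0
# (names, shape, and dtype are hypothetical placeholders).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Basic health/readiness checks before sending traffic.
assert client.is_server_live(), "Triton server is not live"
assert client.is_model_ready("my_model"), "Model is not loaded/ready"

# Build one request: a [1, 16] float32 input tensor filled with random data.
data = np.random.rand(1, 16).astype(np.float32)
infer_input = httpclient.InferInput("INPUT0", list(data.shape), "FP32")
infer_input.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[infer_input])
print(result.as_numpy("OUTPUT0"))
```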
Preferred / Ways to stand out
- AWS, GCP, or Azure professional solutions architect certification.
- Experience optimizing and deploying large Mixture-of-Experts (MoE) LLMs at scale.
- Active contributions to open-source AI inference projects (e.g., vLLM, TensorRT-LLM, Dynamo, Triton, or similar).
- Experience with multi-GPU, multi-node inference technologies such as tensor/expert parallelism, disaggregated serving, LWS, MPI, EFA/InfiniBand, and NVLink/PCIe.
- Experience building monitoring and alerting with Prometheus, Grafana, and NVIDIA DCGM, and using GPU performance analysis tools such as NVIDIA Nsight Systems (see the metrics sketch after this list).
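For the monitoring point above, this is a small sketch that reads per-GPU utilization from Prometheus as reported by NVIDIA's dcgm-exporter. DCGM_FI_DEV_GPU_UTIL is a standard dcgm-exporter metric, but the Prometheus address and the aggregation query are assumptions about the deployment.

```python
# Hedged sketch: query per-GPU utilization from Prometheus (dcgm-exporter metrics).
# Assumptions: `pip install requests`, Prometheus at localhost:9090 already
# scraping dcgm-exporter, which exposes DCGM_FI_DEV_GPU_UTIL per GPU.
import requests

PROMETHEUS_URL = "http://localhost:9090"  # assumed address

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query",
    params={"query": "avg by (gpu) (DCGM_FI_DEV_GPU_UTIL)"},
    timeout=10,
)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    gpu = series["metric"].get("gpu", "unknown")
    utilization = float(series["value"][1])  # value = [timestamp, "value"]
    print(f"GPU {gpu}: {utilization:.1f}% utilization")
```

The same query can back a Grafana panel or a Prometheus alerting rule, for example to flag sustained under-utilization of an inference fleet.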
Location & Travel
- Role located in Redmond, WA, United States. Occasional travel is required for on-site customer visits and industry events.
Compensation & Benefits
- Base salary ranges by level: Level 2: 120,000 USD - 189,750 USD; Level 3: 148,000 USD - 235,750 USD (final base salary depends on location, experience, and internal pay equity).
- Eligible for equity and company benefits (see NVIDIA benefits page).
Equal Opportunity
- NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. Applications for this job will be accepted at least until October 21, 2025.