Required Skills & Competences
Docker @ 3, Grafana @ 3, Kubernetes @ 3, Prometheus @ 3, Python @ 6, Statistics @ 3, GCP @ 5, Machine Learning @ 3, TensorFlow @ 5, AWS @ 5, Azure @ 5, Communication @ 3, Debugging @ 5, LLM @ 3, PyTorch @ 5, GPU @ 3
Details
NVIDIA is building the world’s leading AI company. You will join the Solutions Architecture team to help customers adopt GPU hardware and software, and to build and deploy Machine Learning (ML), Deep Learning (DL), and data analytics solutions on cloud platforms. You will engage directly with developers, researchers, and data scientists at strategic customers and collaborate with NVIDIA product and engineering teams to drive end-to-end technology solutions.
Responsibilities
- Help cloud customers design, deploy, and maintain scalable, GPU-accelerated inference pipelines on cloud ML services and Kubernetes for large language models (LLMs) and generative AI workloads.
- Tune performance with TensorRT / TensorRT-LLM, vLLM, Dynamo, and Triton Inference Server to improve GPU utilization and model efficiency (see the serving sketch after this list).
- Collaborate with cross-functional teams (engineering, product) and offer technical mentorship to cloud customers implementing AI inference at scale.
- Build custom proofs-of-concept (PoCs) that apply NVIDIA hardware and software to address customer business needs.
- Partner with Sales Account Managers or Developer Relations Managers to identify and secure new business opportunities for NVIDIA ML/DL products and solutions.
- Prepare and deliver technical content to customers, including presentations and workshops about NVIDIA products and purpose-built solutions.
- Conduct regular technical customer meetings for project/product roadmap discussions, feature planning, and introductions to new technologies. Establish close technical ties to customers to facilitate rapid issue resolution.
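To make the deployment and tuning responsibilities above concrete, here is a minimal sketch of serving an LLM with vLLM's offline Python API. The model name and the tuning knobs (gpu_memory_utilization, max_num_seqs, sampling settings) are illustrative assumptions, not a prescribed configuration; real engagements would size these per workload.

```python
# Minimal vLLM serving/tuning sketch. Model and knob values are
# assumptions for illustration only.
from vllm import LLM, SamplingParams

prompts = [
    "Explain GPU memory hierarchy in one paragraph.",
    "What is speculative decoding?",
]

# gpu_memory_utilization and max_num_seqs are typical throughput-vs-latency
# tuning knobs; the values below are starting points, not recommendations.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model; substitute your own
    gpu_memory_utilization=0.90,
    max_num_seqs=64,
)

sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

# Batch the prompts through the engine and print the generated text.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```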
Requirements
- BS / MS / PhD in Electrical/Computer Engineering, Computer Science, Statistics, Physics, or other engineering fields, or equivalent experience.
- 3+ years in Solutions Architecture with proven experience taking AI inference from PoC to production in cloud environments (AWS, GCP, or Azure).
- 3+ years of hands-on experience with deep learning frameworks such as PyTorch and TensorFlow.
- Excellent knowledge of the theory and practice of LLM and DL inference.
- Strong fundamentals in programming, optimizations, and software design, with emphasis on Python.
- Experience with containerization and orchestration technologies (Docker, Kubernetes) and monitoring/observability solutions for AI deployments (see the exporter sketch after this list).
- Knowledge of inference technologies such as NVIDIA NIM, TensorRT-LLM, Dynamo, Triton Inference Server, vLLM, etc.
- Proficiency in problem-solving and debugging in GPU environments.
- Excellent presentation, communication, and collaboration skills.
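As a small illustration of the monitoring/observability and GPU-debugging requirements above, the following sketch exposes per-GPU utilization and memory metrics to Prometheus via NVML. Production deployments would typically run NVIDIA's DCGM exporter instead; the scrape port and metric names here are assumptions.

```python
# Hedged sketch of a tiny Prometheus exporter for GPU metrics.
# Requires the nvidia-ml-py and prometheus-client packages and an
# NVIDIA driver on the host.
import time

import pynvml
from prometheus_client import Gauge, start_http_server

# Metric names are illustrative, not a standard.
GPU_UTIL = Gauge("gpu_utilization_percent", "GPU compute utilization", ["gpu"])
GPU_MEM = Gauge("gpu_memory_used_bytes", "GPU memory in use", ["gpu"])

def main() -> None:
    pynvml.nvmlInit()
    start_http_server(8000)  # assumed scrape port
    try:
        while True:
            # Sample every visible GPU and update the labeled gauges.
            for i in range(pynvml.nvmlDeviceGetCount()):
                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
                util = pynvml.nvmlDeviceGetUtilizationRates(handle)
                mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
                GPU_UTIL.labels(gpu=str(i)).set(util.gpu)
                GPU_MEM.labels(gpu=str(i)).set(mem.used)
            time.sleep(5)
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    main()
```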
Ways to stand out from the crowd
- AWS, GCP, or Azure Professional Solution Architect certification.
- Experience optimizing and deploying large Mixture-of-Experts (MoE) LLMs at scale.
- Active contributions to open-source AI inference projects (e.g., vLLM, TensorRT-LLM, Dynamo, Triton, or similar).
- Experience with multi-GPU / multi-node inference technologies (Tensor Parallelism, Expert Parallelism, Disaggregated Serving, LWS, MPI, EFA/InfiniBand, NVLink/PCIe); see the parallelism sketch after this list.
- Experience developing and integrating monitoring and alerting (Prometheus, Grafana, NVIDIA DCGM) and using GPU performance analysis tools (NVIDIA Nsight Systems).
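For a sense of the multi-GPU serving work above: in vLLM, single-node tensor parallelism is a single constructor argument. The MoE model name and tensor_parallel_size below are assumptions; expert parallelism and disaggregated serving require engine- and cluster-level configuration beyond this snippet.

```python
# Illustrative sketch only: shard model weights across 4 GPUs on one node
# with vLLM tensor parallelism. Model choice and GPU count are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed MoE model
    tensor_parallel_size=4,  # one shard per GPU
)

out = llm.generate(
    ["Summarize NVLink in two sentences."],
    SamplingParams(max_tokens=64),
)
print(out[0].outputs[0].text)
```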
Other details
- Position requires occasional travel for on-site customer visits and industry events.
- Employment type: Full time.
- Base salary ranges (location/experience dependent):
- Level 2: 120,000 USD - 189,750 USD
- Level 3: 148,000 USD - 235,750 USD
- You will also be eligible for equity and benefits (see NVIDIA benefits).
- Applications accepted at least until October 21, 2025.
NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.