Used Tools & Technologies
Not specified
Required Skills & Competences
Grafana @ 3, Kubernetes @ 3, Prometheus @ 3, DevOps @ 3, AWS @ 3, Networking @ 2, Microservices @ 3, AWS Lambda @ 3, LLM @ 3, OpenTelemetry @ 3, GPU @ 3

Details
We’re forming a team of innovators to roll out and enhance AI inference solutions at scale, built on NVIDIA GPU technology and Kubernetes. As a Solutions Architect (Inference Focus), you’ll collaborate closely with engineering, DevOps, and customer success teams to drive enterprise AI adoption and bring generative AI into production.
Responsibilities
- Help customers craft, deploy, and maintain scalable, GPU-accelerated inference pipelines on Kubernetes for large language models (LLMs) and generative AI workloads.
- Drive performance tuning with TensorRT / TensorRT-LLM, NVIDIA NIM, and Triton Inference Server to improve GPU utilization and model efficiency.
- Collaborate with cross-functional teams (engineering, product) and provide technical mentorship to customers implementing AI at scale.
- Architect zero-downtime deployments, autoscaling (e.g., Kubernetes HPA driven by custom metrics), and integration with cloud-native observability tooling (e.g., OpenTelemetry, Prometheus, Grafana); a minimal autoscaling sketch follows this list.
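To make the autoscaling responsibility concrete, here is a minimal sketch of an HPA scaling an inference deployment on a custom metric. It assumes a Triton Inference Server deployment named triton-inference and a queue-latency metric already exposed to the custom metrics API (e.g., via Prometheus Adapter); the deployment name, metric name, and thresholds are illustrative, not taken from the posting.

```yaml
# Hypothetical HPA scaling a Triton deployment on a custom Prometheus metric.
# Assumes Prometheus Adapter (or another custom-metrics API provider) exposes
# triton_queue_duration_us as a pods metric; all names are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: triton-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: triton-inference
  minReplicas: 2          # keep headroom for zero-downtime rollouts
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: triton_queue_duration_us
        target:
          type: AverageValue
          averageValue: "50000"   # scale out when avg queue time exceeds ~50 ms
```

Scaling on queue latency rather than CPU is the usual choice for GPU inference, since GPU-bound pods rarely show meaningful CPU pressure.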
Requirements
- 5+ years in Solutions Architecture with a proven track record of moving AI inference from proof-of-concept to production on Kubernetes.
- Experience architecting GPU allocation with the NVIDIA GPU Operator and NVIDIA NIM Operator, troubleshooting sophisticated GPU orchestration, and optimizing utilization with Multi-Instance GPU (MIG) in Kubernetes environments (see the MIG sketch after this list).
- Proficiency with TensorRT-LLM, Triton, and TensorRT for model optimization and serving.
- Demonstrated success optimizing LLMs for low-latency inference in enterprise environments.
- BS or equivalent experience in Computer Science / Engineering.
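As one concrete illustration of MIG-based allocation: once the GPU Operator has partitioned a card, a pod can request a MIG slice as an extended resource instead of a whole GPU. The pod name, image tag, and MIG profile below are assumptions for the sketch.

```yaml
# Hypothetical pod requesting a single MIG slice (1g.10gb profile, e.g. on A100 80GB/H100).
# Assumes the NVIDIA GPU Operator has applied a MIG partitioning config so the
# device plugin advertises nvidia.com/mig-1g.10gb; names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: llm-inference-mig
spec:
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:24.05-py3   # example Triton image tag
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1   # one MIG slice rather than a full GPU
```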
Ways to stand out
- Prior experience deploying NVIDIA NIM microservices for multi-model inference (a deployment sketch follows this list).
- Experience with serverless inference and FaaS patterns on NVIDIA GPUs (e.g., Google Cloud Run, AWS Lambda, NVCF).
- NVIDIA Certified AI Engineer or similar certification.
- Active contributions to Kubernetes SIGs or AI inference projects (e.g., KServe, Dynamo, SGLang, or similar).
- Familiarity with networking concepts that support multi-node inference, such as MPI, LeaderWorkerSet (LWS), or similar.
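For orientation, a NIM microservice can be run on Kubernetes as an ordinary Deployment wrapping the NIM container, which serves an OpenAI-compatible HTTP API on port 8000. The model image, secret names, and labels below are assumptions for this sketch; production setups would typically use the NIM Operator or NVIDIA's Helm charts instead.

```yaml
# Hypothetical single-replica NIM deployment; image, secrets, and names are
# illustrative. NIM containers expose an OpenAI-compatible HTTP API on port 8000.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama3-nim
spec:
  replicas: 1
  selector:
    matchLabels: { app: llama3-nim }
  template:
    metadata:
      labels: { app: llama3-nim }
    spec:
      imagePullSecrets:
        - name: ngc-registry-secret          # assumed NGC pull secret
      containers:
        - name: nim
          image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest  # example NIM image
          ports:
            - containerPort: 8000
          env:
            - name: NGC_API_KEY              # NIM pulls model weights from NGC
              valueFrom:
                secretKeyRef: { name: ngc-api-secret, key: NGC_API_KEY }
          resources:
            limits:
              nvidia.com/gpu: 1              # one full GPU for the model
```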
Compensation & Benefits
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 148,000 USD - 235,750 USD. You will also be eligible for equity and benefits (see NVIDIA benefits page).
Additional Information
- Applications for this job will be accepted at least until August 8, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.