Senior AI Engineer, NeMo Retriever - Model Optimization and MLOps
Required Skills & Competences
Docker @ 4, Kubernetes @ 4, Python @ 4, MLOps @ 4, Hiring @ 4, Helm @ 4, Performance Optimization @ 4, Microservices @ 4, API @ 4, NLP @ 4, LLM @ 4, PyTorch @ 7, OpenAPI @ 4, GPU @ 4
Details
NVIDIA's technology is at the heart of the AI revolution, powering applications from self-driving cars and robotics to co-pilots and more. Join the NeMo Retriever team to work on intelligent assistants and information retrieval. NeMo NIM provides containers to self-host GPU-accelerated inferencing microservices for pre-trained and customized AI models across clouds, data centers, RTX AI PCs, and workstations. NIM microservices expose industry-standard APIs for simple integration into AI applications and workflows, and are built on pre-optimized inference engines including NVIDIA TensorRT and TensorRT-LLM to optimize latency and throughput.
NeMo Retriever is a collection of NIMs for building multimodal extraction, re-ranking, and embedding pipelines with a focus on accuracy and data privacy. It enables context-aware responses for retrieval-augmented generation (RAG) and Agentic AI workflows. The team is hiring an AI Engineer focused on ML model development, performance optimization, and MLOps, working with Generative AI, LLM/MLLM, and RAG using NVIDIA hardware and software platforms.
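The RAG workflow described above centers on embedding NIMs that clients call over an industry-standard API. As a minimal sketch of what integrating such a service might look like: this assumes a Retriever embedding NIM is self-hosted at `http://localhost:8000` and exposes an OpenAI-style `/v1/embeddings` route; the model name and the `input_type` field here are illustrative, not taken from the posting.

```python
import json
import urllib.request

def build_embedding_request(texts, model, input_type="query"):
    """Build an OpenAI-style embeddings payload.

    `input_type` (query vs. passage) is shown as an illustrative
    retrieval-specific extension field; check the service's own
    OpenAPI schema for the exact contract.
    """
    return {"model": model, "input": texts, "input_type": input_type}

def embed(texts, base_url="http://localhost:8000",
          model="nvidia/nv-embedqa-e5-v5"):  # model name is illustrative
    # POST the payload to the OpenAI-compatible /v1/embeddings route.
    payload = json.dumps(build_embedding_request(texts, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Embeddings come back under data[i]["embedding"] in the OpenAI schema.
    return [item["embedding"] for item in body["data"]]
```

Because the endpoint follows a standard schema, the same client code works whether the NIM runs in a data center, a cloud VM, or an RTX workstation; only `base_url` changes.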
Responsibilities
- Develop and maintain NIMs that containerize optimized models using OpenAPI standards, primarily using Python or an equivalent performant language.
- Work closely with partner teams to understand requirements, build and evaluate POCs, and develop roadmaps for production-level tools.
- Enable development of integrated systems (AI Blueprints) that provide a unified, turnkey experience.
- Help build and maintain Continuous Delivery pipelines to move changes to production faster and more safely while upholding operational standards.
- Provide peer reviews for other specialists, offering feedback on performance, scalability, and correctness.
Requirements
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
- 8+ years of demonstrated experience in a similar or related role.
- Strong Python programming expertise and experience with Deep Learning frameworks such as PyTorch.
- Experience delivering software in a cloud context and familiarity with cloud infrastructure patterns and processes.
- Knowledge of MLOps technologies such as Docker-Compose, containers, Kubernetes, Helm, and data center deployments.
- Familiarity with ML libraries and inference tooling, especially PyTorch, TensorRT, and TensorRT-LLM.
- In-depth hands-on understanding of NLP, LLM, MLLM, Generative AI, and RAG workflows.
- Self-starter mindset, enthusiasm for continuous learning, and the ability to share findings across the team.
- Highly motivated, passionate, and curious about new technologies.
Benefits
- Competitive base salary (range shown below depending on level and location).
- Eligibility for equity and a comprehensive benefits package (see NVIDIA benefits page).
- Opportunity to work with leading-edge GPU-accelerated AI platforms and production-grade MLOps tooling.
Compensation Details
- Base salary range for Level 4: 184,000 USD - 287,500 USD.
- Base salary range for Level 5: 224,000 USD - 356,500 USD.
Additional Information
- Applications accepted at least until December 20, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment. NVIDIA does not discriminate on the basis of protected characteristics.