Senior AI Engineer, NeMo Retriever - Model Optimization and MLOps

at Nvidia
USD 184,000-356,500 per year
SENIOR
✅ On-site


Used Tools & Technologies

Not specified

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Python @ 4, MLOps @ 4, Hiring @ 4, Helm @ 4, Performance Optimization @ 4, Microservices @ 4, API @ 4, NLP @ 4, LLM @ 4, PyTorch @ 7, OpenAPI @ 4, GPU @ 4

Details

NVIDIA's technology is at the heart of the AI revolution, powering applications from self-driving cars and robotics to co-pilots and more. Join the NeMo Retriever team to work on intelligent assistants and information retrieval. NeMo NIM provides containers to self-host GPU-accelerated inferencing microservices for pre-trained and customized AI models across clouds, data centers, RTX AI PCs, and workstations. NIM microservices expose industry-standard APIs for simple integration into AI applications and workflows, and are built on pre-optimized inference engines including NVIDIA TensorRT and TensorRT-LLM to optimize latency and throughput.
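
As context for the API surface described above, here is a minimal sketch of how a client might call a self-hosted NIM through its OpenAI-compatible interface; the localhost URL, port, and model identifier are placeholders for illustration, not a specific NIM release:

```python
# Minimal sketch: calling a locally hosted NIM through its OpenAI-compatible API.
# The endpoint URL and model name below are assumptions for illustration.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local NIM endpoint
    json={
        "model": "example-llm",  # placeholder model identifier
        "messages": [{"role": "user", "content": "Summarize NeMo Retriever in one sentence."}],
        "max_tokens": 64,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```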

NeMo Retriever is a collection of NIMs for building multimodal extraction, re-ranking, and embedding pipelines with a focus on accuracy and data privacy. It enables context-aware responses for retrieval-augmented generation (RAG) and Agentic AI workflows. The team is hiring an AI Engineer focused on ML model development, performance optimization, and MLOps, working with Generative AI, LLM/MLLM, and RAG using NVIDIA hardware and software platforms.
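
To make the retrieval side concrete, the sketch below shows the embed-and-rank step at the core of a RAG pipeline; the embedding endpoint, port, and model name are illustrative assumptions rather than a specific NeMo Retriever NIM:

```python
# Sketch of the retrieval step in a RAG pipeline: embed a query and a set of
# passages with an embedding service, then rank passages by cosine similarity.
# The endpoint and model name are assumptions for illustration only.
import numpy as np
import requests

EMBED_URL = "http://localhost:8001/v1/embeddings"  # assumed embedding service endpoint

def embed(texts):
    """Return one embedding vector per input text from the (assumed) embedding service."""
    resp = requests.post(
        EMBED_URL,
        json={"model": "example-embedder", "input": texts},  # placeholder model name
        timeout=30,
    )
    resp.raise_for_status()
    return np.array([item["embedding"] for item in resp.json()["data"]])

query_vec = embed(["How do I deploy a NIM on Kubernetes?"])[0]
passages = ["Use the provided Helm chart ...", "RTX AI PCs support local inference ..."]
passage_vecs = embed(passages)

# Cosine similarity between the query and each passage, highest score first.
scores = passage_vecs @ query_vec / (
    np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(query_vec)
)
for idx in np.argsort(-scores):
    print(f"{scores[idx]:.3f}  {passages[idx]}")
```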

Responsibilities

  • Develop and maintain NIMs that containerize optimized models using OpenAPI standards, primarily in Python or an equivalent performant language (see the sketch after this list).
  • Work closely with partner teams to understand requirements, build and evaluate POCs, and develop roadmaps for production-level tools.
  • Enable development of integrated systems (AI Blueprints) that provide a unified, turnkey experience.
  • Help build and maintain Continuous Delivery pipelines that move changes to production faster and more safely while upholding operational standards.
  • Provide peer reviews for other specialists, offering feedback on performance, scalability, and correctness.
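
As an illustration of the first responsibility above (not NVIDIA's actual implementation), a small FastAPI service like the following exposes typed request/response models from which an OpenAPI schema is generated automatically; all names here are hypothetical:

```python
# Illustrative sketch: a minimal FastAPI inference service whose request/response
# models produce an OpenAPI schema automatically, the kind of contract a
# containerized inference microservice would expose. Names are hypothetical.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="example-embedding-service")  # hypothetical service name

class EmbeddingRequest(BaseModel):
    model: str
    input: list[str]

class EmbeddingResponse(BaseModel):
    data: list[list[float]]

@app.post("/v1/embeddings", response_model=EmbeddingResponse)
def embed(req: EmbeddingRequest) -> EmbeddingResponse:
    # Placeholder logic; a real service would call an optimized inference backend
    # (e.g. a TensorRT engine) instead of returning dummy vectors.
    return EmbeddingResponse(data=[[0.0] * 8 for _ in req.input])

# The generated OpenAPI document is served at /openapi.json when run with:
#   uvicorn this_module:app --port 8000
```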

Requirements

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
  • 8+ years of demonstrated experience in a similar or related role.
  • Strong Python programming expertise and experience with Deep Learning frameworks such as PyTorch.
  • Experience delivering software in a cloud context and familiarity with cloud infrastructure patterns and processes.
  • Knowledge of MLOps technologies such as Docker-Compose, containers, Kubernetes, Helm, and data center deployments.
  • Familiarity with ML libraries and inference tooling, especially PyTorch, TensorRT, and TensorRT-LLM.
  • In-depth hands-on understanding of NLP, LLM, MLLM, Generative AI, and RAG workflows.
  • Self-starter mindset, enthusiasm for continuous learning, and the ability to share findings across the team.
  • Highly motivated, passionate, and curious about new technologies.

Benefits

  • Competitive base salary (ranges vary by level and location; see Compensation Details below).
  • Eligibility for equity and a comprehensive benefits package (see NVIDIA benefits page).
  • Opportunity to work with leading-edge GPU-accelerated AI platforms and production-grade MLOps tooling.

Compensation Details

  • Base salary range for Level 4: 184,000 USD - 287,500 USD.
  • Base salary range for Level 5: 224,000 USD - 356,500 USD.

Additional Information

  • Applications accepted at least until December 20, 2025.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment. NVIDIA does not discriminate on the basis of protected characteristics.