Senior AI Engineer, NeMo Retriever - Model Optimization and MLOps
Required Skills & Competences
Docker @ 4, Kubernetes @ 4, Python @ 4, MLOps @ 4, Hiring @ 4, Helm @ 4, Performance Optimization @ 4, Microservices @ 4, API @ 4, NLP @ 4, LLM @ 4, PyTorch @ 7, OpenAPI @ 4, GPU @ 4
Details
NVIDIA's technology is at the heart of the AI revolution, powering applications from self-driving cars and robotics to co-pilots and more. Join the NeMo Retriever team to work on intelligent assistants and information retrieval. NeMo NIM provides containers to self-host GPU-accelerated inferencing microservices for pre-trained and customized AI models across clouds, data centers, RTX AI PCs, and workstations. NIM microservices expose industry-standard APIs for simple integration into AI applications and workflows, and are built on pre-optimized inference engines including NVIDIA TensorRT and TensorRT-LLM to optimize latency and throughput.
NeMo Retriever is a collection of NIMs for building multimodal extraction, re-ranking, and embedding pipelines with a focus on accuracy and data privacy. It enables context-aware responses for retrieval-augmented generation (RAG) and Agentic AI workflows. The team is hiring an AI Engineer focused on ML model development, performance optimization, and MLOps, working with Generative AI, LLM/MLLM, and RAG using NVIDIA hardware and software platforms.
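The RAG workflow described above centers on embedding NIMs that clients call over an industry-standard API. As a minimal sketch of what integrating such a service might look like: this assumes a Retriever embedding NIM is self-hosted at `http://localhost:8000` and exposes an OpenAI-style `/v1/embeddings` route; the model name and the `input_type` field here are illustrative, not taken from the posting.

```python
import json
import urllib.request

def build_embedding_request(texts, model, input_type="query"):
    """Build an OpenAI-style embeddings payload.

    `input_type` (query vs. passage) is shown as an illustrative
    retrieval-specific extension field; check the service's own
    OpenAPI schema for the exact contract.
    """
    return {"model": model, "input": texts, "input_type": input_type}

def embed(texts, base_url="http://localhost:8000",
          model="nvidia/nv-embedqa-e5-v5"):  # model name is illustrative
    # POST the payload to the OpenAI-compatible /v1/embeddings route.
    payload = json.dumps(build_embedding_request(texts, model)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Embeddings come back under data[i]["embedding"] in the OpenAI schema.
    return [item["embedding"] for item in body["data"]]
```

Because the endpoint follows a standard schema, the same client code works whether the NIM runs in a data center, a cloud VM, or an RTX workstation; only `base_url` changes.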
Responsibilities
- Develop and maintain NIMs that containerize optimized models using OpenAPI standards, primarily using Python or an equivalent performant language.
- Work closely with partner teams to understand requirements, build and evaluate POCs, and develop roadmaps for production-level tools.
- Enable development of integrated systems (AI Blueprints) that provide a unified, turnkey experience.
- Help build and maintain Continuous Delivery pipelines to move changes to production faster and more safely while upholding operational standards.
- Provide peer reviews for other specialists, offering feedback on performance, scalability, and correctness.
Requirements
- Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).
- 8+ years of demonstrated experience in a similar or related role.
- Strong Python programming expertise and experience with Deep Learning frameworks such as PyTorch.
- Experience delivering software in a cloud context and familiarity with cloud infrastructure patterns and processes.
- Knowledge of MLOps technologies such as Docker-Compose, containers, Kubernetes, Helm, and data center deployments.
- Familiarity with ML libraries and inference tooling, especially PyTorch, TensorRT, and TensorRT-LLM.
- In-depth hands-on understanding of NLP, LLM, MLLM, Generative AI, and RAG workflows.
- Self-starter mindset, enthusiasm for continuous learning, and the ability to share findings across the team.
- Highly motivated, passionate, and curious about new technologies.
Benefits
- Competitive base salary (range shown below depending on level and location).
- Eligibility for equity and a comprehensive benefits package (see NVIDIA benefits page).
- Opportunity to work with leading-edge GPU-accelerated AI platforms and production-grade MLOps tooling.
Compensation Details
- Base salary range for Level 4: 184,000 USD - 287,500 USD.
- Base salary range for Level 5: 224,000 USD - 356,500 USD.
Additional Information
- Applications accepted at least until December 20, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment. NVIDIA does not discriminate on the basis of protected characteristics.