Manager, Deep Learning Algorithms - Training Framework

at NVIDIA

📍 Santa Clara, United States

USD 224,000-425,500 per year

MIDDLE

✅ On-site

Required Skills & Competences

Python @ 5, Algorithms @ 3, Hiring @ 3, Mentoring @ 3, Debugging @ 5, API @ 3, LLM @ 3, PyTorch @ 3, GPU @ 3

Details

NVIDIA is seeking an Engineering Manager to lead engineering activities for the NeMo Framework team. NeMo Framework is an open-source, scalable, cloud-native framework for researchers and developers working on pretraining and post-training of large language model (LLM) and multimodal (MM) foundation models. The team provides end-to-end model training capabilities, including pretraining, reasoning, alignment, customization, evaluation, and deployment, along with tooling to optimize performance and user experience.

Responsibilities

Plan, schedule, mentor, and lead the execution of projects and activities for the team.
Collaborate with internal customers to align priorities across business units and coordinate projects across geographic locations.
Grow and develop a world-class engineering team (hiring, mentoring, career development).
Contribute to and advance the open-source NeMo Framework.
Design and implement distributed training algorithms, model parallel paradigms, and model optimizations.
Define robust APIs, analyze and tune performance, and expand toolkits and libraries for training and deployment workflows.
Solve large-scale, end-to-end AI training challenges across the model lifecycle: orchestration, data pre-processing, training, tuning, and deployment.
Work across the software stack at the intersection of computer architecture, libraries, frameworks, and AI applications.

Requirements

MS, PhD or equivalent experience in Computer Science, AI, Applied Math, or a related field.
8+ years of industry experience, including 3+ years of management experience.
Proven experience leading and scaling high-performing engineering teams, especially across distributed and cross-functional groups.
Excellent understanding of SDLC practices including architecting, testing, continuous integration, and documentation.
Experience with AI frameworks (e.g., PyTorch, JAX) and/or inference and deployment environments (e.g., TRTLLM, vLLM, SGLang).
Proficiency in Python programming, software design, debugging, performance analysis, test design, and documentation.
Consistent record of improving AI libraries and contributing engineering innovations.

Ways to stand out

Hands-on experience in large-scale AI training and deep understanding of compute system concepts (latency/throughput bottlenecks, pipelining, multiprocessing) with strong performance analysis and tuning skills.
Expertise in distributed computing, model parallelism, and mixed precision training.
Prior experience with generative AI techniques applied to LLMs and multimodal learning (text, image, video).
Knowledge of GPU/CPU architecture and related numerical software.
Experience creating or contributing to open-source deep learning frameworks.

Compensation & Benefits

Base salary ranges (determined by location, experience, and pay of similar employees):
- Level 3: 224,000 USD - 356,500 USD
- Level 4: 272,000 USD - 425,500 USD
Eligible for equity and benefits (see NVIDIA benefits).

Additional info

Location: Santa Clara, California, United States.
Applications accepted at least until November 20, 2025.
NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.