Manager, Deep Learning Algorithms - Training Framework
at NVIDIA
Santa Clara, United States
USD 224,000-425,500 per year
Required Skills & Competences
Python @ 5, Algorithms @ 3, Hiring @ 3, Mentoring @ 3, Debugging @ 5, API @ 3, LLM @ 3, PyTorch @ 3, GPU @ 3
Details
NVIDIA is seeking an Engineering Manager to lead engineering activities for the NeMo Framework team. NeMo Framework is an open-source, scalable, and cloud-native framework for researchers and developers working on pretraining and post-training of Large Language Model (LLM) and Multimodal (MM) foundation models. The team provides end-to-end model training capabilities, including pretraining, reasoning, alignment, customization, evaluation, deployment, and tooling to optimize performance and user experience.
Responsibilities
- Plan, schedule, mentor, and lead the execution of projects and activities for the team.
- Collaborate with internal customers to align priorities across business units and coordinate projects across geographic locations.
- Grow and develop a world-class engineering team (hiring, mentoring, career development).
- Contribute to and advance the open-source NeMo Framework.
- Design and implement distributed training algorithms, model parallel paradigms, and model optimizations.
- Define robust APIs, analyze and tune performance, and expand toolkits and libraries for training and deployment workflows.
- Solve large-scale, end-to-end AI training challenges across the model lifecycle: orchestration, data pre-processing, training, tuning, and deployment.
- Work across the software stack at the intersection of computer architecture, libraries, frameworks, and AI applications.
Requirements
- MS, PhD or equivalent experience in Computer Science, AI, Applied Math, or a related field.
- 8+ years of industry experience, including 3+ years of management experience.
- Proven experience leading and scaling high-performing engineering teams, especially across distributed and functional groups.
- Excellent understanding of SDLC practices including architecting, testing, continuous integration, and documentation.
- Experience with AI frameworks (e.g., PyTorch, JAX) and/or inference and deployment environments (e.g., TRTLLM, vLLM, SGLang).
- Proficient in Python programming, software design, debugging, performance analysis, test design and documentation.
- Consistent record of improving AI libraries and contributing engineering innovations.
Ways to stand out
- Hands-on experience in large-scale AI training and deep understanding of compute system concepts (latency/throughput bottlenecks, pipelining, multiprocessing) with strong performance analysis and tuning skills.
- Expertise in distributed computing, model parallelism, and mixed precision training.
- Prior experience with Generative AI techniques applied to LLMs and Multi-Modal learning (text, image, video).
- Knowledge of GPU/CPU architecture and related numerical software.
- Created or contributed to open-source deep learning frameworks.
Compensation & Benefits
- Base salary ranges (determined by location, experience, and pay of similar employees):
  - Level 3: 224,000 USD - 356,500 USD
  - Level 4: 272,000 USD - 425,500 USD
- Eligible for equity and benefits (see NVIDIA benefits).
Additional info
- Location: Santa Clara, California, United States.
- Applications accepted at least until November 20, 2025.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.