Manager, Deep Learning Algorithms - Training Framework

at NVIDIA
USD 224,000-425,500 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 5, Algorithms @ 3, Hiring @ 3, Mentoring @ 3, Debugging @ 5, API @ 3, LLM @ 3, PyTorch @ 3, GPU @ 3

Details

NVIDIA is seeking an Engineering Manager to lead engineering activities for the NeMo Framework team. NeMo Framework is an open-source, scalable, and cloud-native framework for researchers and developers working on pretraining and post-training of Large Language Model (LLM) and Multimodal (MM) foundation models. The team provides end-to-end model training capabilities, including pretraining, reasoning, alignment, customization, evaluation, and deployment, along with tooling to optimize performance and user experience.

Responsibilities

  • Plan, schedule, mentor, and lead the execution of projects and activities for the team.
  • Collaborate with internal customers to align priorities across business units and coordinate projects across geographic locations.
  • Grow and develop a world-class engineering team (hiring, mentoring, career development).
  • Contribute to and advance the open-source NeMo Framework.
  • Design and implement distributed training algorithms, model parallel paradigms, and model optimizations.
  • Define robust APIs, analyze and tune performance, and expand toolkits and libraries for training and deployment workflows.
  • Solve large-scale, end-to-end AI training challenges across the model lifecycle: orchestration, data pre-processing, training, tuning, and deployment.
  • Work across the software stack at the intersection of computer architecture, libraries, frameworks, and AI applications.

Requirements

  • MS, PhD or equivalent experience in Computer Science, AI, Applied Math, or a related field.
  • 8+ years of industry experience, including 3+ years of management experience.
  • Proven experience leading and scaling high-performing engineering teams, especially across distributed and functional groups.
  • Excellent understanding of SDLC practices including architecting, testing, continuous integration, and documentation.
  • Experience with AI frameworks (e.g., PyTorch, JAX) and/or inference and deployment environments (e.g., TRTLLM, vLLM, SGLang).
  • Proficient in Python programming, software design, debugging, performance analysis, test design and documentation.
  • Consistent record of improving AI libraries and contributing engineering innovations.

Ways to stand out

  • Hands-on experience in large-scale AI training and deep understanding of compute system concepts (latency/throughput bottlenecks, pipelining, multiprocessing) with strong performance analysis and tuning skills.
  • Expertise in distributed computing, model parallelism, and mixed precision training.
  • Prior experience with Generative AI techniques applied to LLMs and multimodal learning (text, image, video).
  • Knowledge of GPU/CPU architecture and related numerical software.
  • Created or contributed to open-source deep learning frameworks.

Compensation & Benefits

  • Base salary ranges (determined by location, experience, and pay of similar employees):
    • Level 3: 224,000 USD - 356,500 USD
    • Level 4: 272,000 USD - 425,500 USD
  • Eligible for equity and benefits (see NVIDIA benefits).

Additional info

  • Location: Santa Clara, California, United States.
  • Applications accepted at least until November 20, 2025.
  • NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.