Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Python @ 5
Distributed Systems @ 3
People Management @ 5
Communication @ 3
Networking @ 3
Rust @ 5
LLM @ 3
Agile @ 3
GPU @ 3
AI @ 3
vLLM @ 3
TensorRT @ 3
SGLang @ 3
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self-driving cars that can understand the world. Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.
An applied research team within NVIDIA’s Networking Systems & Software Architecture group is solving some of AI’s hardest infrastructure problems. The team builds systems-level software that moves data between GPUs, nodes, and storage at the speed modern AI demands, spanning low-level transport optimization, hardware-software co-design, and communication frameworks that plug directly into production AI stacks. The team's charter also extends into emerging domains, including quantum computing interconnects. This Engineering Manager role leads a team responsible for that work, owning execution and setting technical direction. It calls for someone technically strong enough to drive architecture and focused on building an extraordinary engineering organization.
Responsibilities
- Lead and develop a team of systems and networking engineers building distributed AI communication systems—libraries, frameworks, and system integrations.
- Set the technical roadmap in partnership with principal engineers and architects, balancing near-term delivery with long-term research bets.
- Create a culture of technical excellence and open collaboration.
- Handle project planning, resource allocation, and delivery timelines across concurrent workstreams.
Requirements
- 8+ years of overall software engineering experience, with advanced knowledge of systems software, networking, or distributed systems.
- 3+ years of direct people management.
- BS, MS, or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
- Ability to scope a problem, set a plan, and deliver results in a fast-paced R&D environment.
- Strong communication skills—comfortable speaking publicly, writing technical documents, and giving candid feedback.
- Good understanding of computer architecture, memory hierarchies, DMA engines, and networking.
- Proficiency in programming languages such as C, C++, Rust, and Python.
- Understanding of ML systems concepts—transformer architectures, KV cache mechanics, model parallelism, or distributed training and inference patterns.
Ways to stand out from the crowd
- Knowledge of ML inference frameworks (vLLM, SGLang, TensorRT-LLM) and their communication requirements.
- Familiarity with NVIDIA’s hardware and software ecosystem.
- Experience with agile methodologies adapted for research-focused engineering teams.
Compensation and benefits
- Base salary ranges (location, experience, and level dependent):
- Level 2: 184,000 USD - 287,500 USD
- Level 3: 224,000 USD - 356,500 USD
- Eligible for equity and benefits. Link to benefits: https://www.nvidia.com/en-us/benefits/
Other information
- Location: Santa Clara, CA, United States.
- Employment type: Full time.
- Applications for this job will be accepted at least until April 27, 2026.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and values diversity.