Senior On-Device Model Inference Optimization Engineer
at NVIDIA
USD 184,000–356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
- Python @ 7
- Communication @ 7
- PyTorch @ 4
- CUDA @ 4

Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today the company is tapping into the potential of AI to define the next era of computing where GPUs act as the brains of computers, robots, and self-driving cars. The role focuses on improving the performance and efficiency of AI models for on-device deployment, enabling next-generation autonomous vehicle technology.
Responsibilities
- Develop and implement strategies to optimize AI model inference for on-device deployment.
- Employ techniques such as pruning, quantization, and knowledge distillation to minimize model size and computational demands (a brief quantization and benchmarking sketch follows this list).
- Optimize performance-critical components using CUDA and C++.
- Collaborate with cross-functional teams to align optimization efforts with hardware capabilities and deployment needs.
- Benchmark inference performance, identify bottlenecks, and implement solutions.
- Research and apply innovative methods for inference optimization.
- Adapt models for diverse hardware platforms and operating systems with varying capabilities.
- Create tools to validate the accuracy and latency of deployed models at scale with minimal friction.
- Recommend and implement model architecture changes to improve the accuracy-latency balance.
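To illustrate the kind of optimization and benchmarking work these responsibilities describe, here is a minimal sketch of post-training dynamic quantization and a simple latency measurement in PyTorch. The toy model, batch size, and iteration counts are hypothetical placeholders rather than anything specified in the posting.

```python
import time

import torch
import torch.nn as nn

# Toy stand-in for a real network (hypothetical; the posting names no model).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()

# Post-training dynamic quantization: Linear weights are stored as int8 and
# activations are quantized on the fly at inference time (CPU execution).
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Simple latency benchmark: warm up first, then average over repeated runs.
x = torch.randn(32, 512)
with torch.inference_mode():
    for _ in range(10):  # warm-up iterations
        quantized(x)
    runs = 100
    start = time.perf_counter()
    for _ in range(runs):
        quantized(x)
    latency_ms = (time.perf_counter() - start) / runs * 1e3

print(f"mean CPU latency: {latency_ms:.2f} ms per batch of 32")
```

In practice the same measurement would be run against the unquantized baseline, and accuracy on a validation set would be checked alongside latency.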
Requirements
- MSc or PhD in Computer Science, Engineering, or a related field, or equivalent experience.
- 10+ years of proven experience in model inference and optimization.
- Expertise in modern ML frameworks, particularly PyTorch, ONNX, and TensorRT (an ONNX export sketch follows this list).
- Proven experience optimizing inference for transformer and convolutional architectures.
- Strong programming proficiency in CUDA, Python, and C++.
- In-depth knowledge of optimization techniques, including quantization, pruning, distillation, and hardware-aware neural architecture search.
- Skilled in building and deploying scalable, cloud-based inference systems.
- Passion for developing efficient, production-ready solutions with a strong focus on code quality and performance.
- Meticulous attention to detail for precision and reliability in safety-critical systems.
- Strong collaboration and communication skills for working across multidisciplinary teams.
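As context for the PyTorch/ONNX/TensorRT stack named above, the sketch below exports a PyTorch model to ONNX, the usual handoff format for TensorRT. The model, input shape, opset, and file name are illustrative assumptions, not details from the posting.

```python
import torch
import torchvision

# Illustrative model; a production network would replace this (assumption).
model = torchvision.models.resnet18().eval()

# A dummy input fixes the traced graph's shapes; batch size stays dynamic.
dummy = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",  # hypothetical output path
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
    opset_version=17,
)
```

The resulting file would then be compiled into a TensorRT engine, for example with `trtexec --onnx=model.onnx`.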
Ways to stand out
- Publications or industry experience in optimizing and deploying model inference at scale.
- Hands-on expertise in hardware-aware optimizations and accelerators such as GPUs, TPUs, or custom ASICs.
- Active contributions to open-source projects focused on inference optimization or ML frameworks.
- Experience designing and deploying inference pipelines for real-time or autonomous systems.
Compensation & Benefits
- Base salary ranges: $184,000–$287,500 USD for Level 4; $224,000–$356,500 USD for Level 5. The final base salary will be determined based on location, experience, and the pay of employees in similar positions.
- Eligibility for equity and additional benefits (link to NVIDIA benefits).
Other details
- Applications for this job will be accepted at least until October 10, 2025.
- NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.