Senior On-Device Model Inference Optimization Engineer

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ On-site


Used Tools & Technologies

Not specified

Required Skills & Competences

  • Python @ 7
  • Machine Learning @ 4
  • Communication @ 7
  • PyTorch @ 4
  • CUDA @ 4

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, we’re leveraging AI to define the next era of computing, where GPUs power computers, robots, and self-driving cars that understand the world. As an NVIDIAN, you will work in a diverse and supportive environment to make a lasting impact.

Responsibilities

  • Develop and implement strategies to optimize AI model inference for on-device deployment.
  • Employ techniques such as pruning, quantization, and knowledge distillation to reduce model size and computational demand (see the quantization sketch after this list).
  • Optimize performance-critical components using CUDA and C++.
  • Collaborate across teams to align optimization with hardware capabilities and deployment requirements.
  • Benchmark inference performance, identify bottlenecks, and implement improvements (a latency-measurement sketch follows this list).
  • Research and apply innovative inference optimization methods.
  • Adapt models for various hardware platforms and operating systems.
  • Create tools to validate accuracy and latency of deployed models at scale.
  • Recommend model architecture changes to improve accuracy-latency balance.
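
As a concrete illustration of the quantization work described above, here is a minimal sketch of post-training dynamic quantization in PyTorch. The model, layer choice, and tensor shapes are placeholders, not taken from the posting:

    import torch
    import torch.nn as nn

    # Toy stand-in for a production network; layer sizes are arbitrary.
    model = nn.Sequential(
        nn.Linear(512, 512),
        nn.ReLU(),
        nn.Linear(512, 10),
    ).eval()

    # Post-training dynamic quantization: Linear weights are stored as
    # int8 and activations are quantized on the fly at inference time,
    # shrinking the model and cutting memory bandwidth.
    quantized = torch.ao.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    with torch.inference_mode():
        out = quantized(torch.randn(1, 512))
    print(out.shape)  # torch.Size([1, 10])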
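
For the benchmarking bullet, a minimal latency-measurement sketch using CUDA events in PyTorch, assuming a CUDA-capable device; the warm-up and iteration counts here are arbitrary:

    import torch
    import torch.nn as nn

    # Placeholder model and batch; a real benchmark would load the
    # deployed network and representative inputs.
    device = torch.device("cuda")
    model = nn.Sequential(
        nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)
    ).to(device).eval()
    x = torch.randn(8, 512, device=device)

    # Warm up so one-time kernel setup does not skew the numbers.
    with torch.inference_mode():
        for _ in range(10):
            model(x)

    # CUDA events time GPU work without forcing a sync per iteration.
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    iters = 100
    start.record()
    with torch.inference_mode():
        for _ in range(iters):
            model(x)
    end.record()
    torch.cuda.synchronize()  # wait for the end event before reading it
    print(f"mean latency: {start.elapsed_time(end) / iters:.3f} ms")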

Requirements

  • MSc or PhD in Computer Science, Engineering, or related field, or equivalent experience.
  • Over 5 years of experience specializing in model inference and optimization.
  • More than 10 years overall relevant work experience.
  • Expertise in machine learning frameworks including PyTorch, ONNX, and TensorRT (see the export sketch after this list).
  • Proven experience optimizing inference for transformer and convolutional architectures.
  • Strong programming skills in CUDA, Python, and C++.
  • In-depth knowledge of optimization techniques: quantization, pruning, distillation, hardware-aware neural architecture search.
  • Experience building and deploying scalable cloud-based inference systems.
  • Passion for efficient, production-ready solutions with a focus on code quality and performance.
  • Attention to detail to ensure precision and reliability in safety-critical systems.
  • Strong collaboration and communication abilities.
  • Proactive and diligent, with the drive to solve complex optimization challenges.
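
As one hedged example of the PyTorch-to-TensorRT path named in the requirements, the sketch below exports a toy model to ONNX; the file name, opset version, and shapes are assumptions. The resulting file could then be compiled into a TensorRT engine, for example with the trtexec tool:

    import torch
    import torch.nn as nn

    # Hypothetical model; in practice this is the trained network.
    model = nn.Sequential(
        nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)
    ).eval()
    dummy = torch.randn(1, 512)

    # Export to ONNX with a dynamic batch axis; the file can then be
    # built into a TensorRT engine, e.g. with
    # "trtexec --onnx=model.onnx --saveEngine=model.plan".
    torch.onnx.export(
        model,
        dummy,
        "model.onnx",
        opset_version=17,
        input_names=["input"],
        output_names=["logits"],
        dynamic_axes={"input": {0: "batch"}},
    )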

Preferred Qualifications

  • Publications or industry experience in large-scale model inference optimization.
  • Expertise with hardware-aware optimizations and accelerators (GPUs, TPUs, ASICs).
  • Contributions to open-source inference optimization or ML framework projects.
  • Experience designing and deploying inference pipelines for real-time or autonomous systems.

Compensation and Benefits

  • Base salary range: 184,000 USD - 356,500 USD, determined by location, experience, and the pay of comparable roles.
  • Eligibility for equity and additional benefits.
  • NVIDIA embraces diversity and equal opportunity employment.