Senior On-Device Model Inference Optimization Engineer
at NVIDIA
Santa Clara, United States
USD 184,000-356,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Python @ 7 · Machine Learning @ 4 · Communication @ 7 · PyTorch @ 4 · CUDA @ 4
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today, we're leveraging AI to define the next era of computing, where GPUs power computers, robots, and self-driving cars that understand the world. As an NVIDIAN, you will work in a diverse and supportive environment to make a lasting impact.
Responsibilities
- Develop and implement strategies to optimize AI model inference for on-device deployment.
- Employ techniques such as pruning, quantization, and knowledge distillation to reduce model size and computational demand (a quantization sketch follows this list).
- Optimize performance-critical components using CUDA and C++.
- Collaborate across teams to align optimization with hardware capabilities and deployment requirements.
- Benchmark inference performance, identify bottlenecks, and implement improvements.
- Research and apply innovative inference optimization methods.
- Adapt models for various hardware platforms and operating systems.
- Create tools to validate accuracy and latency of deployed models at scale.
- Recommend model architecture changes to improve accuracy-latency balance.
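To make the quantization work above concrete, here is a minimal sketch of post-training dynamic quantization in PyTorch, one of the techniques named in this list; the toy model, layer sizes, and batch shape are hypothetical stand-ins, not anything specified by the role.

```python
import torch
import torch.nn as nn

# Hypothetical float32 model standing in for a real network.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
).eval()

# Post-training dynamic quantization: nn.Linear weights are stored as
# int8 and activations are quantized on the fly at inference time,
# shrinking the model and reducing compute on supported backends.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Smoke test on a dummy batch (illustrative, not a benchmark).
x = torch.randn(32, 512)
with torch.inference_mode():
    print(quantized(x).shape)  # torch.Size([32, 10])
```

A real deployment would pair this with the accuracy validation and latency benchmarking the other bullets describe, since quantization trades precision for size and speed.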
Requirements
- MSc or PhD in Computer Science, Engineering, or related field, or equivalent experience.
- 5+ years of experience specializing in model inference and optimization.
- 10+ years of overall relevant work experience.
- Expertise in machine learning frameworks including PyTorch, ONNX, and TensorRT (an export sketch follows this list).
- Proven experience optimizing inference for transformer and convolutional architectures.
- Strong programming skills in CUDA, Python, and C++.
- In-depth knowledge of optimization techniques: quantization, pruning, distillation, and hardware-aware neural architecture search.
- Experience building and deploying scalable cloud-based inference systems.
- Passion for efficient, production-ready solutions with a focus on code quality and performance.
- Attention to detail for precision and reliability in safety-critical systems.
- Strong collaboration and communication abilities.
- Proactive and diligent, with a drive to solve complex optimization challenges.
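As a sketch of the PyTorch/ONNX/TensorRT workflow referenced above, the snippet below exports a hypothetical model to ONNX, the interchange format TensorRT consumes when building an optimized engine; the model, file name, and input shape are illustrative assumptions, not details from the posting.

```python
import torch
import torch.nn as nn

# Hypothetical convolutional model; a real candidate network might be a
# transformer or CNN, as the requirements note.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).eval()
dummy = torch.randn(1, 3, 224, 224)  # example input shape (assumption)

# Export to ONNX so downstream runtimes (e.g. TensorRT) can compile an
# optimized inference engine from the graph.
torch.onnx.export(
    model,
    dummy,
    "model.onnx",  # hypothetical output path
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
    opset_version=17,
)
```

From there, TensorRT's bundled `trtexec` CLI (e.g. `trtexec --onnx=model.onnx`) can build and time an engine, which is where the CUDA-level tuning this role describes comes in.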
Preferred Qualifications
- Publications or industry experience in large-scale model inference optimization.
- Expertise with hardware-aware optimizations and accelerators (GPUs, TPUs, ASICs).
- Contributions to open-source inference optimization or ML framework projects.
- Experience designing and deploying inference pipelines for real-time or autonomous systems.
Compensation and Benefits
- Base salary range: 184,000 USD - 356,500 USD per year, depending on location, experience, and pay for comparable roles.
- Eligibility for equity and additional benefits.
- NVIDIA embraces diversity and equal opportunity employment.