AI Inference Engineer - San Francisco
San Francisco, United States
USD 190,000-250,000 per year
Required Skills & Competences
Machine Learning, TensorFlow, API, LLM, PyTorch, CUDA

Details
You will work on large-scale deployment of machine learning models for real-time inference.
Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations
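Benchmarking the inference stack, as described above, typically starts with measuring per-request latency percentiles. A minimal sketch in pure Python (the `dummy_model` function is a hypothetical stand-in; a real deployment would time a PyTorch, TensorFlow, or ONNX Runtime forward pass):

```python
import time


def dummy_model(batch):
    # Hypothetical stand-in for a real model forward pass.
    return [sum(x) for x in batch]


def benchmark(model, batch, n_iters=200):
    """Time repeated forward passes and report p50/p99 latency in ms."""
    latencies = []
    for _ in range(n_iters):
        start = time.perf_counter()
        model(batch)
        latencies.append((time.perf_counter() - start) * 1e3)
    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    p99 = latencies[int(len(latencies) * 0.99) - 1]
    return p50, p99


batch = [[float(i) for i in range(256)] for _ in range(32)]
p50, p99 = benchmark(dummy_model, batch)
print(f"p50={p50:.3f} ms  p99={p99:.3f} ms")
```

Tail latency (p99) rather than the mean is usually what matters for real-time inference SLOs, since a small fraction of slow requests dominates user-perceived reliability.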
Requirements
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA
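Of the optimization techniques listed above, quantization is the simplest to illustrate. A minimal sketch of symmetric per-tensor int8 quantization in pure Python (assumed notation; production systems would use library-provided kernels, e.g. PyTorch's quantization APIs):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [x * scale for x in q]


weights = [0.42, -1.3, 0.07, 0.99, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Rounding error is bounded by half a quantization step (scale / 2).
assert max_err <= scale / 2 + 1e-9
```

The payoff is a 4x reduction in memory footprint versus float32, which for memory-bandwidth-bound LLM inference translates directly into higher decoding throughput.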
Benefits
- Comprehensive health, dental, and vision insurance for you and your dependents.
- 401(k) retirement plan.
- Equity is part of the total compensation package.