AI Inference Engineer
USD 190,000-250,000 per year
Required Skills & Competences
Kubernetes @ 3, Python @ 3, Machine Learning @ 3, TensorFlow @ 3, Rust @ 3, API @ 3, LLM @ 3, PyTorch @ 3, CUDA @ 3, GPU @ 3
Details
Perplexity is an AI-powered answer engine founded in December 2022. The company has raised over $1B in venture investment and handles more than 780 million queries per month. The team's objective is to build accurate, trustworthy AI that powers decision-making and assistive AI.
The role involves working on large-scale deployment of machine learning models for real-time inference. The current stack includes: Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes.
Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout the inference stack
- Improve the reliability and observability of systems and respond to system outages
- Explore novel research and implement LLM inference optimizations
Requirements
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA
- Practical experience with components of the stack: Python, Rust, C++, PyTorch, Triton, CUDA, Kubernetes
Compensation & Benefits
- Cash compensation range: $190,000 - $250,000
- Equity may be part of the total compensation package
- Benefits include comprehensive health, dental, and vision insurance for you and your dependents and a 401(k) plan
Notes
Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.