AI Inference Engineer

at Perplexity AI

📍 New York City, United States
📍 Palo Alto, United States
📍 San Francisco, United States

USD 190,000-250,000 per year

MIDDLE

✅ On-site

Required Skills & Competences

Kubernetes @ 3, Python @ 3, Machine Learning @ 3, TensorFlow @ 3, Rust @ 3, API @ 3, LLM @ 3, PyTorch @ 3, CUDA @ 3, GPU @ 3

Details

Perplexity is an AI-powered answer engine founded in December 2022. The company has raised over $1B in venture investment and handles more than 780 million queries per month. The team’s objective is to build accurate, trustworthy AI that powers decision-making and assistive AI.

The role involves large-scale deployment of machine learning models for real-time inference. The current stack includes Python, Rust, C++, PyTorch, Triton, CUDA, and Kubernetes.

Responsibilities

Develop APIs for AI inference that will be used by both internal and external customers
Benchmark and address bottlenecks throughout the inference stack
Improve the reliability and observability of systems and respond to system outages
Explore novel research and implement LLM inference optimizations

Requirements

Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization)
Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Practical experience with components of the stack: Python, Rust, C++, PyTorch, Triton, CUDA, Kubernetes

Compensation & Benefits

Cash compensation range: $190,000 - $250,000
Equity may be part of the total compensation package
Benefits include comprehensive health, dental, and vision insurance for you and your dependents and a 401(k) plan

Notes

Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above.