AI Inference Engineer - SF or Palo Alto

USD 190,000-250,000 per year
Mid-level
✅ On-site

Required Skills & Competences

Machine Learning (level 3), TensorFlow (level 3), API (level 3), LLM (level 3), PyTorch (level 3), CUDA (level 3), GPU (level 3)

Details

You will work on large-scale deployment of machine learning models for real-time inference.

Responsibilities

  • Develop APIs for AI inference that will be used by both internal and external customers (a minimal sketch follows this list)
  • Benchmark and address bottlenecks throughout our inference stack
  • Improve the reliability and observability of our systems and respond to system outages
  • Explore novel research and implement LLM inference optimizations
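
As a minimal, hypothetical illustration of the API responsibility above (not part of the posting): the sketch below serves a PyTorch model behind an HTTP endpoint using FastAPI. The /generate route, DummyModel, and the input size are placeholder assumptions, not details from this role.

```python
# Hypothetical sketch: a tiny inference API with FastAPI + PyTorch.
# DummyModel stands in for a real model; it maps a 16-float vector to 4 logits.
import torch
import torch.nn as nn
from fastapi import FastAPI
from pydantic import BaseModel


class DummyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(16, 4)

    def forward(self, x):
        return self.linear(x)


app = FastAPI()
model = DummyModel().eval()


class InferenceRequest(BaseModel):
    features: list[float]  # expected length: 16


@app.post("/generate")
def generate(req: InferenceRequest):
    # No gradients needed at inference time.
    with torch.no_grad():
        logits = model(torch.tensor(req.features).unsqueeze(0))
    return {"logits": logits.squeeze(0).tolist()}

# Run locally with: uvicorn app:app --reload
# Then POST {"features": [0.0, ...]} (16 floats) to /generate.
```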

Qualifications

  • Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
  • Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization; see the sketch after this list)
  • Understanding of GPU architectures or experience with GPU kernel programming using CUDA
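
As a hedged example of one optimization technique named above, the sketch below applies PyTorch's built-in dynamic (weight-only int8) quantization to a small stand-in model. TinyMLP and the layer sizes are hypothetical; the quantize_dynamic call is the standard PyTorch API.

```python
# Hypothetical sketch: post-training dynamic quantization with PyTorch.
# Linear weights are stored as int8; activations stay in float and are
# quantized on the fly, trading a little accuracy for smaller weights
# and faster CPU inference.
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        return self.net(x)


model = TinyMLP().eval()

# Quantize only the nn.Linear modules to int8 weights.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 512))
print(out.shape)  # torch.Size([1, 10])
```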

Additional Information

The cash compensation range for this role is $190,000 - $250,000.

Equity may be part of the total compensation package.

Benefits include comprehensive health, dental, and vision insurance for you and your dependents, as well as a 401(k) plan.