AI Inference Engineer
USD 190,000-250,000 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Tag name is followed by "@" symbol and proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Kubernetes @ 3
Python @ 3
Machine Learning @ 3
TensorFlow @ 3
Rust @ 3
API @ 3
LLM @ 3
PyTorch @ 3
CUDA @ 3
GPU @ 3
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Details
We are looking for an AI Inference engineer to join our growing team. Our current stack is Python, Rust, C++, PyTorch, Triton, CUDA, Kubernetes. You will have the opportunity to work on large-scale deployment of machine learning models for real-time inference.
Responsibilities
- Develop APIs for AI inference that will be used by both internal and external customers
- Benchmark and address bottlenecks throughout our inference stack
- Improve the reliability and observability of our systems and respond to system outages
- Explore novel research and implement LLM inference optimizations
Requirements
- Experience with ML systems and deep learning frameworks (e.g. PyTorch, TensorFlow, ONNX)
- Familiarity with common LLM architectures and inference optimization techniques (e.g. continuous batching, quantization, etc.)
- Understanding of GPU architectures or experience with GPU kernel programming using CUDA
Compensation
The cash compensation range for this role is $190,000 - $250,000. Final offer amounts are determined by multiple factors, including experience and expertise, and may vary from the amounts listed above. Equity may be part of the total compensation package.
Benefits
- Comprehensive health, dental, and vision insurance for you and your dependents
- 401(k) plan