Senior ML Solutions Architect - Token Factory
Used Tools & Technologies
Machine Learning, GenAI
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Docker @ 6
Kubernetes @ 6
DevOps @ 6
Python @ 7
GCP @ 4
MLOps @ 4
Vertex AI @ 4
AWS @ 4
Azure @ 4
Communication @ 4
FastAPI @ 4
Flask @ 4
Git @ 4
API @ 4
LLM @ 4
Cloud Computing @ 4
Generative AI @ 6
AI @ 4
vLLM @ 4
RAG @ 4
TensorRT @ 4
Details
Nebius is leading a new era in cloud computing to serve the global AI economy. We create tools and resources customers need to solve real-world challenges and transform industries without massive infrastructure costs or large in-house AI/ML teams. The company is headquartered in Amsterdam, listed on Nasdaq, and has R&D hubs across Europe, North America, and Israel.
The role
We are seeking an experienced Senior ML Solutions Architect to support customers leveraging Nebius Token Factory's serverless inference platform for open-source LLMs across multiple modalities. You will collaborate with clients to design and implement customized LLM-based solutions, architect scalable AI applications using served models, and work with backend teams to improve the platform to match client needs.
Responsibilities
- Design and implement LLM-based solutions using Nebius Token Factory's inference services to drive business value and support customer goals.
- Build production-ready applications leveraging serverless LLM APIs, including multimodal models (text, vision, audio) and domain-specific models.
- Provide technical expertise in prompt engineering, RAG architectures, model selection, and inference optimization.
- Collaborate with product and engineering teams to surface customer feedback and shape the platform roadmap.
- Guide customers in scaling from POC to production with a focus on performance, reliability, and cost efficiency.
Requirements
- 5+ years of experience in ML/AI systems, with at least 2 years focused on LLMs and generative AI.
- Deep knowledge of the LLM ecosystem, including model architectures and fine-tuning approaches.
- Hands-on experience with:
  - Prompt engineering and LLM pipeline development, including evaluation.
  - Agentic frameworks such as LangChain, LangSmith, smolagents, or equivalent.
  - Vector databases and RAG implementation patterns.
  - Deploying LLM-powered applications using APIs from OpenAI or Anthropic, or built on open-source models.
- Strong Python programming skills.
- Excellent communication skills and the ability to explain technical concepts to diverse audiences.
Nice to have / Added bonus
- Experience with inference frameworks and libraries (e.g., vLLM, SGLang, TensorRT-LLM, Transformers).
- Familiarity with inference optimization techniques such as quantization, batching, caching, and routing.
- Experience with multimodal AI models (vision-language, speech).
- Proficiency with DevOps tools (Docker, Kubernetes).
- Contributions to open-source ML/AI projects.
Preferred tooling
- Programming Languages: Python
- ML Frameworks and Libraries: vLLM, SGLang, TensorRT-LLM, Transformers, OpenAI/Anthropic SDKs
- Frameworks for Agentic Pipelines: LangChain, LangSmith, smolagents, or equivalent
- API and Web Frameworks: FastAPI, Flask
- MLOps and DevOps tools: Kubernetes (K8s), Docker, Git
- Cloud Platforms: AWS (SageMaker, Bedrock), GCP (Vertex AI), Azure (Azure ML)
Benefits
- Health Insurance: 100% company-paid medical, dental, and vision coverage for employees and their families.
- 401(k) Plan: Up to 4% company match with immediate vesting.
- Parental Leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
- Remote Work Reimbursement: Up to $85/month for mobile and internet.
- Disability & Life Insurance: Company-paid short-term, long-term, and life insurance coverage.
- Flexible working arrangements and opportunities for professional growth.
Compensation
We offer competitive salaries ranging from 215k to 275k OTE (On-Target Earnings), plus equity, based on experience, skills, and location.
What we offer
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within Nebius.
- Flexible working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.