Senior ML Solutions Architect - Token Factory

at Nebius

📍 United States

USD 215,000-275,000 per year

SENIOR

✅ Remote

Used Tools & Technologies

Machine Learning, GenAI

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others, and you know the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Docker @ 6, Kubernetes @ 6, DevOps @ 6, Python @ 7, GCP @ 4, MLOps @ 4, Vertex AI @ 4, AWS @ 4, Azure @ 4, Communication @ 4, FastAPI @ 4, Flask @ 4, Git @ 4, API @ 4, LLM @ 4, Cloud Computing @ 4, Generative AI @ 6, AI @ 4, vLLM @ 4, RAG @ 4, TensorRT @ 4

Details

Nebius is leading a new era in cloud computing to serve the global AI economy. We create tools and resources customers need to solve real-world challenges and transform industries without massive infrastructure costs or large in-house AI/ML teams. The company is headquartered in Amsterdam, listed on Nasdaq, and has R&D hubs across Europe, North America, and Israel.

The role

We are seeking an experienced Senior ML Solutions Architect to support customers leveraging Nebius Token Factory's serverless inference platform for open-source LLMs across multiple modalities. You will collaborate with clients to design and implement customized LLM-based solutions, architect scalable AI applications using served models, and work with backend teams to improve the platform to match client needs.

Responsibilities

Design and implement LLM-based solutions using Nebius Token Factory’s inference services to drive business value and support customer goals.
Build production-ready applications leveraging serverless LLM APIs, including multimodal models (text, vision, audio) and domain-specific models.
Provide technical expertise in prompt engineering, RAG architectures, model selection, and inference optimization.
Collaborate with product and engineering teams to surface customer feedback and shape the platform roadmap.
Guide customers in scaling from POC to production with a focus on performance, reliability, and cost efficiency.

Requirements

5+ years of experience in ML/AI systems, with at least 2 years focused on LLMs and generative AI.
Deep knowledge of the LLM ecosystem, including model architectures and fine-tuning approaches.
Hands-on experience with:
- Prompt engineering and LLM pipeline development, including evaluation.
- Agentic frameworks such as LangChain, LangSmith, smolagents, or equivalent.
- Vector databases and RAG implementation patterns.
- Deploying LLM-powered applications using APIs from OpenAI, Anthropic, or open-source models.
Strong Python programming skills.
Excellent communication skills and the ability to explain technical concepts to diverse audiences.

Nice to have / Added bonus

Experience with inference frameworks and libraries (e.g., vLLM, SGLang, TensorRT-LLM, Transformers).
Familiarity with inference optimization techniques such as quantization, batching, caching, and routing.
Experience with multimodal AI models (vision-language, speech).
Proficiency with DevOps tools (Docker, Kubernetes).
Contributions to open-source ML/AI projects.

Preferred tooling

Programming Languages: Python
ML Frameworks and Libraries: vLLM, SGLang, TensorRT-LLM, Transformers, OpenAI/Anthropic SDKs
Frameworks for Agentic Pipelines: LangChain, LangSmith, smolagents, or equivalent
API and Web Frameworks: FastAPI, Flask
MLOps and DevOps tools: Kubernetes (K8s), Docker, Git
Cloud Platforms: AWS (SageMaker, Bedrock), GCP (Vertex AI), Azure (Azure ML)

Benefits

Health Insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
401(k) Plan: Up to 4% company match with immediate vesting.
Parental Leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
Remote Work Reimbursement: Up to $85/month for mobile and internet.
Disability & Life Insurance: Company-paid short-term, long-term, and life insurance coverage.
Flexible working arrangements and opportunities for professional growth.

Compensation

We offer competitive salaries, ranging from $215k to $275k OTE (On-Target Earnings), plus equity, based on experience, skills, and location.

What we offer

Competitive salary and comprehensive benefits package.
Opportunities for professional growth within Nebius.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.