AI/ML Specialist Solutions Architect

at Nebius

📍 Canada
📍 United States

USD 225,000-315,000 per year

Mid-level, Senior

✅ Remote

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others and know all the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Docker @ 3, Go @ 3, Kubernetes @ 3, DevOps @ 3, Terraform @ 3, Python @ 3, Java @ 3, Machine Learning @ 7, MLOps @ 7, scikit-learn @ 3, TensorFlow @ 3, Communication @ 3, Git @ 3, Helm @ 3, PyTorch @ 3, Cloud Computing @ 3, GPU @ 5, AI @ 3, Slurm @ 3

Details

Nebius is leading a new era in cloud computing to serve the global AI economy. The company builds tools and resources to help customers solve real-world AI/ML challenges at scale without massive infrastructure costs or the need for large in-house teams. Nebius is headquartered in Amsterdam, listed on Nasdaq, and has R&D hubs across Europe, North America, and Israel.

This role sits in Customer Experience and focuses on supporting AI-focused customers who use Nebius services. You will act as a trusted advisor, collaborate with clients to design scalable AI solutions, resolve technical challenges, and manage large-scale AI deployments involving hundreds to thousands of GPUs. The position is remote-friendly, and candidates may work from the United States or Canada.

Responsibilities

Design customer-centric solutions that maximize business value and align with strategic goals.
Build and maintain long-term customer relationships to foster trust and ensure satisfaction.
Deliver technical presentations, produce whitepapers, create manuals, and host webinars for audiences with varying technical expertise.
Collaborate with engineering and product teams to prioritize and relay customer feedback.
Support customers in optimizing AI solutions at massive GPU cloud scale and influence the development of Nebius AI Cloud.

Requirements

7–10+ years of experience with cloud technologies in MLOps engineering, Machine Learning engineering, or similar roles.
Strong understanding of ML ecosystems, including models, use cases, and tooling.
Proven experience in setting up and optimizing distributed training pipelines across multi-node and multi-GPU environments.
Hands-on knowledge of frameworks such as PyTorch or JAX.
Excellent verbal and written communication skills.

It is a bonus if you have:

Expertise in deploying inference infrastructure for production workloads.
Ability to transition ML pipelines from proof-of-concept to scalable production systems.

Preferred tooling and technologies mentioned:

Programming languages: Python, Go, Java, C++
Orchestration: Kubernetes (K8s), Slurm
DevOps tools: Git, Docker, Helm
Infrastructure as Code: Terraform
ML frameworks & libraries: PyTorch, TensorFlow, JAX, Hugging Face, scikit-learn
GPU hardware referenced: H200, B200, GB200

Benefits

Competitive salary and comprehensive benefits package.
Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
401(k) plan: Up to 4% company match with immediate vesting.
Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
Remote work reimbursement: Up to $85/month for mobile and internet.
Disability & life insurance: Company-paid short-term disability, long-term disability, and life insurance coverage.
Opportunities for professional growth, flexible working arrangements, and equity potential.

Compensation

Competitive salary ranging from USD 225,000 to 315,000 OTE (on-target earnings). Equity may be offered based on experience, skills, and location.