Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Marketing @ 3
Ansible @ 6
Kubernetes @ 3
Terraform @ 6
Python @ 3
MLOps @ 3
TensorFlow @ 3
AWS @ 3
Azure @ 3
Communication @ 3
KubeFlow @ 3
PyTorch @ 3
CUDA @ 3
Cloud Computing @ 3
GPU @ 3
Deep Learning @ 3
AI @ 3
OpenCL @ 3
Slurm @ 3
Details
Nebius is leading a new era in cloud computing to serve the global AI economy. We create tools and resources to help customers solve real-world AI/ML challenges at scale, without massive infrastructure costs or large in-house teams. Nebius is headquartered in Amsterdam, listed on Nasdaq, and has R&D hubs across Europe, North America, and Israel.
Role overview
We are seeking a highly skilled, customer-focused Cloud Solutions Architect specializing in cloud infrastructure and MLOps. You will design and implement solutions for clients, act as a trusted technical advisor for ML/AI pipelines, and work at the intersection of infrastructure and AI. You may work remotely from the United States or Canada.
Responsibilities
- Act as a trusted advisor to clients: provide technical expertise, conduct PoCs, run workshops, give presentations, and provide training on GPU cloud technologies and best practices.
- Collaborate with clients to understand business requirements and develop solution architecture that aligns with their needs.
- Design Infrastructure-as-Code solutions and produce documentation and technical how-tos in collaboration with support engineers and technical writers.
- Help customers optimize pipeline performance and scalability to ensure efficient utilization of cloud resources and Nebius AI services.
- Serve as the single point of expertise on customer scenarios for the product, technical support, and marketing teams.
- Assist marketing efforts during events (hackathons, conferences, workshops, webinars, etc.).
Requirements
- 5–10+ years of experience as a cloud solutions architect, system/network engineer, developer, or in a similar technical role with a focus on cloud computing.
- Strong hands-on experience with Infrastructure-as-Code and configuration management tools (preferably Terraform and Ansible).
- Experience with Kubernetes.
- Ability to write code in Python.
- Solid understanding of GPU computing practices for ML training and inference workloads, and of GPU software stack components (including drivers and libraries such as CUDA and OpenCL).
- Excellent communication skills and a customer-centric mindset.
Nice to have
- Hands-on experience with HPC/ML orchestration frameworks (e.g., Slurm, Kubeflow).
- Hands-on experience with deep learning frameworks (e.g., TensorFlow, PyTorch).
- Solid understanding of the cloud ML tools landscape from industry leaders (NVIDIA, AWS, Azure, Google).
Compensation
We offer competitive salaries ranging from 225k to 315k OTE (On-Target Earnings), plus equity, based on experience, skills, and location.
Benefits
- Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
- 401(k) plan: Up to 4% company match with immediate vesting.
- Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
- Remote work reimbursement: Up to $85/month for mobile and internet.
- Disability & life insurance: Company-paid short-term, long-term, and life coverage.
- Competitive salary and comprehensive benefits, opportunities for professional growth, flexible working arrangements, and a dynamic, collaborative environment.
Locations
- Remote from the United States or Canada.