Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Linux @ 4
Leadership @ 4
Communication @ 4
Networking @ 4
Technical Leadership @ 4
Cloud Computing @ 4
GPU @ 4
AI @ 4
HPC @ 4
Details
Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside experienced and innovative leaders and engineers.
Where we work
Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team includes over 800 employees with more than 400 engineers across hardware and software engineering and an in-house AI R&D team.
Role description
Nebius operates large-scale, GPU-dense AI infrastructure across mission-critical data center environments. As a Senior Delivery Deployment Engineer, you will own the end-to-end delivery, deployment, and production readiness of next-generation GPU platforms inside our data centers. This role sits at the intersection of hardware, Linux systems, and operational execution. You will lead on-site rack bring-up, validate NVIDIA-based AI systems, coordinate repairs, and ensure GB-series infrastructure moves from installation to fully operational production environments. You will collaborate closely with hardware engineering, networking, and infrastructure teams to deploy and stabilize H200 and B200-based GPU systems at scale.
Responsibilities
- Lead end-to-end deployment of GB-series racks within data center environments
- Oversee installation, bring-up, validation, and production readiness of NVIDIA H200 and B200-based servers
- Troubleshoot complex hardware, firmware, Linux OS, and networking issues
- Execute structured testing and validation procedures during deployment
- Develop and maintain basic Linux-based hardware health-check and diagnostic scripts
- Coordinate on-site hardware repairs, part replacements, and vendor escalations
- Drive root cause analysis and ensure corrective actions are implemented
- Manage and prioritize deployment timelines across multiple concurrent rollouts
- Provide technical leadership and guidance to on-site engineers and technicians
- Partner with networking and infrastructure teams to ensure seamless integration
- Document deployment processes, validation standards, and operational runbooks
Requirements
- Strong hands-on experience deploying and operating data center infrastructure
- Deep familiarity with GPU-dense systems, ideally NVIDIA H-series platforms
- Experience working with high-density rack deployments (GB-series or similar)
- Solid Linux experience, including troubleshooting and scripting
- Ability to diagnose issues across hardware, OS, firmware, and network layers
- Experience coordinating field repairs and working directly with hardware vendors
- Proven experience leading technical teams or overseeing field operations
- High ownership mindset and ability to operate in production-critical environments
- Clear communication skills and ability to collaborate across distributed teams
Nice to have
- Experience deploying AI or HPC clusters at scale
- Familiarity with automated provisioning or infrastructure lifecycle systems
- Background in hardware qualification, burn-in testing, or factory validation
- Experience supporting rapid infrastructure expansion
- Exposure to ARM-based or heterogeneous compute environments
Working conditions
- Fully remote position (United States)
- Collaboration with globally distributed engineering and operations teams
Benefits
- Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families
- 401(k) plan: up to 4% company match with immediate vesting
- Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers
- Remote work reimbursement: up to $85/month for mobile and internet
- Disability & life insurance: company-paid short-term, long-term, and life insurance coverage
- Competitive salary and comprehensive benefits package; opportunities for professional growth and flexible working arrangements
Compensation
We offer competitive salaries, ranging from $125k to $180k base, plus quarterly performance bonuses.