System Engineer

at Nebius
USD 150,000-200,000 per year
MIDDLE
✅ Hybrid

Used Tools & Technologies

Machine Learning

Required Skills & Competences

  • Linux @ 5
  • Python @ 5
  • Bash @ 5
  • Networking @ 3
  • Performance Optimization @ 3
  • Debugging @ 3
  • Cloud Computing @ 3
  • GPU @ 3
  • AI @ 3
  • InfiniBand @ 3

Details

Nebius is leading a new era in cloud computing to serve the global AI economy. We create the tools and resources our customers need to solve real-world challenges and transform industries, without massive infrastructure costs or the need to build large in-house AI/ML teams. Our employees work at the cutting edge of AI cloud infrastructure alongside experienced leaders and engineers.

Where we work: Headquartered in Amsterdam and listed on Nasdaq, Nebius has a global footprint with R&D hubs across Europe, North America, and Israel. The team of over 800 employees includes more than 400 engineers with expertise across hardware and software engineering, as well as an in-house AI R&D team.

Role overview

Nebius is looking for a System Engineer (Servers Hardware R&D Team) to support its expanding North American operations. The position requires occasional on-site presence at data center locations. The role focuses on the design, deployment, testing, troubleshooting, and performance optimization of high-performance, GPU-based cloud systems for AI workloads.

Responsibilities

  • Participate in the design, deployment, and maintenance of high-performance cloud systems optimized for AI workloads.
  • Arrange and perform hardware R&D tests and experiments on-site in data center environments.
  • Troubleshoot and resolve complex system issues related to GPUs, networking (InfiniBand, NVLink), PCIe, and server infrastructure.
  • Conduct deep investigations into hardware, software, and networking issues to ensure optimal system performance and reliability.
  • Develop and execute test plans and methodologies for advanced GPU, InfiniBand, and compute systems to benchmark and validate performance.
  • Collaborate closely with cross-functional engineering and operations teams to improve system performance and reliability.
  • Monitor system performance and continuously fine-tune configurations for maximum efficiency.

Requirements

  • Strong knowledge of modern server architecture, particularly in high-performance, GPU-based environments.
  • Hands-on experience with GPUs, networking, NVLink, and PCIe technologies.
  • Proficiency in Linux systems, with experience using Python and Bash for automation and tooling.
  • Demonstrated ability to troubleshoot complex hardware, software, and networking issues.
  • Experience with deep problem investigation, root cause analysis, and performance optimization in cloud or high-performance computing environments.
  • Strong analytical and problem-solving skills with a performance-first mindset.
  • Basic electronics modification skills, including soldering and wiring.

Nice to have

  • Knowledge of the Linux kernel and experience with kernel-level debugging or troubleshooting.
  • Familiarity with electronic measurement equipment such as oscilloscopes and multimeters.

Benefits

  • Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families.
  • 401(k) plan: up to 4% company match with immediate vesting.
  • Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
  • Remote work reimbursement: up to $85/month for mobile and internet.
  • Disability & life insurance: company-paid short-term disability, long-term disability, and life insurance coverage.
  • Competitive salary and comprehensive benefits package, opportunities for professional growth, flexible working arrangements, and a dynamic collaborative environment.

Compensation

  • Base salary range: $150,000 - $200,000 per year, plus quarterly performance bonuses.