Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Go @ 7
Linux @ 4
Python @ 7
Hiring @ 4
Networking @ 4
Debugging @ 4
Cloud Computing @ 4
Observability @ 4
AI @ 4
Details
Nebius is leading a new era in cloud computing for the global AI economy. The company builds cloud infrastructure and tools to help customers solve real-world challenges and run AI/ML workloads without massive infrastructure costs. Headquartered in Amsterdam and listed on Nasdaq, Nebius has R&D hubs across Europe, North America, and Israel, and a team of 800+ employees including 400+ engineers.
Role summary
Nebius is hiring a Senior Software Engineer to design, build, and own backend systems that power metrics and monitor large-scale infrastructure, and to develop a comprehensive infrastructure maintenance platform. The role focuses on production systems, system design, reliability, and close collaboration with hardware, networking, and data center operations teams.
Responsibilities
- Design and build services and agents that provide deep visibility into large-scale server fleets and data center engineering systems
- Evolve metrics, aggregation, and alerting pipelines, with a focus on signal quality and reliability
- Design and operate maintenance and remediation systems that enable safe, predictable fleet-wide changes and keep infrastructure healthy
- Investigate production incidents hands-on, including on-host Linux debugging, and drive root-cause fixes
- Collaborate closely with hardware, networking, and data center operations teams to improve reliability
Requirements
- 5+ years of professional software engineering experience
- Strong production experience with Python and Go, or the ability to ramp up quickly
- Solid Linux fundamentals and comfort with on-host debugging of live systems
- Ability to write reliable, maintainable code and dig into complex, ambiguous problems
- Experience building and operating production systems at scale
It will be a bonus if you have:
- Ubuntu experience, including internal tooling and packaging workflows (e.g., building Debian packages)
- CCNA (Cisco Certified Network Associate) or equivalent networking experience
Benefits
- Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families
- 401(k) plan: up to 4% company match with immediate vesting
- Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers
- Remote work reimbursement: up to $85/month for mobile and internet
- Disability & life insurance: company-paid short-term disability, long-term disability, and life insurance coverage
- Competitive salary and comprehensive benefits package
- Opportunities for professional growth and flexible working arrangements
Compensation
- Base salary range: $130,000 - $170,000 per year + quarterly performance bonuses
Additional context
- Team: Hardware Automation / Observability-focused backend systems
- Work arrangements: flexible working; remote work reimbursement provided