Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Go @ 7
Python @ 7
SQL @ 7
Statistics @ 7
Communication @ 4
Project Management @ 6
Reporting @ 4
Cloud Computing @ 4
GPU @ 4
AI @ 4
HPC @ 4
Details
Nebius is leading a new era in cloud computing to serve the global AI economy. We create tools and resources to help customers solve real-world challenges and transform industries without massive infrastructure costs or large in-house AI/ML teams. Headquartered in Amsterdam and listed on Nasdaq, Nebius has R&D hubs across Europe, North America, and Israel and a global team of engineers working on hardware and software engineering and in-house AI R&D.
The role
As a Senior Technical Program Manager - Data Center Operations, you will drive operational performance across data center IT and infrastructure teams through data-driven insights, structured tracking, and cross-team coordination. The role focuses on operational efficiency, performance measurement, and continuous improvement: defining success metrics, ensuring transparency across operations, and helping teams optimize execution at scale.
Travel: This position may require on-site presence at data centers.
Responsibilities
- Define, implement, and continuously improve KPIs, metrics, and dashboards for data center operations
- Establish tracking frameworks to monitor performance, incidents, and operational efficiency across teams
- Lead operational reviews (weekly/monthly) to identify bottlenecks, inefficiencies, and improvement opportunities
- Ensure visibility and accountability through structured reporting and data-driven insights
- Standardize processes across sites to improve operational consistency, scalability, and efficiency
- Collaborate with stakeholders to align operational performance with business goals and customer SLAs
Requirements
- 5+ years of experience in technical project/program management, preferably in data center, cloud, or infrastructure environments
- Experience with project management methodologies and tools
- Strong experience in building metrics, dashboards, and reporting systems for operations
- Strong technical skills (SQL and/or programming languages such as Python or Go)
- Strong analytical skills with a good foundation in math and statistics
- Proven ability to work across multiple teams and drive alignment and execution without direct authority
- Excellent communication and stakeholder management skills
- A structured, proactive, and results-driven mindset
Nice to have
- Familiarity with ITIL / ITSM processes
- Experience working with GPU clusters, HPC, or cloud infrastructure
- Understanding of data center network traffic patterns (east-west and north-south)
Benefits
Key employee benefits in the US:
- Health insurance: 100% company-paid medical, dental, and vision coverage for employees and families
- 401(k) plan: Up to 4% company match with immediate vesting
- Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers
- Remote work reimbursement: Up to $85/month for mobile and internet
- Disability & life insurance: Company-paid short-term, long-term, and life insurance coverage
What Nebius offers:
- Competitive salary and comprehensive benefits package
- Opportunities for professional growth
- Flexible working arrangements
- A dynamic and collaborative work environment that values initiative and innovation
Compensation
We offer competitive salaries between 110k and 204k, plus quarterly bonuses and equity, based on your experience.