Senior Site Reliability Engineer, Core AI Infrastructure

at Coinbase

📍 United States

USD 186,100-218,900 per year

SENIOR

✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 4, Ansible @ 4, Go @ 4, Terraform @ 4, Python @ 4, GCP @ 4, Java @ 4, Data Science @ 4, Leadership @ 4, AWS @ 4, Bash @ 4, Communication @ 7, Compliance @ 4

Details

Ready to be pushed beyond what you think you’re capable of?

At Coinbase, our mission is to increase economic freedom in the world. It’s a massive, ambitious opportunity that demands the best of us, every day, as we build the emerging onchain platform — and with it, the future global financial system.

To achieve our mission, we’re seeking a very specific candidate. We want someone who is passionate about our mission and who believes in the power of crypto and blockchain technology to update the financial system. We want someone who is eager to leave their mark on the world, who relishes the pressure and privilege of working with high-caliber colleagues, and who actively seeks feedback to keep leveling up. We want someone who will run towards, not away from, solving the company’s hardest problems.

Our work culture is intense and isn’t for everyone. But if you want to build the future alongside others who excel in their disciplines and expect the same from you, there’s no better place to be.

While many roles at Coinbase are remote-first, we are not remote-only. In-person participation is required throughout the year. Team and company-wide offsites are held multiple times annually to foster collaboration, connection, and alignment. Attendance is expected and fully supported.

Responsibilities

Deploy, configure, and manage AI-powered employee productivity tools and AI solutions built in-house
Ensure high availability, reliability, and optimal performance of AI platforms and services; implement monitoring, alerting, and incident response procedures
Design and implement scalable infrastructure for AI tools, optimize resource utilization, and manage capacity planning
Develop and maintain automation scripts and tools to streamline deployment, monitoring, and maintenance tasks
Collaborate with cross-functional teams (Machine-Learning, HR, Security, Data Science, Developer Experience) to support development and integration of AI solutions
Provide technical support and troubleshooting for AI-related issues
Adhere to security and privacy policies while managing AI tools and ensure regulatory compliance
Implement monitoring and metrics to track AI system performance and health; analyze data for improvements
Participate in incident response and troubleshooting for AI outages or performance issues; develop incident response plans
Contribute to backend development to support AI tool integration and functionality
Deploy and manage AI solutions on public cloud platforms (AWS/GCP), leveraging cloud-native services and best practices
Present technical information effectively to non-technical audiences including senior leadership

Requirements

Proven experience as a Site Reliability Engineer or similar role
Strong understanding of AI technologies and platforms
Experience deploying and managing applications in cloud environments (AWS/GCP)
Solid backend development skills in Python, Java, or Go
Strong proficiency managing and configuring public cloud services for scalability and reliability
Experience with automation tools and scripting (Ansible, Terraform, Bash, Python)
Excellent troubleshooting and problem-solving skills
Strong communication and collaboration skills
Strong security and compliance knowledge
Experience working in highly regulated and high-growth environments

Benefits

Medical, Dental, and Vision Plans with generous employee contributions
Health Savings Account with company contributions
Disability and Life Insurance
401(k) plan with company match
Wellness stipend
Mobile/Internet reimbursement
Connections stipend
Volunteer time off
Fertility counseling and benefits
Generous time off/leave policy
Option to get paid in digital currency