Sr. Site Reliability Engineer, Core AI Infra

at Coinbase

📍 United States

USD 186,100-218,900 per year

SENIOR

✅ Remote

Required Skills & Competences

Security, Ansible, Go, Terraform, Python, GCP, Java, Machine Learning, Data Science, Leadership, AWS, Bash, Communication, SRE, Compliance

Details

Ready to be pushed beyond what you think you’re capable of?

At Coinbase, our mission is to increase economic freedom in the world. It’s a massive, ambitious opportunity that demands the best of us, every day, as we build the emerging onchain platform — and with it, the future global financial system.

Coinbase seeks a passionate individual who believes in the power of crypto and blockchain to update the financial system, eager to solve the company’s hardest problems and excel in a high-caliber, intense work culture.

We are looking for a Site Reliability Engineer (SRE) to join the IT AI Infrastructure team to deploy, manage, and optimize AI-powered productivity tools and in-house AI solutions that enhance employee efficiency at scale.

Responsibilities

Deploy, configure, and manage AI-powered employee productivity tools and in-house AI solutions.
Ensure high availability, reliability, and optimal performance of AI platforms and services, implementing monitoring, alerting, and incident response.
Design and implement scalable infrastructure that supports AI tools and their user base, optimizing resource utilization and capacity planning.
Develop and maintain automation scripts and tools for deployment, monitoring, and maintenance.
Collaborate with cross-functional teams (Machine Learning, HR, Security, Data Science, Developer Experience) on the development and integration of AI solutions.
Adhere to security and privacy policies; ensure compliance with regulatory requirements.
Implement comprehensive monitoring and metrics; analyze the resulting data to identify improvements.
Participate in incident response and troubleshooting, maintaining incident response plans.
Contribute to backend development supporting AI tools integration and functionality.
Deploy and manage AI solutions on public cloud platforms (AWS/GCP), using cloud-native services.
Communicate technical information effectively to non-technical audiences including senior leadership.

Requirements

Proven experience as a Site Reliability Engineer or in a similar role.
Strong understanding of AI technologies and platforms.
Experience deploying and managing cloud applications (AWS/GCP).
Backend development experience with Python, Java, or Go.
Proficiency in managing public cloud services (AWS/GCP) for scalability and reliability.
Experience with automation tools and scripting (Ansible, Terraform, Bash, Python).
Excellent troubleshooting, problem-solving, communication, and collaboration skills.
Strong security and compliance knowledge.
Experience working in highly regulated, fast-paced, high-growth environments.

Benefits

Medical, dental, and vision plans with employee contributions.
Health Savings Account with company contributions.
Disability and life insurance.
401(k) plan with company match.
Wellness stipend and mobile/internet reimbursement.
Connections stipend and volunteer time off.
Fertility counseling and benefits.
Generous time off/leave policy.
Option to be paid in digital currency.