Sr. Site Reliability Engineer, Core AI Infra

USD 186,100-218,900 per year
SENIOR
✅ Remote

Required Skills & Competences

Security, Ansible, Go, Terraform, Python, GCP, Java, Machine Learning, Data Science, Leadership, AWS, Bash, Communication, SRE, Compliance

Details

Ready to be pushed beyond what you think you’re capable of?

At Coinbase, our mission is to increase economic freedom in the world. It’s a massive, ambitious opportunity that demands the best of us, every day, as we build the emerging onchain platform — and with it, the future global financial system.

Coinbase seeks a passionate individual who believes in the power of crypto and blockchain to update the financial system, and who is eager to solve the company’s hardest problems and excel in a high-caliber, intense work culture.

We are looking for a Site Reliability Engineer (SRE) to join the IT AI Infrastructure team to deploy, manage, and optimize AI-powered productivity tools and in-house AI solutions that enhance employee efficiency at scale.

Responsibilities

  • Deploy, configure, and manage AI-powered employee productivity tools and in-house AI solutions.
  • Ensure high availability, reliability, and optimal performance of AI platforms and services, implementing monitoring, alerting, and incident response.
  • Design and implement scalable infrastructure that supports the AI tools and their user base, optimizing resource utilization and capacity planning.
  • Develop and maintain automation scripts and tools for deployment, monitoring, and maintenance.
  • Collaborate with cross-functional teams (Machine Learning, HR, Security, Data Science, Developer Experience) on the development and integration of AI solutions.
  • Adhere to security and privacy policies; ensure compliance with regulatory requirements.
  • Implement comprehensive monitoring and metrics; analyze the resulting data to identify improvements.
  • Participate in incident response and troubleshooting, maintaining incident response plans.
  • Contribute to backend development supporting AI tools integration and functionality.
  • Deploy and manage AI solutions on public cloud platforms (AWS/GCP), using cloud-native services.
  • Communicate technical information effectively to non-technical audiences including senior leadership.

Requirements

  • Proven experience as a Site Reliability Engineer or similar role.
  • Strong understanding of AI technologies and platforms.
  • Experience with deploying and managing cloud applications (AWS/GCP).
  • Backend development experience with Python, Java, or Go.
  • Proficiency in managing public cloud services (AWS/GCP) for scalability and reliability.
  • Experience with automation tools and scripting (Ansible, Terraform, Bash, Python).
  • Excellent troubleshooting, problem-solving, communication, and collaboration skills.
  • Strong security and compliance knowledge.
  • Experience working in highly regulated, fast-paced, high-growth environments.

Benefits

  • Medical, dental, and vision plans with employee contributions.
  • Health Savings Account with company contributions.
  • Disability and life insurance.
  • 401(k) plan with company match.
  • Wellness stipend and mobile/internet reimbursement.
  • Connections stipend and volunteer time off.
  • Fertility counseling and benefits.
  • Generous time off/leave policy.
  • Option to be paid in digital currency.