Senior Site Reliability Engineer, Core AI Infrastructure

USD 186,065-218,900 per year
SENIOR
✅ Remote

Used Tools & Technologies

SRE GenAI

Required Skills & Competences

Security @ 4, Ansible @ 6, Chef @ 6, Docker @ 4, Go @ 4, Kubernetes @ 4, Ruby @ 6, Terraform @ 6, Python @ 4, CI/CD @ 4, AWS @ 4, Bash @ 6, Git @ 6, Puppet @ 6, Salt @ 6, Compliance @ 4, Observability @ 4, Generative AI @ 4, AI @ 4

Details

Ready to do the most impactful work of your career? At Coinbase, we are uncompromising on our mission to increase economic freedom. Coinbase is a remote-first (but not remote-only) company; expect quarterly in-person working sessions called “surges.”

As a Senior Site Reliability Engineer, you'll join the IT Operations team, a high-performing group driving AI transformation. The team builds and scales the infrastructure powering Coinbase's AI products. You'll own the reliability and automation of critical AI infrastructure, ensuring systems are resilient, observable, and secure at scale.

Responsibilities

  • Own reliability, monitoring, and the incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retros.
  • Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
  • Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms; partner with Security and Compliance to integrate surveillance tooling into deployment pipelines.
  • Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation.
  • Develop full-stack applications that power internal AI products and infrastructure using Go or Python.

Requirements

  • 5+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
  • Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production.
  • Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and experience with Git-based CI/CD pipelines.
  • Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.
  • Demonstrated responsible use of generative AI, with human oversight, to deliver business-ready outputs and drive measurable workflow, cost, and quality improvements.

Compensation & Benefits

  • Annual base salary range (excluding equity and bonus): $186,065-$218,900 USD (varies by location).
  • Total compensation may also include equity, bonus eligibility, and benefits (medical, dental, vision, 401(k)).

Additional information

  • Remote (USA) role; Coinbase is remote-first and holds quarterly in-person "surges."
  • Equal Opportunity Employer; accommodations available for applicants with disabilities.
  • Application limit: Candidates may submit a maximum of 3 applications within a 6-month period.