Staff Site Reliability Engineer, Core AI Infrastructure

USD 218,025-256,500 per year
MIDDLE
✅ Remote

Used Tools & Technologies

SRE GenAI

Required Skills & Competences

Security @ 3, Ansible @ 6, Chef @ 6, Docker @ 3, Go @ 3, Kubernetes @ 3, Linux @ 3, Ruby @ 5, Terraform @ 6, Python @ 3, CI/CD @ 3, Leadership @ 3, AWS @ 3, Bash @ 5, Git @ 5, Puppet @ 6, Salt @ 6, Compliance @ 3, Observability @ 3, Generative AI @ 3, AI @ 3

Details

Ready to do the most impactful work of your career? At Coinbase, we are uncompromising on our mission to increase economic freedom. The bar is high, the environment is intense, and we like it that way. This isn't a place for complacency; it's a place to be pushed past your perceived limits. Coinbase is a remote-first, but not remote-only, company. Expect to get together quarterly for in-person working sessions called “surges.”

You'll join a high-performing team of engineers driving AI transformation at Coinbase as a Staff Site Reliability Engineer on the IT Operations team. This team builds and scales the infrastructure powering Coinbase's AI products, with direct exposure to senior leadership in a fast-paced, incubator-style environment. You'll own the reliability and automation of critical AI infrastructure, ensuring systems are resilient, observable, and secure at scale.

Responsibilities

  • Own the reliability, monitoring, and incident response lifecycle for AI infrastructure services, including on-call support for AWS deployment pipelines, root cause analysis, and blameless retrospectives.
  • Build automation and tooling to streamline operational IT workflows, eliminate manual tasks, and improve deployment velocity across CI/CD frameworks and Kubernetes environments.
  • Partner with the Coinbase Infrastructure team to extend CI/CD frameworks supporting IT services and enterprise network platforms, and with Security and Compliance to integrate surveillance tooling into deployment pipelines.
  • Strengthen observability and documentation standards across IT engineering by defining metrics, implementing monitoring solutions, and maintaining technical documentation.
  • Develop full-stack applications that power internal AI products and infrastructure using Go or Python.

Requirements

  • 8+ years of experience automating and supporting cloud infrastructure (AWS) and network environments, with hands-on use of infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet, or Salt).
  • Proven experience deploying, managing, and troubleshooting containerized workloads using Docker and Kubernetes in production environments.
  • Proficiency in at least one scripting or programming language (Python, Bash, Ruby, or Go) and in Git-based version control and CI/CD workflows.
  • Track record of leading incident response in environments with strict SLAs, including root cause analysis, blameless retros, and measurable reliability improvements.
  • Responsible use of generative AI, with human oversight, to deliver business-ready outputs and drive measurable improvements in workflow efficiency, cost, and quality.

Nice-to-haves

  • Expertise with Linux, Bash, Ruby, Python and/or Go.
  • Expertise automating EC2 or container deployments with Terraform.
  • Strong network security fundamentals.
  • Experience managing and leveraging log aggregation.
  • Experience working in a highly regulated environment.
  • Experience in a fast-paced, high-growth company.
  • Experience in a remote-first IT environment.

Pay Transparency

Base salary varies by location. Annual base salary range (excluding equity and bonus): $218,025 — $256,500 USD. Total compensation may also include equity, bonus eligibility, and benefits (medical, dental, vision, 401(k)).

Benefits

  • Equity and bonus eligibility
  • Medical, dental, vision
  • 401(k)

Additional information

  • Position ID: P76834
  • Application limit: Candidates may submit a maximum of 3 applications within a 6-month period.
  • Coinbase is an Equal Opportunity Employer and provides accommodations for applicants with disabilities upon request.