Senior Site Reliability Engineer, Core AI Infrastructure

USD 186,100-218,900 per year
SENIOR
✅ Remote

Required Skills & Competences

Security @ 4, Ansible @ 4, Go @ 4, Terraform @ 4, Python @ 4, GCP @ 4, Java @ 4, Machine Learning @ 4, Data Science @ 4, Leadership @ 4, AWS @ 4, Bash @ 4, Communication @ 7, Compliance @ 4

Details

Ready to be pushed beyond what you think you’re capable of?

At Coinbase, our mission is to increase economic freedom in the world. It’s a massive, ambitious opportunity that demands the best of us every day, as we build the emerging onchain platform and the future global financial system.

We seek a passionate candidate who believes in the power of crypto and blockchain to update the financial system, who is eager to leave their mark, relishes working with high-caliber colleagues, and actively seeks feedback to continually improve. We want someone who will run towards solving the company’s hardest problems.

The work culture is intense and not for everyone, but ideal if you want to build the future alongside others who excel in their disciplines.

While many roles at Coinbase are remote-first, in-person participation is required periodically, including team and company-wide offsites.

Responsibilities

  • Deploy, configure, and manage AI-powered employee productivity tools and in-house AI solutions.
  • Ensure high availability, reliability, and optimal performance of AI platforms and services.
  • Implement monitoring, alerting, and incident response procedures.
  • Design and implement scalable infrastructure to support AI tool demands; optimize resource utilization and capacity planning.
  • Develop and maintain automation scripts and tools for deployment, monitoring, and maintenance.
  • Support experimental sandbox environments for testing AI solutions.
  • Collaborate with cross-functional teams (Machine Learning, HR, Security, Data Science, Developer Experience) to integrate AI solutions.
  • Provide technical support and troubleshooting for AI-related issues.
  • Adhere to security and privacy policies, ensuring compliance with regulatory requirements.
  • Implement monitoring and metrics to track AI system health and performance; analyze data for improvement.
  • Participate in incident response and develop incident response plans.
  • Contribute to backend development supporting AI tool integration and functionality.
  • Manage AI deployments on public cloud platforms such as AWS and GCP using cloud-native services.
  • Communicate technical information effectively to non-technical audiences, including senior leadership.

Requirements

  • Proven experience as a Site Reliability Engineer or similar role.
  • Strong understanding of AI technologies and platforms.
  • Experience deploying and managing applications in cloud environments (AWS/GCP).
  • Solid backend development skills in Python, Java, or Go.
  • Proficiency in managing and configuring public cloud services for scalability and reliability.
  • Experience with automation tools and scripting (e.g., Ansible, Terraform, Bash, Python).
  • Excellent troubleshooting and problem-solving skills.
  • Strong communication and collaboration abilities.
  • Deep understanding of security and compliance.
  • Experience in highly regulated, fast-paced, high-growth environments.

Benefits

  • Medical, Dental, and Vision Plans with generous employee contributions.
  • Health Savings Account with company contributions.
  • Disability and Life Insurance.
  • 401(k) plan with company match.
  • Wellness Stipend.
  • Mobile/Internet Reimbursement.
  • Connections Stipend.
  • Volunteer Time Off.
  • Fertility Counseling and Benefits.
  • Generous Time Off/Leave Policy.
  • Option to get paid in digital currency.

Pay Range: $186,065 — $218,900 USD (Target annual salary; includes bonus, equity, and benefits)
