Engineering Manager - Site Reliability Engineer

at Coinbase

📍 United States

USD 160,000-220,000 per year

SENIOR

✅ Hybrid

Required Skills & Competences

Kubernetes @ 4, DevOps @ 4, Kibana @ 3, GCP @ 4, Datadog @ 4, Leadership @ 4, AWS @ 4, Azure @ 4, Communication @ 7, Networking @ 4, SRE @ 4, Load Testing @ 4, Technical Leadership @ 4

Details

Ready to be pushed beyond what you think you’re capable of?

At Coinbase, our mission is to increase economic freedom in the world. It’s a massive, ambitious opportunity that demands the best of us every day, as we build the emerging onchain platform and the future global financial system.

We seek a candidate who is passionate about using crypto and blockchain technology to update the financial system, eager to solve difficult problems and work with high-caliber colleagues, and active in seeking feedback.

Our work culture is intense and requires in-person participation throughout the year, with multiple team and company-wide offsites.

Responsibilities

Promote reliability culture across Coinbase.
Help scale the system by 10-20x and secure service configurations and secrets by building and enhancing service configuration management systems.
Reduce customer incidents by building and enhancing Safe Release (canary-based deployment system) capabilities and onboarding thousands of services deploying hundreds daily.
Hire and retain top talent.
Build trust and relationships with cross-functional teams to make embedded SRE programs successful.
Collaborate with engineers, product managers, and leadership to develop a strategy and a detailed roadmap.
Actively listen to customer feedback and iterate for improvements.
Provide technical leadership in architectural decisions and maintain a culture of high-quality code and processes.
Own the team's processes and services, ensuring SLA adherence.
Work closely with the talent organization to recruit engineers aligned with Coinbase's culture.
Coach direct reports to positively impact the organization and support their growth.

Requirements

10+ years of software engineering/SRE experience, including 2+ years of management experience.
Knowledge of SRE, DevOps, incident management, and reliability tooling such as canary deployments and load testing.
Experience with public cloud infrastructure (Kubernetes, Load Balancer, Auto-Scaling), networking basics, observability tools (Datadog), and troubleshooting.
Strong communication and interpersonal skills.
Critical thinker under pressure.
Willingness to understand, debug, and improve any layer of the stack.

Nice to Haves

Experience designing and building reliable, high-throughput, low-latency systems.
Experience with high severity incident management and on-call support.
Familiarity with observability and monitoring systems such as Kibana and Datadog.
Experience with AWS, GCP, Azure, or other cloud environments.
Experience in highly regulated environments.
Experience writing company-facing blog posts and training materials.

Benefits

Medical, dental, and vision insurance for employees and dependents.
Group personal accident and term life insurance.
Employee Stock Purchase Plan (ESPP).
Wellness stipend, mobile/internet reimbursement, connections stipend.
Learning and development allowance.
Employee assistance program.
Global traveler travel medical policy.
Fertility benefits.
Generous time off/leave policy.