Senior Site Reliability Engineer, Core AI Infrastructure
Required Skills & Competences
Security, Ansible, Go, Terraform, Python, GCP, Java, Data Science, Leadership, AWS, Bash, Communication, Compliance
Ready to be pushed beyond what you think you’re capable of?
At Coinbase, our mission is to increase economic freedom in the world. It’s a massive, ambitious opportunity that demands the best of us, every day, as we build the emerging onchain platform — and with it, the future global financial system.
To achieve our mission, we’re seeking a very specific candidate. We want someone who is passionate about our mission and who believes in the power of crypto and blockchain technology to update the financial system. We want someone who is eager to leave their mark on the world, who relishes the pressure and privilege of working with high-caliber colleagues, and who actively seeks feedback to keep leveling up. We want someone who will run towards, not away from, solving the company’s hardest problems.
Our work culture is intense and isn’t for everyone. But if you want to build the future alongside others who excel in their disciplines and expect the same from you, there’s no better place to be.
While many roles at Coinbase are remote-first, we are not remote-only. In-person participation is required throughout the year. Team and company-wide offsites are held multiple times annually to foster collaboration, connection, and alignment. Attendance is expected and fully supported.
Responsibilities
- Deploy, configure, and manage AI-powered employee productivity tools and AI solutions built in-house
- Ensure high availability, reliability, and optimal performance of AI platforms and services; implement monitoring, alerting, and incident response procedures
- Design and implement scalable infrastructure for AI tools, optimize resource utilization, and manage capacity planning
- Develop and maintain automation scripts and tools to streamline deployment, monitoring, and maintenance tasks
- Collaborate with cross-functional teams (Machine Learning, HR, Security, Data Science, Developer Experience) to support the development and integration of AI solutions
- Provide technical support and troubleshooting for AI-related issues
- Adhere to security and privacy policies when managing AI tools, and ensure regulatory compliance
- Implement monitoring and metrics to track AI system performance and health; analyze the data to identify improvements
- Participate in incident response and troubleshooting for AI outages or performance issues; develop incident response plans
- Contribute to backend development to support AI tool integration and functionality
- Deploy and manage AI solutions on public cloud platforms (AWS/GCP), leveraging cloud-native services and best practices
- Present technical information effectively to non-technical audiences including senior leadership
Requirements
- Proven experience as a Site Reliability Engineer or similar role
- Strong understanding of AI technologies and platforms
- Experience deploying and managing applications in cloud environments (AWS/GCP)
- Solid backend development skills in Python, Java, or Go
- Strong proficiency in managing and configuring public cloud services for scalability and reliability
- Experience with automation tools and scripting (Ansible, Terraform, Bash, Python)
- Excellent troubleshooting and problem-solving skills
- Strong communication and collaboration skills
- Strong security and compliance knowledge
- Experience working in highly regulated and high-growth environments
Benefits
- Medical, Dental, and Vision Plans with generous employee contributions
- Health Savings Account with company contributions
- Disability and Life Insurance
- 401(k) plan with company match
- Wellness stipend
- Mobile/Internet reimbursement
- Connections stipend
- Volunteer time off
- Fertility counseling and benefits
- Generous time off/leave policy
- Option to get paid in digital currency