Lead Site Reliability Engineer

at Glean
USD 200,000-260,000 per year
SENIOR
✅ Hybrid

Used Tools & Technologies

Not specified

Required Skills & Competences

Security @ 4, Software Development @ 4, Docker @ 3, Kubernetes @ 3, Terraform @ 3, GitHub @ 4, Algorithms @ 4, Distributed Systems @ 6, Leadership @ 4, AWS @ 7, Azure @ 7, Networking @ 4, SRE @ 4, Performance Optimization @ 6, ServiceNow @ 4, API @ 4, Technical Leadership @ 4, System Architecture @ 6, LLM @ 4, Compliance @ 4, AI @ 4, Agentic AI @ 4

Details

About Glean:

Glean is the Work AI platform that helps everyone work smarter with AI. What began as the industry’s most advanced enterprise search has evolved into a full-scale Work AI ecosystem, powering intelligent Search, an AI Assistant, and scalable AI agents on one secure, open platform. With over 100 enterprise SaaS connectors, flexible LLM choice, and robust APIs, Glean gives organizations the infrastructure to govern, scale, and customize AI across their entire business - without vendor lock-in or costly implementation cycles.

At its core, Glean is redefining how enterprises find, use, and act on knowledge. Its Enterprise Graph and Personal Knowledge Graph map the relationships between people, content, and activity, delivering deeply personalized, context-aware responses for every employee. This foundation powers Glean’s agentic capabilities - AI agents that automate real work across teams by accessing the industry’s broadest range of data: enterprise and world, structured and unstructured, historical and real-time. The result: measurable business impact through faster onboarding, hours of productivity gained each week, and smarter, safer decisions at every level.

Recognized by Fast Company as one of the World’s Most Innovative Companies (Top 10, 2025), by CNBC’s Disruptor 50, Bloomberg’s AI Startups to Watch (2026), Forbes AI 50, and Gartner’s Tech Innovators in Agentic AI, Glean continues to accelerate its global impact. With customers across 50+ industries and 1,000+ employees in more than 25 countries, we’re helping the world’s largest organizations make every employee AI-fluent, and turning the superintelligent enterprise from concept into reality.

If you’re excited to shape how the world works, you’ll help build systems used daily across Microsoft Teams, Zoom, ServiceNow, Zendesk, GitHub, and many more - deeply embedded where people get things done. You’ll ship agentic capabilities on an open, extensible stack, with the craft and care required for enterprise trust, as we bring Work AI to every employee, in every company.

About the Role:

Glean is seeking a Site Reliability Engineering Lead to foster a culture of engineering excellence, drive technical strategy, and develop a high-performing, collaborative team. Your role is pivotal in ensuring our services meet stringent Service Level Objectives (SLOs) and in building resilient, automated production environments in the cloud. You'll lead a team and be responsible for products globally, providing technical leadership to key projects and empowering your team to do the same.

Much of our software development focuses on building infrastructure to scale our operations in a hybrid cloud environment and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale and fast growth which are unique to Glean, while using your expertise in coding, algorithms, problem-solving, and SRE practices. We keep Glean applications up and running, ensuring our customers have the best and most reliable experience possible.

Responsibilities

  • Provide technical leadership and mentorship: drive technical excellence, set best practices for incident management, performance optimization, and automation; influence cross-team collaboration and architectural decisions.
  • Ensure high availability: implement and maintain resilient cloud architectures, monitor system performance, and proactively resolve bottlenecks or failure points.
  • Incident management: participate in the primary on-call rotation; cultivate a blameless postmortem culture and continuously optimize on-call processes for sustainability and efficiency.
  • Automation and tooling: develop and maintain automation scripts, tools, and processes to streamline deployments, monitoring, and management tasks.
  • Performance optimization: optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
  • Security and compliance: collaborate with security engineers to implement best practices and ensure compliance with standards and policies.
  • Monitoring and alerting: design and configure advanced monitoring systems, set up alerts, and create dashboards and playbooks for production on-call.
  • Software development consultation: engage in the software development lifecycle, participate in design and launch reviews, and provide SRE insights to influence system architecture.

Requirements

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 8+ years of experience in a senior-level SRE or similar role, managing cloud-based services and infrastructure.
  • 5+ years of software development experience in one or more programming languages.
  • 3+ years managing people or teams, leading projects, and designing, analyzing, and troubleshooting distributed systems running in the cloud.
  • Strong knowledge of cloud platforms such as Google Cloud Platform, AWS, or Azure.
  • Practical experience with containerization (Docker, Kubernetes) and familiarity with infrastructure-as-code tools like Terraform (essential).
  • Solid understanding of networking, security principles, and SRE/security best practices.
  • Proficiency with monitoring and alerting tools to detect and respond to issues effectively.

Location

This role is hybrid (4 days a week in our Palo Alto office).

Compensation & Benefits

  • Standard base salary range: $200,000 - $260,000 annually. Compensation will be determined by location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for variable compensation, equity, and benefits.
  • Benefits include medical, vision, and dental coverage, a generous time-off policy, the opportunity to contribute to a 401(k), a home office improvement stipend, annual education and wellness stipends, regular company events, and daily healthy lunches.

We are committed to an inclusive and diverse company and do not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.
