Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Go @ 6
Kafka @ 2
Kubernetes @ 3
Redis @ 2
Python @ 6
GCP @ 3
Java @ 6
Distributed Systems @ 3
AWS @ 3
Azure @ 3
gRPC @ 3
Debugging @ 6
LLM @ 3
OpenTelemetry @ 6
Observability @ 3
AI @ 3
Details
Glean builds a Work AI platform powering intelligent Search, an AI Assistant, and scalable AI agents. The Agents Runtime team builds the low-latency, reliable, and secure foundation that powers Glean's AI agents and assistant experiences at scale. The team designs and operates core runtime services for multi-turn orchestration, tool calling, model routing, memory, streaming, and safety, working across distributed systems, production observability, and ML infrastructure integrations.
Responsibilities
- Own runtime problems end-to-end: architecture, design, production launch, and ongoing reliability.
- Build and evolve core services for session lifecycle, streaming responses (gRPC/WebSockets), structured tool execution, memory/state, and policy/guardrails.
- Design for performance, correctness, and cost: reduce p50/p95 latency, improve tail behavior, and optimize token/tool budgets.
- Integrate with leading LLM providers (e.g., OpenAI, Anthropic, Google Gemini) and internal evaluation frameworks to improve quality and predictability.
- Harden the platform with fault isolation, retries, timeouts, circuit-breaking, backpressure, and graceful degradation.
- Instrument deep observability (tracing, metrics, logs) and create playbooks/SLOs for high availability and on-call excellence.
- Collaborate closely with product, quality, and application teams to prioritize the most impactful roadmap investments.
Requirements
- 3+ years of software engineering experience building production distributed systems or cloud-native applications.
- BS/BA in Computer Science or related field, or equivalent practical experience.
- Strong coding skills in at least one of: Python, Go, Java, or C++, with a focus on reliability, performance, and tests.
- Product-minded: prioritize customer impact, clear SLAs/SLOs, and pragmatic iteration.
- Ownership-driven with a positive, proactive attitude; comfortable leading projects or learning from battle-tested engineers.
- Experience operating services on Kubernetes and at least one major cloud (e.g., GCP, AWS, or Azure).
- Familiarity with event/streaming systems (e.g., Pub/Sub, Kafka), caching (e.g., Redis), and data stores for low-latency paths.
- Practical understanding of LLM/agents building blocks: tool/function calling, structured outputs, streaming, and model selection/routing.
- Strong observability and debugging skills: tracing (e.g., OpenTelemetry), metrics, dashboards, and production forensics.
- Background in one or more areas is a plus: policy/guardrails, multi-tenant isolation, rate-limiting, concurrency control, cost optimization.
Location
This role is hybrid (3-4 days a week in one of Glean's San Francisco Bay Area offices).
Compensation & Benefits
- Base salary range: $170,000 - $265,000 annually. Compensation offered will be determined by location, level, job-related knowledge, skills, and experience.
- Certain roles may be eligible for variable compensation, equity, and benefits.
- Benefits include Medical, Vision, and Dental coverage, generous time-off policy, 401(k) contribution opportunity, home office improvement stipend, annual education and wellness stipends, regular company events, and daily healthy lunches.
Other
- Glean is committed to diversity and inclusion and does not discriminate based on gender, ethnicity, sexual orientation, religion, civil or family status, age, disability, or race.