Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage and the ability to handle common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know all the pitfalls and tricks;
- 10: exceptional knowledge. Comprehensive understanding of, and adeptness in, all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 4
Ansible @ 4
Go @ 7
Grafana @ 4
Kubernetes @ 4
Prometheus @ 4
IaC @ 6
Terraform @ 4
Python @ 7
GCP @ 4
Java @ 7
GitHub @ 4
GitHub Actions @ 4
CI/CD @ 4
Distributed Systems @ 4
AWS @ 4
SRE @ 7
Thanos @ 4
Compliance @ 4
OpenTelemetry @ 4
Observability @ 4
AI @ 4
Details
At SentinelOne, we are driven by a clear purpose: to give the advantage to those who secure our future. As AI reshapes how organizations build, operate, and innovate, the responsibility to protect them becomes more critical than ever. When you join SentinelOne, your work helps protect global enterprises, critical infrastructure, and the technologies shaping tomorrow.
We are seeking a Staff Infrastructure Engineer to be a pivotal technical leader and architect within our Observability team. You will design, implement, and optimize observability solutions that underpin SentinelOne's global platform, enabling engineering teams across the organization to gain real-time visibility and actionable insights.
Due to Federal Government contract requirements, U.S. Citizenship is required for this position. FedRAMP staff may be subject to customer or third-party background checks, up to and including Secret Clearance, if required by their role at SentinelOne.
Responsibilities
- Architect and implement robust, scalable telemetry and observability platforms that enable rapid, safe delivery and monitoring of features.
- Serve as the primary Subject Matter Expert (SME) and administrator for the core observability stack, including Grafana, Prometheus, Thanos/Mimir/Cortex, and OpenTelemetry (OTEL) pipelines.
- Partner with engineering teams across the organization to define platform requirements and evolve the observability ecosystem ahead of stakeholder needs.
- Take end-to-end ownership of critical features from architecture and requirements through production deployment and operational maturity.
- Drive operational efficiency for observability services across AWS and GCP with attention to reliability and cloud cost-optimization.
- Build automation and self-service tooling to reduce operational toil and minimize pager fatigue.
- Deploy, maintain, and ensure compliance of observability systems in high-security environments, including FedRAMP and air-gapped deployments.
- Implement and standardize Infrastructure as Code (Terraform/Ansible) and industry best practices to increase platform transparency and reliability.
- Mentor engineers, lead technical design and code reviews, and provide guidance that elevates engineering quality.
- Lead resolution of complex production incidents, perform root-cause analyses, and participate in on-call rotations.
Requirements
- 8+ years of experience in Infrastructure Engineering, Site Reliability Engineering (SRE), or a related systems-focused field.
- 8+ years of experience architecting, scaling, and managing enterprise-grade observability stacks using Prometheus, Grafana, Thanos (or Mimir/Cortex), and OpenTelemetry.
- Experience designing cloud-native infrastructure in major cloud providers (AWS or GCP) and managing production Kubernetes environments (EKS, GKE).
- Advanced proficiency with IaC and automation tools, specifically Terraform and Ansible.
- Experience maintaining and optimizing high-throughput, large-scale distributed systems with a focus on cost-efficiency, scalability, and disaster recovery.
- Demonstrated ability to lead complex technical designs, mentor engineers, and collaborate cross-functionally.
- US Citizenship and ability to work in a government-regulated environment.
Preferred Qualifications
- 8+ years of production-level programming experience in Go (highly desirable) or another mainstream language such as Python or Java, with a willingness to adopt Go.
- Experience with FedRAMP or other sovereign cloud / high-security compliance frameworks.
- Familiarity with operational challenges of on-premises, hybrid, or air-gapped Kubernetes deployments.
- Experience designing advanced CI/CD pipelines (e.g., GitHub Actions) and deployment strategies such as canary, blue-green, and rolling updates.
Compensation
- Base salary range (U.S. role): $132,000 to $215,000 USD. The range may vary based on candidate location; different pay ranges for some locations may be provided during the recruiting process.
Benefits
- Restricted Stock Units (RSUs) and Employee Stock Purchase Plan (ESPP)
- Flexible time off, paid company holidays and sick time, gender-neutral parental leave, grandparent leave
- Medical, dental, vision, 401(k) with company match, life and disability insurance, FSAs
- Home office allowance, mobile phone reimbursement
- Wellness programs, fertility coverage, adoption & surrogacy reimbursement
SentinelOne participates in the E-Verify Program for all U.S.-based roles and is an Equal Employment Opportunity and Affirmative Action employer.