Infrastructure Capacity Engineer

USD 225,000-300,000 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Ansible @ 3, Go @ 5, Kubernetes @ 6, Terraform @ 3, Python @ 5, GCP @ 6, CI/CD @ 3, Algorithms @ 3, Distributed Systems @ 6, AWS @ 6, Azure @ 6, Performance Optimization @ 6, LLM @ 3, GPU @ 3

Details

Perplexity is an AI-powered answer engine, founded in December 2022, that is growing rapidly into one of the world’s leading AI platforms. The company has raised substantial venture investment and aims to build accurate, trustworthy AI that supports decision-making and assists users.

Responsibilities

  • Design and implement comprehensive capacity planning models and forecasting systems that predict infrastructure needs across compute, storage, and network resources for AI/ML workloads.
  • Build and maintain automated capacity management systems that dynamically scale infrastructure based on real-time demand patterns and usage forecasts.
  • Lead cross-functional capacity planning initiatives including hardware procurement, data center expansion, and cloud resource optimization.
  • Develop monitoring and alerting systems that provide early warning indicators for capacity constraints and performance degradation.
  • Create and maintain detailed infrastructure capacity models that account for seasonal patterns, product launches, and scaling efficiency across different workload types.
  • Optimize resource utilization and cost efficiency through advanced placement algorithms, load balancing strategies, and infrastructure rightsizing.
  • Design and implement disaster recovery and business continuity plans to ensure service availability during infrastructure failures or capacity emergencies.
  • Collaborate with Site Reliability Engineering and Platform teams to establish capacity-aware deployment strategies and infrastructure automation.
  • Play a leading role in defining the capacity engineering discipline within Perplexity’s engineering organization.

Requirements

  • At least 4 years of experience in infrastructure capacity planning, systems engineering, or related technical roles at scale.
  • Proven experience managing infrastructure capacity for high-growth technology companies, preferably with AI/ML workloads or real-time systems.
  • Strong background in distributed systems architecture, cloud infrastructure (AWS/GCP/Azure), and container orchestration (Kubernetes).
  • Experience with capacity modeling tools, forecasting methodologies, and statistical analysis for infrastructure planning.
  • Proficiency in programming languages such as Python, Go, or similar for automation and tooling development.
  • Deep understanding of infrastructure monitoring, observability, and performance optimization techniques.
  • Experience with infrastructure-as-code tools (Terraform, Ansible) and CI/CD pipelines for infrastructure management.
  • Strong analytical and problem-solving skills with the ability to make data-driven decisions under uncertainty.
  • Excellent cross-functional collaboration skills and experience working with engineering, product, and business stakeholders.
  • Experience with large-scale database systems, caching layers, and content delivery networks preferred.
  • Background in AI/ML infrastructure, LLM inference, GPU cluster management, or high-performance computing is a plus.

Benefits

  • Cash compensation range: $225,000 - $300,000 (final offer determined by experience and expertise).
  • Equity may be part of the total compensation package.
  • Comprehensive health, dental, and vision insurance for you and your dependents.
  • 401(k) plan.