Used Tools & Technologies
Not specified
Required Skills & Competences
Security @ 4, Chef @ 4, Go @ 6, Kubernetes @ 4, Linux @ 7, Ruby @ 6, Distributed Systems @ 4, Bash @ 6, Communication @ 7, Networking @ 4, Debugging @ 4, Puppet @ 4, Salt @ 4, GPU @ 7
Details
The Cloud Infrastructure team builds and operates the foundational cloud platforms that power Airbnb. This role focuses on the compute platform, cloud services, service mesh, cloud provisioning, and networking stack with emphasis on reliability, scalability, efficiency, and high availability. You will collaborate with open source communities and cloud providers and have the opportunity to influence industry tooling and practices.
Responsibilities
- Architect and operate scalable infrastructure on top of public cloud services, integrating new cloud provider features as they become available.
- Build and extend infrastructure for fleet health monitoring and automated remediation across thousands of Kubernetes nodes.
- Design and build Kubernetes controllers, operators, and custom tooling to expose cloud-native compute capabilities to Airbnb’s engineering teams.
- Collaborate with cloud provider teams and open source communities to adopt and optimize new infrastructure innovations.
- Work closely with kernel and OS primitives to enhance container security, resource isolation, and performance tuning in Airbnb’s compute fleet.
- Develop CLI and UI tools to improve infrastructure observability, usability, and operational ergonomics.
- Drive rolling upgrades, patch management, and runtime hardening to maintain a secure, stable, and modern compute platform.
- Optimize existing systems and services to improve performance and efficiency.
- Systematically improve availability by applying industry and distributed systems best practices.
- Work with other infrastructure engineers to build the foundation for Airbnb’s technical growth over the next decade.
Requirements
- 9+ years of experience designing and operating large-scale infrastructure or platform systems.
- Experience leading and shipping large-scope technical projects in collaboration with multiple experienced engineers.
- Deep expertise with Kubernetes node components and ecosystem interfaces (kubelet, CRI, CNI, CSI, CDI).
- Strong knowledge of Linux kernel and OS internals, with a focus on container runtimes, resource isolation (cgroups, namespaces), GPU device drivers, and security primitives.
- Experience managing large Kubernetes fleets (1000+ nodes) in public cloud environments, integrating cloud provider capabilities directly into platform infrastructure.
- Expertise building Kubernetes controllers/operators and exposing cloud-native infrastructure features internally.
- Skilled with infrastructure management frameworks (Puppet, Chef, Salt, or equivalent) to manage cloud fleets.
- Proficiency in a systems programming language (Go) and scripting languages (Ruby, Bash).
- Proven track record in automation, health monitoring, and self-remediation of large compute fleets.
- Excellent problem-solving and production debugging skills spanning kernel, container runtime, and orchestration layers.
- Strong collaboration and communication skills; experience partnering with cross-functional teams and external vendors.
- Full-cycle developer mindset: ownership of building and operating high-scale, distributed systems across the full software life cycle.
Benefits
- Base pay range: $204,000 to $255,000 USD. Role may also be eligible for bonus, equity, benefits, and Employee Travel Credits.
- Remote-eligible for US-based candidates (must live in a state where Airbnb, Inc. has a registered entity). The role may include occasional work at an Airbnb office or attendance at offsites as agreed with your manager.
- Airbnb is committed to inclusion and belonging and provides disability accommodations for the application and interview process.