Member of Technical Staff (Software Engineer, Data Platform)
USD 220,000-405,000 per year
Used Tools & Technologies
Machine Learning
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Go @ 5
Kafka @ 3
TypeScript @ 5
Python @ 5
Spark @ 3
dbt @ 3
Airflow @ 3
Flink @ 3
Kinesis @ 3
Data Science @ 3
Dagster @ 3
Streaming Data Processing @ 3
API @ 3
Experimentation @ 3
Databricks @ 3
Snowflake @ 3
Observability @ 3
AI @ 3
Data Modeling @ 3
ClickHouse @ 3
Data Pipelines @ 3
Details
The Data Platform team at Perplexity owns the end-to-end data lifecycle, from ingestion through processing, storage, and serving. The platform powers product features, analytics, experimentation, AI workloads, and the company data lake. The team combines managed platforms (Databricks, Snowflake) with open-source technologies (Spark, Kafka, Flink, Airflow, Dagster, dbt, Iceberg, Delta Lake, ClickHouse) and defines architecture for batch and streaming systems, orchestration, observability, and a self-serve data platform.
Responsibilities
- Design and operate large-scale batch and streaming data pipelines that power product features, AI training/evaluation workflows, analytics, and experimentation.
- Build event-driven and streaming systems (Kafka, Kinesis, PubSub or similar) for real-time ingestion, transformation, and delivery, as well as batch frameworks for backfills, aggregations, and offline computation.
- Lead architecture of data orchestration using tools like Airflow or Dagster; own scheduling, dependency management, retries, SLAs, and end-to-end observability for critical data flows.
- Set and enforce guarantees for data correctness, freshness, lineage, and recoverability; design systems that handle scale growth, partial failures, and evolving schemas without disrupting AI workloads or product experiences.
- Build self-serve data platforms enabling engineers, data scientists, and analysts to discover data, define contracts, and create/operate pipelines with minimal friction.
- Improve developer experience through abstractions, paved paths, and standards for data modeling, testing, validation, and deployment; treat the data platform as a product.
- Drive architectural decisions across storage, compute, orchestration, and data APIs in partnership with product engineering and data science.
- Mentor engineers, review designs, and raise the technical bar for data infrastructure through feedback, documentation, and hands-on collaboration.
Requirements
- 5+ years (Senior) or 8+ years (Staff) of software engineering experience.
- Strong experience building production data infrastructure systems.
- Hands-on experience with batch and/or streaming data processing at scale.
- Deep familiarity with data orchestration systems (Airflow, Dagster, or similar).
- Proficiency in Python and at least one additional backend language (Go, TypeScript, etc.).
- Strong systems thinking around reliability, latency, cost, and complexity tradeoffs.
- Experience supporting ML/AI workflows, training pipelines, or evaluation systems.
- Familiarity with data quality, lineage, observability, and governance tooling.
- Prior ownership of internal platforms used by many teams.
Technologies and tools (mentioned)
Databricks, Snowflake, Spark, Kafka, Flink, Airflow, Dagster, dbt, Iceberg, Delta Lake, ClickHouse, Kinesis, PubSub, Python, Go, TypeScript.
Compensation & Benefits
- Compensation range (U.S.): $220K-$405K, plus equity.
- U.S. benefits: equity, health, dental, vision, retirement, fitness, commuter and dependent care accounts, and more.
- International benefits: a comprehensive benefits program tailored to the region of residence. USD salary ranges apply only to U.S.-based positions; international salaries are set based on the local market.