Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Kubernetes @ 2
Python @ 6
Machine Learning @ 3
Data Science @ 3
Data Engineering @ 3
Mathematics @ 3
Parquet @ 3
AI @ 3
Data Pipelines @ 3
Details
xAI’s mission is to create AI systems that can accurately understand the universe and aid humanity in its pursuit of knowledge. The team is small, highly motivated, and focused on engineering excellence. Employees are expected to be hands-on, show initiative, communicate clearly, and contribute directly to the company’s mission.
About the role
As a Data Engineer / AI Engineer on xAI’s Data team, you will develop systems, processes, and production code that power data acquisition, preparation, quality evaluation, and delivery for model training. You will partner with acquisition teams, ML engineers, and software engineers to identify data needs, build scalable data pipelines, and continuously improve data quality for model training. The role sits at the intersection of data, infrastructure, and machine learning.
Responsibilities
- Analyze the performance and impact of data used throughout the model training lifecycle
- Investigate anomalous model behavior and identify data issues causing poor downstream performance
- Design, build, and improve data cleaning, transformation, and quality-control steps for high-quality training data
- Research, evaluate, and develop frontier methods for improving data quality and effectiveness in AI model development
- Apply statistical techniques and empirical analysis to make data-driven decisions about dataset quality and model outcomes
- Partner across teams to identify data needs and define high-impact opportunities for new data acquisition and improvement
- Build and maintain production-grade data pipelines, tooling, and software systems that ingest, process, validate, and deliver data for training
- Develop metrics, evaluation frameworks, and monitoring systems to assess how data quality influences model behavior at scale
- Fuse data from multiple sources into reliable datasets for research and production model training
- Create shared datasets, tooling, and internal data products enabling other teams to analyze, debug, and improve model performance
Requirements
Basic qualifications
- Bachelor’s degree in computer science, data science, physics, mathematics, or another STEM discipline
- 1+ years of data/software engineering experience (internship experience is applicable)
- Experience implementing or analyzing language models or neural networks
Preferred skills and experience
- Professional experience in analytics, data science, machine learning, or data engineering
- Experience building and operating production data pipelines for neural network or large-scale machine learning workloads
- Strong experience with Python and the broader ecosystem of libraries and tools used in modern machine learning and data development
- Experience working with Parquet or similar columnar storage formats in large-scale data systems
- Familiarity with Kubernetes and distributed production environments
- Experience developing predictive models and machine learning pipelines (clustering, forecasting, anomaly detection, or related techniques)
- Experience working with very large-scale datasets (terabyte- to petabyte-scale)
- Strong statistical intuition and ability to use quantitative analysis to guide decisions
- Ability to operate effectively in a dynamic environment and take ownership of ambiguous problems
Compensation and benefits
$240,000 - $280,000 USD base salary. The total rewards package also includes equity, comprehensive medical/vision/dental coverage, access to a 401(k) retirement plan, short- and long-term disability insurance, life insurance, and other perks.