Used Tools & Technologies
Not specified
Required Skills & Competences
- Python @ 6
- Machine Learning @ 3
- Data Science @ 3

Details
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. They aim to make AI safe and beneficial for users and society. The team includes researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
Responsibilities
- Design and implement novel data curation pipelines to identify, verify, and filter training data for accuracy relative to the model’s knowledge
- Develop specialized classifiers to detect potential hallucinations or miscalibrated claims made by the model
- Create and maintain comprehensive honesty benchmarks and evaluation frameworks
- Implement techniques to ground model outputs in verified information, including search and retrieval-augmented generation (RAG) systems
- Design and deploy human feedback collection systems for identifying and correcting miscalibrated responses
- Develop prompting pipelines to generate data that improves model accuracy and honesty
- Develop and test novel reinforcement learning environments rewarding truthful outputs and penalizing fabricated claims
- Create tools to help human evaluators efficiently assess model outputs for accuracy
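The honesty benchmarks and evaluation frameworks above could take many forms; as one illustrative sketch (all names here are hypothetical, not Anthropic's actual tooling), a minimal harness might score model answers against a verified reference set while treating an explicit abstention as honest uncertainty rather than a fabricated claim:

```python
# Minimal honesty-benchmark sketch: score answers against verified
# references, counting an explicit "I don't know" as abstention rather
# than fabrication. Illustrative only; real systems need fuzzy matching
# and claim-level verification.
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    reference: str  # verified ground-truth answer

def score_answers(examples, answers, abstain_token="I don't know"):
    """Return (accuracy, abstain_rate, fabrication_rate) over the set."""
    correct = abstained = fabricated = 0
    for ex, ans in zip(examples, answers):
        if ans.strip().lower() == abstain_token.lower():
            abstained += 1          # honest uncertainty
        elif ans.strip().lower() == ex.reference.strip().lower():
            correct += 1            # matches verified reference
        else:
            fabricated += 1         # confident but unsupported claim
    n = len(examples)
    return correct / n, abstained / n, fabricated / n
```

Separating abstentions from fabrications matters because a model that declines to answer is behaving more honestly than one that invents a confident falsehood, and a single accuracy number hides that distinction.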
Requirements
- MS or PhD in Computer Science, Machine Learning, or related field
- Strong programming skills in Python
- Industry experience with language model fine-tuning and classifier training
- Proficiency in experimental design and statistical analysis for measuring calibration and accuracy improvements
- Strong interest in AI safety and in the accuracy and honesty of AI systems
- Experience in data science or dataset creation and curation for fine-tuning large language models
- Understanding of uncertainty, calibration, and truthfulness metrics in model outputs
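One standard calibration metric the requirements allude to is expected calibration error (ECE): the confidence-weighted gap between a model's stated confidence and its observed accuracy. A minimal sketch (binning scheme and names are illustrative):

```python
# Expected Calibration Error (ECE) sketch: bin predictions by stated
# confidence and average the |confidence - accuracy| gap, weighted by
# bin size. A perfectly calibrated model has ECE of 0.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """confidences: stated confidences in [0, 1];
    correct: 0/1 indicators of whether each claim was accurate."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    n = len(confidences)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # half-open bins, with the left edge included in the first bin
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:
            mask |= confidences == 0.0
        if mask.sum() == 0:
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += (mask.sum() / n) * gap
    return ece
```

A model that says "80% confident" and is right 80% of the time in that bin contributes nothing to ECE; one that says "90%" but is right only half the time contributes a large gap.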
Strong candidates may also have
- Published work on hallucination prevention, factual grounding, or knowledge integration in language models
- Experience with fact-grounding techniques
- Background in confidence estimation or calibration methods for ML models
- Experience creating and maintaining factual knowledge bases
- Familiarity with reinforcement learning from human feedback (RLHF) applied to truthfulness
- Experience with crowdsourcing platforms and human feedback collection
- Experience developing model accuracy or hallucination evaluation methods
Anthropic is committed to building advanced AI systems that behave reliably and ethically while remaining aligned with human values.