Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — expert. Able to teach others, with thorough knowledge of the technology's pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Communication @ 6
macOS @ 3
AI @ 3
Details
xAI is a small, mission-driven engineering organization focused on building AI systems that accurately understand the universe and aid humanity. The team values hands-on contributors, strong communication, and engineering excellence.
As an AI Tutor specialized in multilingual audio capabilities, you will help train and refine Grok for voice interactions, speech recognition, and auditory experiences across languages and accents. Your work centers on curating, annotating, and recording high-quality audio data to improve multilingual speech processing and natural spoken interactions.
Responsibilities
- Use proprietary software to provide labels, annotations, recordings, and inputs on projects involving multilingual audio clips, voice recordings, speech samples, and auditory elements.
- Support delivery of high-quality curated audio data that meets professional audio standards, ensures clear, natural spoken output, and accurately represents linguistic and prosodic details (intonation, rhythm, accent).
- Collaborate with technical staff to develop tasks that improve the AI's handling of speech modulation, accent variation, noise in real-world recordings, and multilingual audio processing.
- Work with technical staff to improve annotation tools and audio workflows.
Requirements
- Native proficiency in Polish with exposure to diverse accents, dialects, or regional variations.
- Proficiency in English (minimum B2) with clear, natural vocal delivery appropriate for recordings.
- Strong auditory perception to identify nuances in speech, accents, pronunciation, intonation, and audio quality.
- Demonstrated ability to handle multilingual audio content, evaluate speech accuracy, and interpret cultural vocal expressions.
- Demonstrated ability to transcribe audio with high accuracy across accents and varying audio quality.
- Comfort providing high-quality voice recordings and feedback on audio samples in multiple languages.
- Strong comprehension, independent judgment on ambiguous audio, and attention to detail.
- Strong communication, interpersonal, analytical, and organizational skills.
- Personal device requirements: a Chromebook, a Mac running macOS 11.0 or later, or a PC running Windows 10 or later.
Preferred Skills and Experience
- Exceptional attention to linguistic nuance, auditory detail, and data quality beyond standard transcription.
- Deep understanding of what constitutes good/useful audio data.
- Advanced transcription and annotation practices, including handling disfluencies, accents, and prosodic features consistently and accurately.
- Background in linguistics (phonetics, phonology, sociolinguistics), speech sciences, cognitive science, or equivalent practical experience analyzing accent variation and multilingual speech patterns.
- Experience working with speech/audio datasets, annotation workflows, or AI training data; an understanding of how data quality affects model performance; and experience training voice models.
- Professional voice work experience (voice acting, recording, podcasting) demonstrating clarity and recording quality.
- Portfolio (strongly preferred): voice samples, annotated transcripts, or audio-related work demonstrating quality and methodology.
Location and Other Expectations
- Roles may be full-time, part-time, or contractor-based. Contractor hours vary widely by project; most projects may require at least 10 hours per week on average, though this is not a fixed commitment.
- Positions may be performed remotely from any location worldwide, subject to legal eligibility and time-zone compatibility. (Note: for US-based candidates, xAI cannot hire in Wyoming or Illinois.)
- xAI is unable to provide visa sponsorship.
Compensation and Benefits
- US-based candidates: $35/hour - $45/hour, depending on experience, skills, education, location, and qualifications. International candidates: compensation details provided during recruitment.
- Benefits vary by employment type and location. For eligible U.S.-based positions, benefits may include health insurance, a 401(k) plan, and paid sick leave.