Required Skills & Competences
Distributed Systems @ 1, Machine Learning @ 3, Azure @ 3, Debugging @ 1, PyTorch @ 2, CUDA @ 2, GPU @ 3
Details
Our Inference team brings OpenAI’s most capable research and technology to the world through our products. We empower consumers, enterprises, and developers alike to access and use our state-of-the-art AI models. We focus on performant, efficient model inference and on accelerating research progress through model inference.
Responsibilities
- Work alongside machine learning researchers, engineers, and product managers to bring the latest technologies into production.
- Enable advanced research through engineering collaboration with researchers.
- Introduce new techniques, tools, and architectures that improve performance, latency, throughput, and efficiency of the model inference stack.
- Build tools that give visibility into bottlenecks and sources of instability, then design and implement solutions for the highest-priority issues.
- Optimize code and a fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM.
Requirements
- At least 5 years of professional software engineering experience.
- Understanding of modern ML architectures and intuition for optimizing their performance, particularly for inference.
- Familiarity with, or the ability to quickly gain familiarity with, PyTorch, NVIDIA GPUs, and the software stacks that optimize them (for example NCCL, CUDA).
- Knowledge of HPC technologies such as InfiniBand, MPI, NVLink, etc.
- Experience architecting, building, observing, and debugging production distributed systems (bonus for performance-critical distributed systems).
- Experience rebuilding or substantially refactoring production systems due to rapidly increasing scale.
- Ability to own problems end-to-end and pick up missing knowledge as needed.
- Self-directed, humble, collaborative, and committed to team success.
Benefits
- Base pay range listed below; total compensation may include equity and performance-related bonuses.
- Medical, dental, and vision insurance with employer contributions to Health Savings Accounts.
- Pre-tax accounts (Health FSA, Dependent Care FSA, commuter benefits).
- 401(k) with employer match.
- Paid parental, medical, and caregiver leave; flexible paid time off (PTO) and company holidays.
- Mental health and wellness support; employer-paid basic life and disability coverage.
- Annual learning and development stipend; daily meals in offices and meal delivery credits as eligible.
- Relocation support for eligible employees.
Additional notes
- Background checks will be administered in accordance with applicable law.
- OpenAI is an equal opportunity employer and provides reasonable accommodations for applicants with disabilities.