Required Skills & Competences
- Software Development @ 4
- Go @ 4
- Kubernetes @ 4
- Python @ 4
- Distributed Systems @ 7
- Communication @ 4
- GPU @ 4
Details
NVIDIA is seeking a Senior Software Engineer to develop distributed storage services for AI/ML. The goal is to craft a reliable, scalable, and efficient storage-as-a-service tailored to AI applications that can be deployed anywhere and scale without limitations. This service supports critical NVIDIA business areas from graphics drivers to autonomous vehicles to deep learning frameworks. The role requires a deep understanding of distributed systems, outstanding design skills, and a track record in building and delivering large-scale distributed services.
Responsibilities
- Lead the overall architecture and design of a distributed storage service optimized for AI/ML.
- Build features to enhance availability and reliability for large-scale deployments.
- Engage and collaborate with NVIDIA Research, Computing, Product teams, cross-functional teams, and external customers to deliver cloud services.
- Automate the distributed storage service end to end, including deployment, management, and monitoring.
- Own product delivery from inception through support and operations.
Requirements
- Strong track record of delivering distributed services in a variety of distributed computing environments.
- Experience designing, implementing, and operating distributed systems at multi-petabyte scale.
- Experience implementing storage services and interfaces to ensure scalable, high-performance, and reliable solutions.
- Prior experience developing distributed systems with Kubernetes, Golang (Go), Python, and Cloud Service Provider integrations.
- Demonstrated history of ownership of product delivery from inception to support.
- Excellent communication and presentation skills.
- Bachelor of Science in Computer Science or a related field (or equivalent experience) with 5+ years of industry experience.
Ways to Stand Out
- Architected, built, and deployed distributed services running on large-scale clusters (multi-petabyte to exabyte) with millions of users.
- Experience owning all software development and delivery stages.
- Passion for accelerated computing environments and experience or interest in GPU Direct Storage, DPU, and RDMA.
- Experience building and delivering cloud services focused on distributed systems.
Compensation & Benefits
- Base salary ranges by level:
  - Level 3: 148,000 USD - 235,750 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and company benefits.
Additional Details
- Location: Santa Clara, CA, United States
- Employment type: Full time
- Applications accepted at least until August 11, 2025
- NVIDIA is an equal opportunity employer committed to a diverse work environment.