Senior Systems Software Manager - TAO Build, Automation and Release
at NVIDIA
📍 Santa Clara, United States
$272,000-419,800 per year
Required Skills & Competences
Security @ 4, Ansible @ 4, Docker @ 4, Jenkins @ 4, Kubernetes @ 4, DevOps @ 6, Terraform @ 4, Python @ 7, GCP @ 7, CI/CD @ 4, Algorithms @ 4, Hiring @ 4, Leadership @ 6, AWS @ 7, Azure @ 7, QA @ 4, Puppet @ 4, Agile @ 4
Details
NVIDIA is hiring a Senior Systems Software Manager for Build, Automation, Release, and Optimization to join the TAO Toolkit Deep Learning Architectures team. The toolkit encompasses scalable and easy-to-use modules for training, fine-tuning, and optimization for Computer Vision and Multi-Modal AI, to help advance the state of the art while improving performance. If you have a passion for pioneering technologies and a commitment to developing scalable, optimized, and ethical AI, we invite you to join our strong team at NVIDIA.
Responsibilities
- Lead a team of developers to improve the integration and operation of CI/CD tools and to fully automate CI and testing.
- Lead efforts to resolve production issues and implement necessary integrations.
- Lead the ongoing design, implementation, and maintenance of systems and tools across the toolkit stack.
- Design, implement, and manage cloud infrastructure for continuous integration, delivery, and deployment.
- Partner with a multi-functional team, including engineering, product, and QA, to improve development workflows, reduce bottlenecks, manage and minimize risks, and enhance software delivery speed and quality.
- Lead the development of robust processes to write and maintain documentation infrastructure.
- Communicate effectively with technical and non-technical partners to set shared expectations and ensure visibility around the release and deployment process.
- Collaborate with diverse software, research, and hardware teams across geographies to analyze the interplay of hardware and software architectures, solve critical problems, and enable future applications.
Requirements
- Bachelor’s/Master’s degree or equivalent experience in Computer Science, Information Systems, Engineering, or other related fields.
- 8+ years of overall proven experience in software engineering, DevOps, or release management, with at least 3 years in a leadership or managerial role.
- Proven experience with automation and orchestration tools including Jenkins, Bazel, GitLab, Docker, and Kubernetes.
- Strong expertise in cloud platforms like AWS, Azure, GCP, or others.
- Proven experience in developing production-quality software pipelines for AI, computer vision, or multi-modal algorithms, especially with LLMs and Multi-Modal Foundation models.
- Expertise in release management, version control systems, and configuration management.
- Strong programming skills in Python and/or C++, and experience developing integrated AI solutions.
- Proven track record of leading projects, managing timelines, and delivering results in an Agile/Scrum environment.
- Strong analytical and problem-solving skills with a focus on practical and scalable AI solutions.
- Strong interpersonal skills and ability to work in a collaborative environment.
Ways to stand out from the crowd:
- Knowledge of tools like Ansible, Terraform, and Puppet for automating repetitive tasks and infrastructure provisioning.
- Proven experience in automating the building and deploying of software around AI infrastructure.
- Experience with security practices and trustworthy AI.
- Background with NVIDIA SDKs such as TensorRT, RAPIDS, CUDA, and cuDNN.
NVIDIA is widely considered to be one of the technology industry's most desirable employers. We have some of the most forward-thinking and hard-working people working with us and our engineering teams. If you're a creative engineer with a real passion for building scalable and robust infrastructure, we want to hear from you.