Used Tools & Technologies
Not specified
Required Skills & Competences
Security @ 7, Linux @ 4, Distributed Systems @ 7, Android @ 4, Product Management @ 4, QA @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA is looking for an Engineering Manager to lead IPP's (Infrastructure, Planning and Process) Cloud Platform Team focused on Rack Scale AI Systems. IPP is a global organization within NVIDIA working with various groups such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence, and Driverless Cars to meet infrastructure needs. The cloud services support nearly half a million automated jobs daily across thousands of servers, aiding thousands of NVIDIA's software engineers globally. The cloud environment includes heterogeneous machines and devices across various operating systems (Windows, Linux, Android) and hardware platforms including NVIDIA GPUs and Tegra Processors.
Responsibilities
- Build and lead an engineering organization focused on the onboarding and bring-up of rack-scale systems, collaborating with external and internal partners.
- Coordinate with Engineering, Product Management, and Customer Program Management teams to define, prioritize, and implement features, infrastructure, processes, and workflows.
- Work with NVIDIA Product Teams to understand new product requirements, including HPC and AI/ML products.
- Collaborate with cross-functional teams including system engineering, software engineering, mechanical/thermal engineering, operations, data center teams, external vendors, and other partners to deliver a reliable and robust platform from concept to deployment.
- Identify potential or existing process weaknesses and suggest quality improvements.
- Drive deployment quality and improve time to market for next-gen products.
- Lead on-site teams that collect deployment data and analyze failure patterns.
- Oversee triage and recovery during product bring-up, and provide sustaining support throughout the product lifecycle.
Requirements
- Bachelor’s or Master’s Degree in Computer Science or Software Engineering, or equivalent experience.
- 8+ years of overall experience, including 5+ years of management experience in large, matrixed, geographically dispersed technology organizations with a focus on servers and data centers.
- Strong technical knowledge of embedded systems, orchestration and automation systems, data centers, and cloud architecture.
- Deep understanding of cloud design aspects such as virtualization, global infrastructure, distributed systems, load balancing, and security.
- Excellent risk identification and mitigation skills.
- Strong collaboration and interpersonal skills with proven ability to guide and influence diverse teams.
Ways to Stand Out
- Experience in large-scale QA environments for product bring-ups.
- Experience in high-performance or large-scale computing environments, parallel computing, or CUDA.
- Expertise in large-scale and cluster computing (MPI), data center design including high-speed interconnects (InfiniBand), cluster storage, and scheduling.
- Experience with converged and hyper-converged hardware and servers.
- Strong background in Windows and Linux administration.
Benefits
- Eligible for equity and benefits.
NVIDIA is a leader in Artificial Intelligence, High-Performance Computing, and Visualization, pioneering GPU technologies that power innovative products and scientific advancements. The company values creativity and passion for new technologies and is an equal opportunity employer committed to workplace diversity.