Base Command Manager Engineer - NVIS NPI

at NVIDIA

📍 Santa Clara, United States

USD 176,000-327,750 per year

MIDDLE

✅ On-site

Required Skills & Competences

Ansible @ 7, Grafana @ 3, Jenkins @ 2, Kubernetes @ 3, Linux @ 7, Prometheus @ 3, Python @ 5, CI/CD @ 2, Leadership @ 3, Bash @ 5, Communication @ 3, Git @ 2, Networking @ 5, Salt @ 7, GPU @ 3

Details

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today we are tapping into the unlimited potential of AI to define the next era of computing. As an NVIDIAN, you'll be immersed in a diverse, supportive environment where everyone is inspired to do their best work.

We are seeking a dedicated Base Command Manager (BCM) Engineer to support product deployments/escalations and collaborate with Engineering and our Field Organization. Applications for this job will be accepted at least until August 26, 2025.

Responsibilities

Act as the link between engineering and the NVIS field team for cluster deployment and management solutions as part of NVIDIA's NPI team.
Collaborate closely with engineering and product teams to review and influence design decisions for products centered around large-scale, BCM-managed clusters.
Evaluate changes in BCM and underlying OS/software stacks and communicate impacts to the field organization to maintain robust and scalable deployment workflows.
Define and relay detailed cluster management requirements to engineering to enable successful New Product Introduction (NPI) of next-generation GPU platforms.
Describe architectural and design changes and build clear, actionable tasks for the field, including standardized deployment guides, standard configuration methodologies, and validation workflows.
Validate complex cluster configurations including Slurm and Kubernetes orchestrators for performance, scalability, and resilience, ensuring they meet real-world customer scenarios.
Bridge knowledge gaps, track progress, and align collaborators throughout the product development lifecycle to support the NPI team.
Support NVIDIA's mission by ensuring breakthrough technologies are successfully deployed for global customers and OEM partners.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience).
10+ years of experience in at least two of the following: HPC / large-scale cluster administration, Linux systems engineering, infrastructure automation (e.g., Ansible, Salt), or data center operations.
5+ years of direct, hands-on experience provisioning, managing, and troubleshooting clusters using NVIDIA Base Command Manager (BCM).
Deep, practical knowledge of how Slurm and Kubernetes are coordinated, deployed, and managed by BCM, including workload submission and resource management.
Proficiency in Python and Bash scripting for automation, cluster validation, and workflow optimization.
In-depth experience with cluster management and monitoring tools (for example Prometheus, Grafana, DCGM, and similar observability stacks).
Outstanding written and verbal communication skills, with the ability to explain complex technical concepts to both technical and non-technical collaborators.
A customer-first attitude, self-motivation, and a proactive approach to leadership in diverse environments.

Ways to stand out

Proficiency with cluster networking including InfiniBand and Spectrum-X.
Experience with NVIDIA Mission Control.
Familiarity with CI/CD workflows in an infrastructure context, including tools such as Git, GitLab, and Jenkins.
Background in Professional Services, customer-facing deployment, and solutions optimization.
Industry certifications such as CKA/CKAD, RHCE, or other advanced Linux/HPC credentials.

Compensation & Benefits

Base salary range (Level 5): 176,000 USD - 276,000 USD.
Base salary range (Level 6): 208,000 USD - 327,750 USD.
You will also be eligible for equity and benefits.

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.