Senior Solution Engineer, System Diagnostics Software
Used Tools & Technologies
Not specified
Required Skills & Competences
Docker @ 4, Kubernetes @ 4, Linux @ 7, Python @ 6, Hiring @ 4, Communication @ 7, Parallel Programming @ 4, CUDA @ 4, GPU @ 4
Details
NVIDIA is looking for an engineer who wants the excitement of direct customer interaction, and the reward of contributing to software and products, to join our team of Solution Engineers supporting NVIDIA's GPU accelerated platforms in AI Factories. You will work directly with customers to provide solutions on the latest NVIDIA platforms (including the GB200). The role involves triaging customers' hardware platform issues and AI/ML workloads in large datacenters/rack-scale platforms, solving customer problems, and contributing to products and software tooling. Strong problem-solving abilities, clear communication, the ability to work on multiple projects, solid Linux expertise, programming skills, and experience with multi-GPU platforms are required. Expertise analyzing performance of distributed GPU-accelerated workloads is a plus.
Responsibilities
- Provide direct support to NVIDIA Enterprise customers: answer questions, reproduce, resolve, or advance customer issues.
- Work with engineering teams on customer issues, providing logs, reproduction information, and other triage artifacts.
- Create and update product and support tools.
- Take ownership and drive customer issues from inception to resolution.
- Document customer interactions and improve the knowledge base.
- Develop features and tools as part of solution engineering efforts to support NVIDIA technologies.
- Work occasional weekends and holidays to support customers.
Requirements
- Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
- At least 5 years of engineering experience with multi-GPU platforms.
- Strong system software expertise (firmware, BIOS, kernel, driver, operating system).
- Solid understanding of Linux and the ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
- Containerized solutions experience with Docker, Kubernetes, and Slurm.
- Professional-level communication skills; ability to adjust communication to the audience and remain calm in high-pressure situations.
- Excellent follow-up and organizational skills; passion for solving problems.
- Proficient in C/C++ programming for platform OS, firmware, BIOS, kernel, and drivers.
- Proficient in Python with the ability to build custom tools.
Ways to stand out (preferred/bonus)
- Background with parallel programming or GPU acceleration (e.g., CUDA).
- Experience developing in GPU-accelerated, cloud, or virtualized environments.
- Experience analyzing software performance of distributed workloads.
- Clustering or HPC datacenter technologies including upper layer protocols (NCCL, MPI).
Compensation & Benefits
- Base salary is determined based on location and experience.
- Base salary ranges provided: Level 3: 136,000 USD to 212,750 USD; Level 4: 168,000 USD to 264,500 USD.
- You will also be eligible for equity and benefits (see NVIDIA benefits page).
Additional information
- Applications for this job will be accepted at least until September 19, 2025.
- NVIDIA is an equal opportunity employer and values diversity in hiring and promotions.