Senior Solution Engineer, AI Factory Triage

at NVIDIA
USD 136,000-264,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Docker @ 4 Kubernetes @ 4 Linux @ 7 Python @ 6 Communication @ 4 Parallel Programming @ 4 CUDA @ 4 GPU @ 4

Details

NVIDIA is seeking an engineer who enjoys direct customer interaction and contributing to software and products to join the Solution Engineering team supporting NVIDIA's GPU-accelerated platforms in AI Factories. You will work directly with customers to triage hardware platform issues and AI/ML workloads on rack-scale, multi-GPU datacenter platforms (including GB200), solve customer problems, and contribute to product and tooling development. Strong Linux expertise, solid programming skills, and experience with multi-GPU platforms are required. Occasional weekend and holiday work to support customers is expected.

Responsibilities

  • Provide direct support to NVIDIA Enterprise customers: answer questions, reproduce, resolve, or advance customer issues.
  • Work with engineering teams on customer issues, providing logs, reproduction information, and other triage details.
  • Create and update product and support tools.
  • Take ownership and drive customer issues from inception to resolution.
  • Document customer interactions and enhance the knowledge base.
  • Develop features and tools as part of solution engineering efforts supporting NVIDIA technologies.
  • Occasionally work on weekends and holidays to support customers.

Requirements

  • Minimum of a BS in Computer Engineering, Electrical Engineering, or equivalent experience.
  • At least 5 years of engineering experience with multi-GPU platforms.
  • Strong system software expertise (firmware, BIOS, kernel, driver, operating system).
  • Solid understanding of Linux with the ability to analyze, optimize, and customize Linux environments for AI/ML workloads.
  • Experience with containerized solutions: Docker, Kubernetes, Slurm.
  • Proficient in C/C++ programming for platform OS, firmware, BIOS, kernel, drivers.
  • Proficient in Python with the ability to build custom tools.
  • Professional-level communication skills, with the ability to adjust communication to the audience's technical level and to remain calm in difficult situations.
  • Excellent follow-up and organizational skills, and a strong problem-solving orientation.

Ways to stand out

  • Background with parallel programming or GPU acceleration (e.g., CUDA).
  • Experience developing in GPU-accelerated, cloud, or virtualized environments.
  • Experience analyzing software performance of distributed workloads.
  • Knowledge of clustering or HPC datacenter technologies and upper-layer protocols (NCCL, MPI).

Compensation & Other Information

  • Base salary range:
    • Level 3: 136,000 USD - 212,750 USD
    • Level 4: 168,000 USD - 264,500 USD
  • You will also be eligible for equity and benefits.
  • Applications for this job will be accepted at least until August 5, 2025.
  • NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.