Senior Solutions Architect, HPC Systems Engineer

at NVIDIA
πŸ“ United States
USD 184,000-287,500 per year
SENIOR
✅ Remote

Required Skills & Competences

Docker @ 4 Kubernetes @ 4 Linux @ 4 DevOps @ 4 MLOps @ 4 Communication @ 7 Networking @ 4 Product Management @ 4 Debugging @ 4 CUDA @ 4 GPU @ 4

Details

NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer to drive the deployment and integration of end-to-end technology solutions at strategic customers' data centers. The role focuses on bringing AI hardware and software technologies to production, guiding network and compute design, debugging cluster issues, and working closely with product, engineering and sales teams. Occasional on-site visits (~20% travel) are required; the company is open to remote work locations.

Responsibilities

  • Work with NVIDIA AI Native, Consumer Internet and Enterprise customers on large data center GPU server and networking system deployments as a Solutions Architect / Engineer.
  • Guide customer discussions on network, compute and storage design, and support bring-up of server, network and cluster deployments; visit customer data centers during bring-up phases.
  • Demonstrate subject matter expertise in advanced GPU & network systems and act as a trusted technical advisor to strategic customers.
  • Bring customer-specific requirements to product teams to guide product roadmap features.
  • Identify new project opportunities for NVIDIA products and technology solutions in data center and AI applications.
  • Conduct regular technical customer meetings for product roadmap, cluster issue debugging, feature discussions and introductions to new technology solutions.
  • Build custom product demonstrations and POCs addressing critical customer business needs.
  • Analyze and debug compute/network configuration and performance issues to deliver performant clusters.
  • Collaborate closely with GPU/Network Systems Engineering, Product Management and Sales teams.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
  • 8+ years of Systems / Solution Engineering or similar engineering experience is ideal.
  • System-level expertise in CPU/GPU server architecture, NICs, Linux, system software and kernel drivers.
  • Experience with networking switches for Ethernet / InfiniBand, and familiarity with data center infrastructure (power / cooling).
  • Knowledge of DevOps / MLOps technologies such as Docker / containers and Kubernetes.
  • Effective time management and the ability to balance multiple tasks.
  • Strong verbal and written communication skills; ability to share ideas and code clearly through documents and presentations.

Ways to stand out

  • External customer-facing background.
  • Experience with bring-up and deployment of large clusters.
  • Systems engineering, coding, and debugging skills, including experience with C/C++, Linux kernel and drivers.
  • Hands-on experience with NVIDIA GPU systems / SDKs (e.g., CUDA), NVIDIA networking technologies (e.g., NICs, RoCE, InfiniBand), and/or ARM CPU solutions.
  • Familiarity with virtualization technology concepts.

Benefits

  • Base salary range (location & experience dependent): 184,000 USD - 287,500 USD.
  • Eligible for equity and company benefits (see NVIDIA benefits).

Applications for this job will be accepted at least until July 29, 2025.

NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.