Senior Solutions Architect, HPC Systems Engineer

at NVIDIA
USD 224,000-356,500 per year
SENIOR
✅ On-site

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Linux @ 4, DevOps @ 4, MLOps @ 4, Communication @ 7, Networking @ 4, Product Management @ 4, Debugging @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer. You will be part of a team that brings new Artificial Intelligence (AI) hardware and software technologies to production in customer data centers. As part of the NVIDIA SA organization, you will drive the integration and deployment of end-to-end technology solutions at NVIDIA's most strategic technology customers and provide recommendations on the product roadmap.

Responsibilities

  • Work with NVIDIA AI Native, Consumer Internet, and Enterprise customers on large data center GPU server and networking system deployments as a Solutions Architect and Engineer.
  • Guide customer discussions on network design, compute/storage, and support bring-up of server/network/cluster deployments, including on-site visits during the bring-up phase.
  • Demonstrate subject matter expertise in advanced GPU & network systems and act as a trusted technical advisor to NVIDIA's strategic customers.
  • Provide customer-specific requirements to product teams to guide product roadmap features.
  • Identify new project opportunities for NVIDIA products and technology solutions in data center and AI applications.
  • Collaborate with GPU/Network Systems Engineering, Product Management, and Sales teams.
  • Conduct regular technical customer meetings for product roadmap updates, cluster issue debugging, feature discussions, and new technology introductions.
  • Build custom product demonstrations and proofs of concept addressing critical business needs of customers.
  • Analyze and debug compute/network configuration and performance issues to deliver performant clusters.
  • Use conferencing tools extensively; occasional (20%) travel required for on-site visits to customers and industry events. Remote work location is supported.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or a related field, or equivalent experience.
  • 12+ years of Systems/Solution Engineering or similar experience.
  • System-level expertise in CPU/GPU server architecture, NICs, Linux, system software, and kernel drivers.
  • Experience with networking switches for Ethernet/InfiniBand and data center infrastructure (power/cooling).
  • Knowledge of DevOps/MLOps technologies such as Docker/containers, Kubernetes.
  • Effective time management and multitasking capabilities.
  • Strong verbal/written communication skills; ability to share ideas/code clearly through documentation and presentations.

Ways to Stand Out

  • External customer-facing experience.
  • Experience with bring-up and deployment of large clusters.
  • Systems engineering, coding, and debugging skills including C/C++, Linux kernel, and drivers.
  • Hands-on experience with NVIDIA GPU systems/SDKs (e.g., CUDA), NVIDIA networking technologies (NICs, RoCE, InfiniBand), and/or ARM CPU solutions.
  • Familiarity with virtualization technology concepts.

Benefits

  • Base salary range: 224,000 USD - 356,500 USD, dependent on location, experience, and peer pay.
  • Eligibility for equity and benefits.
  • Commitment to fostering a diverse and equal opportunity work environment.