Senior Solutions Architect, HPC Systems Engineer

at NVIDIA

📍 United States

USD 184,000-287,500 per year

SENIOR

✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Linux @ 4, DevOps @ 4, MLOps @ 4, Communication @ 7, Networking @ 4, Debugging @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer to join its team. The role involves deploying AI hardware and software technologies in customer data centers and providing strategic product guidance.

Responsibilities

Work with NVIDIA AI Native, Consumer Internet, and Enterprise customers on large data center GPU server and networking system deployments.
Guide customer discussions on network design and compute/storage, and support bring-up of server/network/cluster deployments.
Visit customer data centers during the bring-up phase.
Provide subject matter expertise in advanced GPU & network systems as a trusted technical advisor.
Bring customer-specific requirements to product teams to influence product roadmap.
Identify new project opportunities for NVIDIA products and technology in data center and AI applications.
Conduct regular technical meetings with customers on product roadmap, debugging cluster issues, feature discussions, and new technology introductions.
Build custom product demonstrations and proofs of concept (POCs) addressing critical business needs.
Analyze and debug compute/network configuration and performance issues for optimal cluster performance.
Occasional travel (about 20%) for on-site visits and industry events; open to remote work.

Requirements

BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
8+ years of experience in Systems/Solution Engineering or similar roles.
System-level expertise in CPU/GPU server architecture, NICs, Linux, system software, and kernel drivers.
Experience with networking switches for Ethernet/InfiniBand and data center infrastructure including power and cooling.
Knowledge of DevOps/MLOps technologies such as Docker, containers, and Kubernetes.
Effective time management and ability to balance multiple tasks.
Strong verbal and written communication skills.

Ways to Stand Out

External customer-facing background.
Experience with bring-up and deployment of large clusters.
Systems engineering, coding, and debugging skills including C/C++, Linux kernel, and drivers.
Hands-on experience with NVIDIA GPU systems/SDKs (e.g., CUDA), NVIDIA Networking technologies (NICs, RoCE, InfiniBand), and/or ARM CPU solutions.
Familiarity with virtualization technology concepts.

Benefits

Base salary range: 184,000 USD - 287,500 USD, determined by location, experience, and peer pay.
Eligibility for equity and benefits.
Commitment to diversity and equal opportunity employment.