Senior Solutions Architect, HPC Systems Engineer

at Nvidia
πŸ“ United States
USD 184,000-287,500 per year
SENIOR
✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competences

Docker @ 4, Kubernetes @ 4, Linux @ 4, DevOps @ 4, MLOps @ 4, Communication @ 7, Networking @ 4, Debugging @ 4, CUDA @ 4, GPU @ 4

Details

NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer to join its team. The role involves deploying AI hardware and software technologies in customer data centers and providing strategic product guidance.

Responsibilities

  • Work with NVIDIA AI Native, Consumer Internet, and Enterprise customers on large data center GPU server and networking system deployments.
  • Guide customer discussions on network design and compute/storage, and support the bring-up of server, network, and cluster deployments.
  • Visit customer data centers during the bring-up phase.
  • Provide subject matter expertise in advanced GPU & network systems as a trusted technical advisor.
  • Bring customer-specific requirements to product teams to influence product roadmap.
  • Identify new project opportunities for NVIDIA products and technology in data center and AI applications.
  • Conduct regular technical meetings with customers covering the product roadmap, cluster debugging, feature discussions, and new technology introductions.
  • Build custom product demonstrations and proof of concepts (POCs) addressing critical business needs.
  • Analyze and debug compute/network configuration and performance issues for optimal cluster performance.
  • Travel occasionally (about 20%) for on-site visits and industry events; the role is open to remote work.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields or equivalent experience.
  • 8+ years of experience in Systems/Solution Engineering or similar roles.
  • System-level expertise in CPU/GPU server architecture, NICs, Linux, system software, and kernel drivers.
  • Experience with Ethernet/InfiniBand networking switches and with data center infrastructure, including power and cooling.
  • Knowledge of DevOps/MLOps technologies such as Docker, containers, and Kubernetes.
  • Effective time management and ability to balance multiple tasks.
  • Strong verbal and written communication skills.

Ways to Stand Out

  • External customer-facing background.
  • Experience with the bring-up and deployment of large clusters.
  • Systems engineering, coding, and debugging skills including C/C++, Linux kernel, and drivers.
  • Hands-on experience with NVIDIA GPU systems/SDKs (e.g., CUDA), NVIDIA Networking technologies (NICs, RoCE, InfiniBand), and/or ARM CPU solutions.
  • Familiarity with virtualization technology concepts.

Benefits

  • Base salary range: 184,000 USD - 287,500 USD, determined by location, experience, and peer pay.
  • Eligibility for equity and benefits.
  • Commitment to diversity and equal opportunity employment.