Senior Solutions Architect, HPC Systems Engineer
at Nvidia
United States
USD 184,000-287,500 per year
Used Tools & Technologies
Not specified
Required Skills & Competences
Docker @ 4 Kubernetes @ 4 Linux @ 4 DevOps @ 4 MLOps @ 4 Communication @ 7 Networking @ 4 Product Management @ 4 Debugging @ 4 CUDA @ 4 GPU @ 4
Details
NVIDIA is looking for an experienced GPU and network systems Solutions Architect & Engineer to drive the deployment and integration of end-to-end technology solutions at strategic customers' data centers. The role focuses on bringing AI hardware and software technologies to production, guiding network and compute design, debugging cluster issues, and working closely with product, engineering and sales teams. Occasional on-site visits (~20% travel) are required; the company is open to remote work locations.
Responsibilities
- Work with NVIDIA AI Native, Consumer Internet and Enterprise customers on large data center GPU server and networking system deployments as a Solutions Architect / Engineer.
- Guide customer discussions on network design and compute/storage, and support bring-up of server, network and cluster deployments; visit customer data centers during bring-up phases.
- Demonstrate subject matter expertise in advanced GPU & network systems and act as a trusted technical advisor to strategic customers.
- Bring customer-specific requirements to product teams to guide product roadmap features.
- Identify new project opportunities for NVIDIA products and technology solutions in data center and AI applications.
- Conduct regular technical customer meetings for product roadmap, cluster issue debugging, feature discussions and introductions to new technology solutions.
- Build custom product demonstrations and POCs addressing critical customer business needs.
- Analyze and debug compute/network configuration and performance issues to deliver performant clusters.
- Collaborate closely with GPU/Network Systems Engineering, Product Management and Sales teams.
Requirements
- BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
- 8+ years of Systems / Solution Engineering or similar engineering experience is ideal.
- System-level expertise in CPU/GPU server architecture, NICs, Linux, system software and kernel drivers.
- Experience with networking switches for Ethernet / InfiniBand, and familiarity with data center infrastructure (power / cooling).
- Knowledge of DevOps / MLOps technologies such as Docker / containers and Kubernetes.
- Effective time management and the ability to balance multiple tasks.
- Strong verbal and written communication skills; ability to share ideas and code clearly through documents and presentations.
Ways to stand out
- External customer-facing background.
- Experience with bring-up and deployment of large clusters.
- Systems engineering, coding, and debugging skills, including experience with C/C++, Linux kernel and drivers.
- Hands-on experience with NVIDIA GPU systems / SDKs (e.g., CUDA), NVIDIA networking technologies (e.g., NICs, RoCE, InfiniBand), and/or ARM CPU solutions.
- Familiarity with virtualization technology concepts.
Benefits
- Base salary range (location & experience dependent): 184,000 USD - 287,500 USD.
- Eligible for equity and company benefits (see NVIDIA benefits).
Applications for this job will be accepted at least until July 29, 2025.
NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.