Vacancy is archived. Applications are no longer accepted.

Solutions Architect - Cloud Infrastructure

at Nvidia
MIDDLE SENIOR
✅ Remote

Used Tools & Technologies

Not specified

Required Skills & Competences

  • Grafana @ 3
  • Kubernetes @ 3
  • Linux @ 5
  • Prometheus @ 3
  • Distributed Systems @ 3
  • Machine Learning @ 3
  • AWS @ 5
  • Azure @ 5
  • Mathematics @ 3
  • OpenTelemetry @ 3

Details

We are excited to announce an opening for a Cloud Solution Architect at NVIDIA and are seeking a passionate individual with a strong interest in cloud infrastructure engineering! If you are enthusiastic about contributing to projects that push the boundaries of cloud-based AI and resilience in large-scale environments, we invite you to read on. NVIDIA is renowned as one of the most sought-after employers in the technology world, offering highly competitive benefits. We are home to some of the most innovative and forward-thinking individuals globally. If you are creative, autonomous, and eager to apply your skills and knowledge in a dynamic environment, we want to hear from you!

Responsibilities

  • Working as a key member of our cloud solutions team, you will be the go-to technical expert on NVIDIA's GPU-accelerated cloud offerings, helping clients build resilient and telemetry-driven cloud infrastructures.
  • Collaborating directly with engineering teams to secure design wins, address challenges, and deploy solutions into production, with a focus on developing robust tooling for observability and failure recovery.
  • Acting as a trusted advisor to our clients, understanding their cloud environment, translating requirements into technical solutions, and providing guidance on optimizing NVIDIA DGX Cloud for scalable, reliable, and high-performance workloads.

Requirements

  • 2+ years of experience in cloud infrastructure engineering, AI/ML systems, or large-scale distributed systems.
  • A BS in Computer Science, Electrical Engineering, Mathematics, or Physics, or equivalent experience.
  • A proven understanding of cloud computing and large-scale computing systems.
  • Proficiency in Linux, Windows Subsystem for Linux, and Windows.
  • A passion for machine learning and AI, and the drive to continually learn and apply new technologies.
  • Excellent interpersonal skills, including the ability to explain complex technical topics to non-experts.

Ways to stand out from the crowd:

  • Expertise with orchestration tools like Slurm and Kubernetes.
  • Familiarity with NVIDIA’s DGX Cloud, Base Command Platform, and its ecosystem.
  • Hands-on experience designing telemetry systems and failure recovery mechanisms for large-scale cloud infrastructure, using observability tools such as Grafana, Prometheus, and OpenTelemetry.
  • Proficiency in deploying and managing cloud-native solutions using platforms such as AWS, Azure, or Google Cloud, with a focus on GPU-accelerated workloads.
  • Contributions to open-source projects that showcase expertise in cloud AI and infrastructure engineering.