Senior AI Infrastructure Engineer - DGX Cloud

at NVIDIA
USD 148,000-287,500 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Docker @ 4 Go @ 4 Kubernetes @ 4 Linux @ 4 Terraform @ 4 TypeScript @ 4 Python @ 4 Looker @ 4 Spark @ 4 Tableau @ 4 Java @ 4 Distributed Systems @ 4 Data Science @ 4 Hiring @ 4 Leadership @ 4 Apache Beam @ 4 Helm @ 4 Mathematics @ 4 Networking @ 4 Backstage @ 4 Reporting @ 7 Hive @ 4 PyTorch @ 4 GPU @ 4

Details

AI Infrastructure Engineers at NVIDIA ensure that our internal- and external-facing GPU cloud services run with the reliability and uptime promised to users. They enable developers to change the existing system through careful preparation and planning while keeping an eye on capacity, latency, and performance.

We are looking for engineers with a strong background in computer science fundamentals who are interested in building tooling, reporting, automation, and AI to support the operational flywheel across a highly dynamic organization.

Responsibilities

  • Design, build, deploy, and run internal tooling built on top of cloud infrastructure.
  • Design, implement, ship, and maintain essential data pipelines, data lake, and reporting used by executive leadership to decide on business priorities.
  • Integrate tooling with internal and customer workflows along with cloud service providers to streamline incident, change, and problem management processes.
  • Reduce the toil of running incidents and maintenance through software automation and AI/ML solutions.

Requirements

  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
  • 5+ years of experience.
  • A kind team player adaptable in a highly dynamic and changing environment.
  • Proven track record of balancing initiating your own projects, collaborating on others' projects, and convincing others to collaborate.
  • Experience with infrastructure automation and distributed systems design, including developing tools for running large-scale private or public cloud systems in production.
  • Experience in one or more of the following programming languages: Python, Go, TypeScript, C/C++, Java.
  • In-depth knowledge of one or more of the following: Linux, Networking, Storage, and Containers.

Ways to Stand Out from the Crowd

  • Experience building and integrating with incident tooling such as FireHydrant, Rootly, incident.io, or Blameless.
  • Experience building plugins, templates, and entity schemas in Backstage.
  • Background with infrastructure technologies such as Kubernetes, Terraform, Docker, and Helm charts, and with durable execution systems such as Temporal.
  • Background with basic ML and data science concepts and tooling such as Hive, Apache Beam, Apache Spark, and PyTorch.
  • Experience with business analytics tooling such as Looker, Tableau, or Power BI.

Benefits

  • Eligibility for equity and other benefits.

NVIDIA is recognized as a technology industry leader with a focus on Artificial Intelligence, High-Performance Computing, and Visualization. The company is committed to fostering diversity and equal opportunity in hiring and promotion practices.