AI Platform Engineer

at NVIDIA
USD 168,000-322,000 per year
MIDDLE
✅ On-site

Used Tools & Technologies

Machine Learning

Required Skills & Competences

Security @ 3, Go @ 6, Kubernetes @ 3, Python @ 6, Algorithms @ 3, Data Structures @ 3, Distributed Systems @ 3, MLOps @ 3, Hiring @ 3, Communication @ 3, SRE @ 7, Rust @ 6, Debugging @ 6, OWASP @ 3, LLM @ 3, Compliance @ 3, Agile @ 3, GPU @ 3, Observability @ 3, AI @ 3

Details

NVIDIA is hiring an AI Platform Engineer to build, support, and maintain the next generation of AI-powered enterprise products that improve engineering efficiency, strengthen data security, and power product development. This role collaborates with Cloud and AI/ML teams in a multifaceted, agile environment and focuses on scalable, reliable AI-native infrastructure and tooling.

Responsibilities

  • Define and lead AI-native infrastructure roadmaps and cross-organizational initiatives.
  • Architect and scale LLM/ML infrastructure across cloud-native clusters and on-premises hardware.
  • Design and implement observability for infrastructure health and AI model performance.
  • Build LLM-aware monitoring and leverage AI to improve incident response and reduce toil.
  • Develop automation and tooling to ensure reliability, scalability, and developer self-service.
  • Troubleshoot complex distributed systems, including deep Kubernetes and AI/ML scaling challenges.
  • Drive AI-assisted engineering practices and mentor engineers to foster an AI-first culture.
  • Partner with product engineering and internal business units to translate AI platform capabilities into reliable, scalable solutions that accelerate product development.

Requirements

  • 10+ years of experience in cloud, platform, or SRE roles.
  • Bachelor's degree or equivalent experience.
  • Strong proficiency in Python and at least one systems language (C++, Go, or Rust), with proven expertise in debugging distributed systems.
  • Deep experience building and scaling distributed systems, including Kubernetes and bare-metal infrastructure.
  • Strong observability design across infrastructure and AI workloads (metrics, logging, tracing, AI quality signals).
  • Hands-on experience operating AI/ML platforms, including MLOps, model serving, and GPU-accelerated environments.
  • Experience with infrastructure and application security practices, such as identity/auth, network segmentation, supply chain security, and vulnerability management in cloud-native environments.
  • Practical use of AI-assisted development tools and coding agents in daily workflows.
  • Solid foundation in data structures, algorithms, and complexity analysis.
  • Excellent problem-solving, communication, and cross-functional collaboration skills.

Ways to stand out

  • Deep experience with AI/ML platforms (e.g., Hugging Face, Weights & Biases, NVIDIA NIM).
  • Proven use of AI agents and LLM tooling to enhance observability, incident response, or developer productivity.
  • Experience with artifact management, AI supply chain security, or trusted model distribution systems.
  • Experience with AI-specific threat models (OWASP Top 10 for LLMs, model poisoning, adversarial inputs), compliance frameworks (FedRAMP, SOC 2), and red-teaming or security evaluation of LLM systems.
  • Strong sense of ownership with a structured, automation-first approach.
  • Demonstrated impact driving AI-first engineering practices across teams.

Compensation & Benefits

  • Base salary ranges (determined by location, experience, and internal pay):
    • Level 4: 168,000 USD - 270,250 USD
    • Level 5: 200,000 USD - 322,000 USD
  • Eligible for equity and benefits.

Other information

  • Applications for this job will be accepted at least until March 23, 2026.
  • This posting is for an existing vacancy.
  • NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.