Lead Principal Engineer, Enterprise Agentic AI Platform

at Nvidia

📍 Santa Clara, United States

USD 272,000-431,250 per year

SENIOR

✅ On-site

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. About proficiency levels:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;

7-9: expert. You can teach others and know all the pitfalls and tricks;

10: exceptional knowledge. Comprehensive understanding of and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Security @ 4, Go @ 4, Kubernetes @ 4, Python @ 4, CI/CD @ 4, Distributed Systems @ 8, Communication @ 7, Networking @ 4, API @ 4, Audit @ 4, GPU @ 4, Observability @ 7, AI @ 4, Agentic AI @ 4, RAG @ 4, LangChain @ 7

Details

Join NVIDIA IT’s Enterprise AI & Automation team to develop and expand enterprise-grade agentic AI systems at one of the world’s most advanced AI companies. NVIDIA’s Enterprise AI Platform powers production AI agents that connect securely with enterprise systems to boost employee efficiency and accelerate business results across engineering, IT, supply chain, finance, HR, and sales. This is a Principal/Distinguished Engineer-level architect role for someone who defines systems by building them directly and writes code daily in Python and/or Go. You will shape enterprise agent architecture by delivering working systems, developing reference implementations, and raising technical standards across the organization.

Responsibilities

Develop and deliver production-quality agentic AI systems end-to-end using Python and/or Go, including Kubernetes deployment, agent runtimes, memory systems, orchestration, tool integration, and evaluation pipelines.
Define and advance NVIDIA’s Enterprise Agentic AI architecture through practical implementations, reference systems, and production deployments.
Build and implement multi-agent orchestration patterns (planner, executor, reviewer, tool agents) using frameworks such as LangChain, LangGraph, or similar orchestration systems, with strong regression coverage and observability.
Run fast, high-quality POCs on emerging agent architectures and harden successful patterns into reusable platform services, APIs, SDKs, and developer templates.
Architect and implement data flywheels that continuously improve agent quality through telemetry, benchmarking, automated evaluation, and structured feedback loops.
Embed security, guardrails, sandbox isolation, auditability, and policy enforcement directly into agent runtimes in partnership with security and governance teams.
Evaluate, integrate, and extend open-source and third-party agent platforms; make disciplined build-vs-use decisions based on performance, scalability, control, and long-term ownership.
Collaborate closely with engineering, infrastructure, product, and business stakeholders to align architectural direction with enterprise priorities and accelerate adoption.

Requirements

Bachelor’s degree in Computer Science or a related field, or equivalent experience; Master’s or PhD preferred.
15+ years of experience building and shipping large-scale distributed systems with significant hands-on coding in Python, Go, or similar systems languages.
Proven ability to rapidly transition from idea to functional prototype and then to robust, scalable platform solutions.
Proven experience building agentic AI systems, including RAG pipelines, long-term memory architectures, multi-agent orchestration (e.g., LangChain, LangGraph), tool frameworks, and evaluation infrastructure.
Expert-level depth in Kubernetes, containerized workloads, networking, APIs, and secure enterprise integration patterns.
Experience crafting benchmarking, regression testing, telemetry, and observability systems that measure agent quality, latency, cost, reliability, and safety.
Comprehensive knowledge of performance tuning in hybrid environments, including GPU-based inference systems and inference efficiency on NVIDIA hardware.
Strong collaboration and communication skills, with the ability to influence cross-functional partners and explain complex architectural concepts to technical and business audiences.

Ways to Stand Out

Experience delivering reusable developer-acceleration components such as SDKs, APIs, templates, reference implementations, and CI/CD automation.
Experience integrating enterprise vector databases and retrieval systems, and working with enterprise agent ecosystems such as Glean, Microsoft Copilot Studio, Google Agentspace, or similar.
Experience embedding fine-grained policy enforcement, access controls, sandbox isolation, and audit trails directly into AI runtimes.
GPU-acceleration expertise: optimizing model inference, batching strategies, memory utilization, and efficiency on NVIDIA hardware.
Evidence of meaningful open-source contributions, core commits, maintainership, or public technical artifacts demonstrating system-level depth.

Compensation & Benefits

Base salary range: 272,000 USD - 431,250 USD, determined by location, experience, and the pay of employees in similar positions.
Eligible for equity and benefits.

Additional notes:

Applications will be accepted at least until March 2, 2026.
NVIDIA uses AI tools in its recruiting processes. NVIDIA is an equal opportunity employer committed to diversity and inclusion.