Required Skills & Competences
Docker @ 4, Kafka @ 4, Kubernetes @ 4, Redis @ 4, Python @ 7, SQL @ 4, NoSQL @ 4, CI/CD @ 4, Distributed Systems @ 8, Leadership @ 4, Communication @ 4, JavaScript @ 7, MongoDB @ 4, LLM @ 4, GPU @ 4
Details
NVIDIA's Applied AI team for chip design is building agentic, generative AI solutions to accelerate and transform GPU and chip engineering. You will collaborate with researchers and hardware designers to design, scale, and operate agentic systems that reason, plan, call tools, and generate code. The role focuses on building and maintaining the core infrastructure for deploying and running these agents in production, ensuring performance, scalability, reliability, and secure data management.
Responsibilities
- Design, develop, and maintain large-scale enterprise AI infrastructure that brings LLMs into AI applications to improve efficiency for NVIDIA software and hardware engineers.
- Collaborate with hardware chip designers and LLM research teams to understand GPU design needs and align LLM infrastructure accordingly.
- Optimize infrastructure for performance, scalability, and reliability, and ensure secure and efficient management of data.
- Stay current with industry advancements in AI and apply them to improve LLM infrastructure.
- Lead engineering efforts, maintain high-quality engineering practices, and inspire engineering teams.
Requirements
- Master's or PhD degree in Computer Science, Electrical Engineering, or a relevant subject area (or equivalent experience).
- 10+ years of experience managing large-scale distributed systems or enterprise AI infrastructure.
- Expert-level proficiency in Python (required); advanced experience in JavaScript.
- Deep proficiency in software engineering principles, high-performance coding, and system optimization.
- Extensive experience architecting, scaling, and governing enterprise infrastructure, including CI/CD, Docker, Kubernetes, messaging systems (Kafka), data pipelines, and both SQL and NoSQL databases (especially MongoDB and Redis) for secure, reliable production deployments.
- Industry-leading expertise in AI/LLM infrastructure and agentic systems, including end-to-end design and integration of LLM/agent frameworks (LangChain, LangGraph, CrewAI, AutoGen), RAG, vector databases, and secure, compliant production deployments.
- Demonstrated leadership in defining technical direction, shaping system design, and launching platforms from idea to operation; experience forming and guiding international teams.
- Excellent communication, collaboration, and problem-solving skills; proven experience building large-scale, user-facing GenAI/LLM applications across organizational boundaries.
Compensation & Other Details
- Base salary range: 224,000 USD - 356,500 USD (final base salary determined by location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits.
- Employment type: Full time, hybrid (#LI-Hybrid).
- Applications accepted at least until December 13, 2025.
About NVIDIA
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. The company values diversity and does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.