AI Software Engineer, LLM Inference Performance Analysis - New College Grad 2026

at Nvidia
USD 124,000-218,500 per year
Seniority: Junior / Middle
✅ On-site


Used Tools & Technologies

Not specified

Required Skills & Competences

Python (6), Communication (6), LLM (3), PyTorch (2), CUDA (2), GPU (2)

Details

NVIDIA is at the forefront of the generative AI revolution. We are looking for a Software Engineer, Performance Analysis and Optimization for LLM Inference, to join our performance engineering team. In this role, you will focus on improving the efficiency and scalability of large language model (LLM) inference on NVIDIA computing platforms through compiler- and kernel-level analysis and optimization. You will work on components spanning IR-based compiler optimization, graph-level transformations, and precompiled kernel performance tuning to deliver improvements in inference speed and efficiency.

Responsibilities

  • Analyze the performance of LLMs running on NVIDIA Compute Platforms using profiling, benchmarking, and performance analysis tools.
  • Identify opportunities in compiler optimization pipelines, including IR-based compiler middle-end optimizations and kernel-level transformations.
  • Design and develop new compiler passes and optimization techniques to deliver robust and maintainable compiler infrastructure and tools.
  • Collaborate with hardware architecture, compiler, and kernel teams to understand the hardware/software co-design that enables efficient LLM inference.
  • Work with globally distributed teams across compiler, kernel, hardware, and framework domains to investigate performance issues and contribute to solutions.

Requirements

  • Master's or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
  • Strong hands-on programming expertise in C++ and Python, with solid software engineering fundamentals.
  • Foundational understanding of modern deep learning models (including transformers and LLMs) and interest in inference performance and optimization.
  • Exposure to compiler concepts such as intermediate representations (IR), graph transformations, scheduling, or code generation through coursework, research, internships, or projects.
  • Familiarity with at least one deep learning framework or compiler/runtime ecosystem (e.g., TensorRT-LLM, PyTorch, JAX/XLA, Triton, vLLM, or similar).
  • Ability to analyze performance bottlenecks and reason about optimization opportunities across model execution, kernels, and runtime systems.
  • Experience from class projects, internships, research, or open-source contributions involving performance-critical systems, compilers, or ML infrastructure.
  • Strong communication skills and the ability to collaborate effectively in a fast-paced, team-oriented environment.

Ways to Stand Out

  • Proficiency in CUDA programming and familiarity with GPU-accelerated deep learning frameworks and performance tuning techniques.
  • Demonstrated innovative applications of agentic AI tools that enhance productivity and workflow automation.
  • Active engagement with the open-source LLVM or MLIR community to ensure tighter integration and alignment with upstream efforts.

Compensation & Benefits

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary ranges provided in the posting are:

  • Level 2: USD 124,000 - 195,500
  • Level 3: USD 152,000 - 218,500

You will also be eligible for equity and benefits (link referenced in original posting).

Additional Information

  • Location: Santa Clara, CA, US.
  • Applications for this job will be accepted at least until January 18, 2026.
  • This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.
  • NVIDIA is an equal opportunity employer and is committed to fostering a diverse work environment.