Senior Software Engineer, Machine Learning Inference

at NVIDIA
USD 184,000-356,500 per year
SENIOR
✅ Hybrid

Required Skills & Competences

Software Development @ 7, Python @ 4, Machine Learning @ 4, Communication @ 7, Performance Optimization @ 7, Rust @ 3, LLM @ 4, PyTorch @ 4, CUDA @ 4, GPU @ 6

Details

At NVIDIA, we're at the forefront of innovation, driving advancements in AI and machine learning to solve challenging problems. Join the TensorRT team to develop industry-leading deep learning inference software for NVIDIA AI accelerators and help power inference applications across datacenters, workstations, and PCs. This role focuses on designing and implementing inference software optimizations for GPUs and on enabling efficient deployment of LLMs and generative AI models.
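
For a sense of the close-to-metal GPU programming this role touches, here is a minimal, illustrative CUDA sketch; it is not taken from the posting and is unrelated to TensorRT internals. It shows a grid-stride SAXPY kernel launched on a single GPU, and all names and sizes are hypothetical.

```cuda
// Illustrative only: a grid-stride SAXPY kernel (y = a*x + y).
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Grid-stride loop: each thread processes multiple elements, so the
    // launch configuration does not have to match the problem size exactly.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        y[i] = a * x[i] + y[i];
    }
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr, *y = nullptr;

    // Unified memory keeps the example short; production inference code
    // typically manages explicit device buffers and streams.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const int block = 256;
    const int grid = (n + block - 1) / block;
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expected: 4.000000

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Real TensorRT and TensorRT-LLM work goes far beyond this (kernel fusion, quantization, scheduling, profiling with tools such as Nsight Systems and Nsight Compute), but it operates at this level of the stack.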

Responsibilities

  • Design, develop, and optimize NVIDIA TensorRT and TensorRT-LLM to supercharge inference applications for datacenters, workstations, and PCs.
  • Develop software in C++, Python, and CUDA for seamless and efficient deployment of state-of-the-art LLMs and Generative AI models.
  • Collaborate with deep learning experts and GPU architects across the company to influence hardware and software design for inference.
  • Work on inference backends, compilers, and system-level software to achieve close-to-metal performance and high-efficiency inference.

Requirements

  • BS, MS, PhD or equivalent experience in Computer Science, Computer Engineering, or a related field.
  • 8+ years of software development experience on a large codebase or project.
  • Strong proficiency in C++ (required). Familiarity with Rust or Python (one or both) is expected.
  • Experience developing deep learning frameworks, compilers, or system software.
  • Experience or strong interest in GPU programming (CUDA) and performance optimization.
  • Excellent problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.
  • Strong communication skills and the ability to articulate complex technical concepts.

Ways to stand out

  • Experience developing inference backends and compilers for GPUs.
  • Knowledge of machine learning techniques and GPU programming with CUDA or OpenCL.
  • Background working with LLM inference frameworks such as TensorRT-LLM, vLLM, or SGLang.
  • Experience with deep learning frameworks such as TensorRT, PyTorch, or JAX.
  • Knowledge of close-to-metal performance analysis, optimization techniques, and tools.

Compensation & Benefits

  • Base salary range (Level 4): 184,000 USD - 287,500 USD
  • Base salary range (Level 5): 224,000 USD - 356,500 USD
  • You will also be eligible for equity and benefits. Base salary will be determined based on location, experience, and pay of employees in similar positions.

Location & Work Model

  • Location: Santa Clara, CA, United States
  • Office policy: Hybrid

Application

  • Applications for this job will be accepted at least until August 4, 2025.

Equal Opportunity

NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.