Senior Software Engineer, AI Inference Systems

at NVIDIA

📍 Germany

PLN 292,500-650,000 per year

SENIOR

✅ Remote ✅ Hybrid

Used Tools & Technologies

IaC, Machine Learning, HPC

Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level. The levels mean:

1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose.

3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology.

7-9: expert. You can teach others, and you know the pitfalls and tricks.

10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.

Docker @ 3, Go @ 1, Kubernetes @ 3, Linux @ 3, Python @ 1, GCP @ 4, GitHub @ 4, CI/CD @ 4, Algorithms @ 4, Data Structures @ 4, Distributed Systems @ 4, AWS @ 4, Azure @ 4, Communication @ 4, Parallel Programming @ 4, Performance Optimization @ 4, Rust @ 1, Debugging @ 4, LLM @ 4, PyTorch @ 4, CUDA @ 3, GPU @ 4, Deep Learning @ 4, Observability @ 4, AI @ 4, Profiling @ 3, vLLM @ 4, NCCL @ 3, Slurm @ 3, SGLang @ 4, LLVM @ 4

Details

We are seeking highly skilled and motivated software engineers to build AI inference systems that serve large-scale models with extreme efficiency. You will architect and implement high-performance inference stacks, optimize GPU kernels and compilers, drive industry benchmarks, and scale workloads across multi-GPU, multi-node, and multi-cloud environments. You will collaborate across inference, compiler, scheduling, and performance teams to push the frontier of accelerated computing for AI.

Responsibilities

Contribute features to vLLM that let the newest models take advantage of the latest NVIDIA GPU hardware features; profile and optimize the inference framework (vLLM) using methods such as speculative decoding, data/tensor/expert/pipeline parallelism, and prefill-decode disaggregation.
Develop, optimize, and benchmark GPU kernels (hand-tuned and compiler-generated) using techniques such as fusion, autotuning, and memory/layout optimization.
Build and extend high-level DSLs and compiler infrastructure to boost kernel developer productivity while approaching peak hardware utilization.
Define and build inference benchmarking methodologies and tools; contribute new benchmarks and NVIDIA's submissions to the MLPerf Inference benchmarking suite.
Architect scheduling and orchestration of containerized large-scale inference deployments on GPU clusters across clouds.
Conduct and publish original research that advances ML systems; survey recent publications and integrate research ideas and prototypes into NVIDIA's software products.

Requirements

Bachelor's degree (or equivalent experience) in Computer Science, Computer Engineering, or Software Engineering with 7+ years of experience; or Master's degree with 5+ years; or PhD with thesis and top-tier publications in ML Systems, GPU architecture, or high-performance computing.
Strong programming skills in Python and C/C++; experience with Go or Rust is a plus.
Solid CS fundamentals: algorithms & data structures, operating systems, computer architecture, parallel programming, distributed systems, and deep learning theory.
Experience with performance engineering in ML frameworks (e.g., PyTorch) and inference engines (e.g., vLLM and SGLang).
Familiarity with GPU programming and performance (CUDA, memory hierarchy, streams, NCCL); proficiency with profiling/debug tools such as Nsight Systems and Nsight Compute.
Experience with containers and orchestration (Docker, Kubernetes, Slurm); familiarity with Linux namespaces and cgroups.
Excellent debugging, problem-solving, and communication skills; ability to excel in a fast-paced, multi-functional setting.

Ways to Stand Out

Experience building and optimizing LLM inference engines (e.g., vLLM, SGLang).
Hands-on work with ML compilers and DSLs (e.g., Triton, TorchDynamo/Inductor, MLIR/LLVM, XLA), GPU libraries (e.g., CUTLASS), and GPU features (e.g., CUDA Graphs, Tensor Cores).
Experience contributing to containerization/virtualization technologies such as containerd, CRI-O, or CRIU.
Experience with cloud platforms (AWS, GCP, Azure), infrastructure as code, CI/CD, and production observability.
Contributions to open-source projects and/or academic publications (include links to GitHub PRs and papers).

Company

At NVIDIA, the mission is to advance AI research and development and create technologies that enable anyone to harness the power of AI. The team includes experts in AI, systems, and performance optimization. The role is aimed at building systems, kernels, and tools to make large-scale AI faster, more efficient, and easier to deploy.

Location & Work Arrangement

Location: Germany (remote/hybrid indicated)
Note: job post includes #LI-Hybrid and lists location as "Germany, Remote".

Compensation (as listed in posting)

For Poland: Level 4 base salary range: 292,500 PLN - 507,000 PLN
For Poland: Level 5 base salary range: 375,000 PLN - 650,000 PLN