Senior Machine Learning Applications and Compiler Engineer, LPX
at Nvidia
USD 152,000-287,500 per year
Used Tools & Technologies
GPU
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Algorithms @ 4
Data Structures @ 7
Machine Learning @ 4
TensorFlow @ 3
Communication @ 4
Rust @ 7
Debugging @ 7
PyTorch @ 3
Deep Learning @ 4
AI @ 4
Profiling @ 7
Details
We are seeking a Senior Machine Learning Applications and Compiler Engineer to develop algorithms and optimizations for the LPX inference and compiler stack. You will work at the intersection of large-scale systems, compilers, and deep learning to map neural network workloads onto NVIDIA platforms.
Responsibilities
- Build, develop, and maintain high-performance runtime and compiler components, focusing on end-to-end inference optimization.
- Define and implement mappings of large-scale inference workloads onto NVIDIA’s systems.
- Extend and integrate with NVIDIA’s software ecosystem, contributing to libraries, tooling, and interfaces for model deployment across platforms.
- Benchmark, profile, and monitor performance and efficiency metrics to ensure efficient mappings of neural network graphs to inference hardware.
- Collaborate with hardware architects and design teams to provide software feedback, influence future architectures, and codesign features to improve performance and efficiency.
- Prototype and evaluate compilation and runtime techniques, including graph transformations, scheduling strategies, and memory/layout optimizations for spatial processors.
- Publish and present technical work on compilation approaches for inference and spatial accelerators at ML, compiler, and architecture venues.
Requirements
- MS or PhD in Computer Science, Electrical/Computer Engineering, or related field, or equivalent experience, with 5 years of relevant experience.
- Strong software engineering background with proficiency in systems-level programming (e.g., C/C++ and/or Rust) and solid CS fundamentals in data structures, algorithms, and concurrency.
- Hands-on experience with compiler or runtime development, including IR design, optimization passes, or code generation.
- Experience with LLVM and/or MLIR, including building custom passes, dialects, or integrations.
- Familiarity with deep learning frameworks such as TensorFlow and PyTorch, and experience working with portable graph formats such as ONNX.
- Solid understanding of parallel and heterogeneous compute architectures (GPUs, spatial accelerators, or other domain-specific processors).
- Strong analytical and debugging skills, with experience using profiling, tracing, and benchmarking tools to drive performance improvements.
- Excellent communication and collaboration skills to work across hardware, systems, and software teams.
- Ideal candidates will have direct experience with MLIR-based compilers or other multilevel IR stacks, especially for graph-based deep learning workloads.
Ways to stand out
- Prior work on spatial or dataflow architectures, including static scheduling, pipeline parallelism, or tensor parallelism at scale.
- Contributions to open-source ML frameworks, compilers, or runtime systems, particularly related to performance or scalability.
- Demonstrated research impact (publications or presentations at PLDI, CGO, ASPLOS, ISCA, MICRO, MLSys, NeurIPS, or similar).
- Experience with large-scale AI distributed inference or training systems, including performance modeling and capacity planning for multi-rack deployments.
Compensation & Benefits
- Base salary ranges (location- and level-dependent):
  - Level 3: 152,000 USD - 241,500 USD
  - Level 4: 184,000 USD - 287,500 USD
- Eligible for equity and benefits (link to benefits provided in original posting).
Additional information
- Office policy: hybrid role (#LI-Hybrid).
- Application deadline: applications are accepted until at least March 23, 2026.
- NVIDIA uses AI tools in recruiting processes and is an equal opportunity employer committed to diversity and inclusion.