Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level value.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others, and you know all the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Security @ 4
Docker @ 4
Kubernetes @ 4
Python @ 4
GCP @ 7
Algorithms @ 7
AWS @ 7
Azure @ 7
Helm @ 4
Rust @ 4
Microservices @ 4
API @ 4
LLM @ 4
Compliance @ 4
GPU @ 4
Observability @ 4
AI @ 4
Agentic AI @ 4
vLLM @ 4
Data Pipelines @ 4
TensorRT @ 4
SGLang @ 4
Details
NVIDIA seeks a senior engineer to own and evolve the core NIM Platform SDK and microservice framework that powers NVIDIA Inferencing Microservices (NIM). This is a hands-on, deeply technical role focused on building foundational platform libraries and production-grade microservices for AI inference at scale across cloud, on-prem, and Kubernetes environments.
Responsibilities
- Develop and advance the inference microservice framework: OpenAI-compatible API endpoints, inference backend integrations (vLLM, SGLang, TensorRT-LLM, Dynamo), middleware, observability instrumentation, and production hardening across cloud, on-prem, and Kubernetes.
- Architect significant new features in open-source codebases and shepherd them through project acceptance into production.
- Build and optimize high-performance model download and caching pipelines across multiple cloud storage backends (NGC, HuggingFace, S3, GCS) including parallel transfers and integrity verification for multi-cloud operability.
- Implement model profile and manifest systems to ensure NIMs are optimized for every NVIDIA GPU platform (profile selection, validation, multi-GPU configuration).
- Develop and refine cloud microservice patterns: service discovery, health checking, graceful degradation, API gateway integration, and end-to-end request lifecycle management for reliable operation at scale.
- Produce high-quality code across Python, Rust, and C/C++, and champion practices such as test-driven development, agentic AI-assisted development, and rigorous code review.
- Mentor teammates and establish engineering standards for container quality, security, and operability.
Requirements
- BS or MS in Computer Science, Computer Engineering, or related field (or equivalent experience).
- 8+ years of demonstrated experience developing performant microservices, cloud software, and/or platform infrastructure.
- Deep technical expertise in cloud-native microservice architecture, including service mesh, API gateways, load balancing, and distributed system patterns.
- Expertise in high-performance data pipelines with parallel I/O, caching strategies, and integrity verification across distributed storage systems.
- Solid understanding of containerized application delivery (Docker, Kubernetes, Helm).
- Understanding of application security principles: secure coding, vulnerability mitigation, secrets management, and supply chain integrity for containerized environments.
- Strong problem-solving skills grounded in first-principles reasoning.
- Excellent programming skills in Python and Rust, with strong foundations in algorithms and software engineering principles.
Ways to stand out
- Direct involvement in open-source inference backends such as vLLM, TensorRT-LLM, or SGLang.
- Direct involvement in disaggregated serving frameworks like NVIDIA Dynamo.
- Experience building and operating production microservices at scale.
- Deep knowledge of multi-cloud deployment strategies across AWS, GCP, Azure, and OCI.
- Experience operating in regulated, air-gapped, or disconnected environments with strict security and compliance controls.
Compensation & Benefits
- Base salary range: 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
- Eligible for equity and company benefits (link provided in posting).
Additional Information
- Applications accepted at least until April 4, 2026.
- This posting is for an existing vacancy.
- NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer committed to fostering a diverse work environment.