Used Tools & Technologies
Not specified
Required Skills & Competences
Go @ 7, Grafana @ 3, Kubernetes @ 4, Prometheus @ 3, Python @ 7, CI/CD @ 4, Datadog @ 3, Distributed Systems @ 8, Mathematics @ 4, Performance Optimization @ 4, Debugging @ 4, API @ 4, System Architecture @ 7, OpenTelemetry @ 3, GPU @ 4
Details
NVIDIA's Software Infrastructure Team is building and maintaining the core infrastructure that powers closed and open source AI models. This role focuses on the Inference as a Service platform: developing systems to manage GPU resources, ensure service stability, and deliver high-performance, low-latency inference at massive scale. The team is part of the NVIDIA AI Factory initiative and is based in Santa Clara, CA.
Responsibilities
- Design and develop a scalable, robust, and reliable platform for serving AI models (Inference as a Service).
- Develop and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads (a minimal autoscaling sketch follows this list).
- Build and maintain core infrastructure components, including load balancing and rate limiting, to ensure stability and high availability (see the rate-limiting sketch after this list).
- Implement APIs for model deployment, monitoring, and management to provide a seamless user experience.
- Integrate deployment, monitoring, and performance telemetry into CI/CD pipelines in collaboration with engineering teams.
- Build tools and frameworks for real-time observability, performance profiling, and debugging of inference services.
- Collaborate with architects to define and implement best practices for long-term platform evolution.
- Contribute to NVIDIA's AI Factory initiative by building foundational platforms that support model serving needs.
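The autoscaling responsibility above boils down to a control loop that sizes a replica pool against observed demand. As a hedged illustration (not NVIDIA's implementation), the Go sketch below applies the proportional scaling rule that Kubernetes documents for its Horizontal Pod Autoscaler, desired = ceil(current × currentMetric / targetMetric); the per-replica queue-depth metric and all names here are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the proportional scaling rule documented for
// Kubernetes' Horizontal Pod Autoscaler:
//   desired = ceil(current * currentMetric / targetMetric)
// The metric here is a hypothetical per-replica queue depth of pending
// inference requests; a real platform would read it from telemetry.
func desiredReplicas(current int, currentQueueDepth, targetQueueDepth float64) int {
	if current <= 0 || targetQueueDepth <= 0 {
		return current // nothing sensible to compute; hold steady
	}
	desired := int(math.Ceil(float64(current) * currentQueueDepth / targetQueueDepth))
	if desired < 1 {
		desired = 1 // never scale to zero in this sketch
	}
	return desired
}

func main() {
	// 4 GPU replicas, each seeing 25 queued requests against a target of 10:
	// the rule asks for ceil(4 * 25 / 10) = 10 replicas.
	fmt.Println(desiredReplicas(4, 25, 10))
}
```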
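For the load-balancing and rate-limiting item, here is a minimal sketch of request throttling in front of an inference endpoint, using the well-known token-bucket limiter from golang.org/x/time/rate. The /v1/infer path, the limits, and the single global bucket are illustrative assumptions, not details from the posting.

```go
package main

import (
	"net/http"

	"golang.org/x/time/rate"
)

// limiter allows a sustained 100 requests/second with bursts of 20.
// Real inference gateways typically key limits per tenant or API key;
// a single global bucket keeps this sketch short.
var limiter = rate.NewLimiter(rate.Limit(100), 20)

// rateLimited wraps a handler and sheds excess load instead of queueing,
// which helps preserve tail latency under overload.
func rateLimited(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next(w, r)
	}
}

func main() {
	// Hypothetical inference endpoint guarded by the limiter.
	http.HandleFunc("/v1/infer", rateLimited(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok")) // stand-in for real model serving
	}))
	http.ListenAndServe(":8080", nil)
}
```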
Requirements
- BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, another engineering discipline, or a related field (or equivalent experience).
- 12+ years of software engineering experience with expertise in distributed systems or large-scale backend infrastructure.
- Strong programming skills in Python, Go, or C++ with a track record of building production-grade, highly available systems.
- Proven experience with container orchestration technologies such as Kubernetes.
- Strong understanding of system architecture for high-performance, low-latency API services.
- Experience designing, implementing, and optimizing systems for GPU resource management.
- Familiarity with modern observability tools (e.g., Datadog, Prometheus, Grafana, OpenTelemetry); see the instrumentation sketch after this list.
- Demonstrated experience with deployment strategies and CI/CD pipelines.
- Excellent problem-solving skills and ability to work in a fast-paced, collaborative environment.
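As a hedged illustration of the observability requirement above, the sketch below instruments a hypothetical inference handler with a latency histogram using the official Prometheus Go client (github.com/prometheus/client_golang). The metric name, label set, and endpoint are illustrative assumptions; Grafana would then chart the data Prometheus scrapes from /metrics.

```go
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// inferenceLatency records request duration per model; the metric name,
// label set, and buckets are illustrative choices, not NVIDIA's schema.
var inferenceLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "inference_request_duration_seconds",
		Help:    "Latency of inference requests.",
		Buckets: prometheus.DefBuckets,
	},
	[]string{"model"},
)

func init() {
	prometheus.MustRegister(inferenceLatency)
}

func inferHandler(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	defer func() {
		inferenceLatency.WithLabelValues("example-model").
			Observe(time.Since(start).Seconds())
	}()
	w.Write([]byte("ok")) // stand-in for real model serving
}

func main() {
	http.HandleFunc("/v1/infer", inferHandler)
	// Prometheus scrapes this endpoint; Grafana dashboards read from there.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```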
Preferred / Ways to Stand Out
- Experience with specialized inference serving frameworks.
- Open-source contributions to projects in AI/ML, distributed systems, or infrastructure.
- Hands-on experience with performance optimization techniques for AI models, such as quantization or model compression.
- Expertise in building platforms that support a wide variety of AI model architectures.
- Strong understanding of the full lifecycle of an AI model, from training to deployment and serving.
Compensation & Benefits
- Base salary ranges (location- and level-dependent):
  - Level 5: 200,000 USD - 322,000 USD
  - Level 6: 248,000 USD - 391,000 USD
- You will also be eligible for equity and benefits.
Additional Information
- Location: Santa Clara, CA, United States.
- Employment type: Full time.
- Applications for this job will be accepted at least until August 26, 2025.
- NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.