Used Tools & Technologies
Not specified
Required Skills & Competences
Go @ 7, Grafana @ 3, Kubernetes @ 4, Prometheus @ 3, Python @ 7, CI/CD @ 4, Datadog @ 3, Distributed Systems @ 8, Hiring @ 4, Mathematics @ 4, Performance Optimization @ 4, Debugging @ 4, API @ 4, System Architecture @ 7, OpenTelemetry @ 3, GPU @ 4
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. Today NVIDIA is tapping into the unlimited potential of AI to define the next era of computing. The Software Infrastructure Team in Santa Clara, CA — part of the NVIDIA AI Factory initiative — is building and maintaining the core infrastructure that powers closed and open source AI models. This role focuses on designing and developing an Inference as a Service platform to manage GPU resources, deliver high-performance, low-latency inference at scale, and ensure service stability.
Responsibilities
- Lead the design and development of a scalable, robust, and reliable platform for serving AI models (Inference as a Service).
- Architect and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads.
- Build and maintain core infrastructure components including load balancing and rate limiting to ensure stability and high availability (see the rate-limiting sketch after this list).
- Define and implement APIs for model deployment, monitoring, and management to provide a seamless user experience.
- Optimize system performance and latency for various model types (large language models, computer vision models) to ensure high throughput and responsiveness.
- Collaborate with engineering teams to integrate deployment, monitoring, and performance telemetry into CI/CD pipelines.
- Develop tools and frameworks for real-time observability, performance profiling, and debugging of inference services.
- Drive architectural decisions and best practices for long-term platform evolution and scalability.
- Contribute to NVIDIA's AI Factory initiative by building foundational platform capabilities that support model serving needs.
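To make the load-balancing and rate-limiting responsibility concrete, below is a minimal sketch of a token-bucket rate-limiting middleware in Go, using golang.org/x/time/rate. The /infer endpoint, the rps and burst values, and the single global bucket are illustrative assumptions rather than details of NVIDIA's platform; a real Inference as a Service layer would track limits per model and per tenant.

```go
// Minimal sketch of rate-limiting middleware for an inference API.
// All names here (/infer, rps, burst) are hypothetical.
package main

import (
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

// rateLimit wraps a handler with a token-bucket limiter and rejects
// requests once the bucket is empty, keeping GPU-backed workers from
// being overrun by bursts.
func rateLimit(next http.Handler, rps float64, burst int) http.Handler {
	limiter := rate.NewLimiter(rate.Limit(rps), burst)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	infer := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("inference result\n")) // stand-in for model output
	})
	// Allow ~100 requests/second with bursts of up to 20.
	http.Handle("/infer", rateLimit(infer, 100, 20))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A token bucket is a natural fit here because it absorbs short bursts while capping the sustained request rate, which keeps queue depth on the GPU workers bounded.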
Requirements
- 15+ years of software engineering experience with deep expertise in distributed systems or large-scale backend infrastructure.
- BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering or related fields (or equivalent experience).
- Strong programming skills in Python, Go, or C++ with a track record of building production-grade, highly available systems.
- Proven experience with container orchestration technologies like Kubernetes.
- Deep understanding of system architecture for high-performance, low-latency API services.
- Experience designing, implementing, and optimizing systems for GPU resource management.
- Familiarity with modern observability tools (for example: Datadog, Prometheus, Grafana, OpenTelemetry); see the instrumentation sketch after this list.
- Demonstrated experience with deployment strategies and CI/CD pipelines.
- Excellent problem-solving skills and ability to work in a fast-paced, collaborative environment.
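As an illustration of the observability requirement above, here is a minimal sketch of exporting inference latency as a Prometheus histogram using github.com/prometheus/client_golang. The metric name, the model label, the bucket layout, and the simulated handler are all assumptions made for the sketch, not details from the posting.

```go
// Minimal sketch: expose inference latency for Prometheus scraping.
// Metric name, label, and buckets are hypothetical.
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Buckets from 5ms to ~20s; tune to the served models' latency profile.
var inferenceLatency = promauto.NewHistogramVec(prometheus.HistogramOpts{
	Name:    "inference_request_duration_seconds",
	Help:    "End-to-end inference request latency.",
	Buckets: prometheus.ExponentialBuckets(0.005, 2, 12),
}, []string{"model"})

func handleInfer(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	// Stand-in for real model execution on a GPU worker.
	time.Sleep(time.Duration(rand.Intn(50)) * time.Millisecond)
	inferenceLatency.WithLabelValues("example-model").
		Observe(time.Since(start).Seconds())
	w.Write([]byte("ok\n"))
}

func main() {
	http.HandleFunc("/infer", handleInfer)
	http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Histograms rather than plain averages are what let Grafana dashboards and alerting track tail latency (p95/p99), which is the number that matters for low-latency inference.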
Ways to stand out
- Experience with specialized inference serving frameworks.
- Open-source contributions to projects in AI/ML, distributed systems, or infrastructure.
- Hands-on experience with performance optimization techniques for AI models (e.g., quantization, model compression); see the quantization sketch after this list.
- Expertise in building platforms that support a wide variety of AI model architectures.
- Strong understanding of the full lifecycle of an AI model, from training to deployment and serving.
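For context on the quantization techniques mentioned above, a minimal sketch of symmetric int8 post-training quantization follows. This per-tensor scheme is only one of several approaches (per-channel, asymmetric, and calibration-based variants are common), and the code is purely illustrative.

```go
// Minimal sketch of symmetric per-tensor int8 quantization: weights are
// mapped to int8 with a single scale, cutting memory use roughly 4x
// relative to float32 at the cost of some precision.
package main

import (
	"fmt"
	"math"
)

// quantize derives a symmetric scale from the maximum absolute weight
// and maps each value into [-127, 127].
func quantize(w []float32) (q []int8, scale float32) {
	var maxAbs float32
	for _, v := range w {
		if a := float32(math.Abs(float64(v))); a > maxAbs {
			maxAbs = a
		}
	}
	if maxAbs == 0 {
		maxAbs = 1 // avoid divide-by-zero on an all-zero tensor
	}
	scale = maxAbs / 127
	q = make([]int8, len(w))
	for i, v := range w {
		q[i] = int8(math.Round(float64(v / scale)))
	}
	return q, scale
}

// dequantize recovers approximate float32 values for inspection.
func dequantize(q []int8, scale float32) []float32 {
	out := make([]float32, len(q))
	for i, v := range q {
		out[i] = float32(v) * scale
	}
	return out
}

func main() {
	w := []float32{0.12, -0.98, 0.5, 0.031}
	q, s := quantize(w)
	fmt.Println(q, s, dequantize(q, s))
}
```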
Compensation & Benefits
- Base salary range: 248,000-391,000 USD, based on location, experience, and the pay of employees in similar positions.
- Eligible for equity and benefits (see NVIDIA benefits).
Other details
- Location: Santa Clara, CA, United States (on-site).
- Employment type: Full time.
- Applications accepted at least until August 24, 2025.
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. The company does not discriminate in hiring or promotion practices on the basis of legally protected characteristics.