Senior Software Engineer - Inference as a Service

at Nvidia
USD 200,000-391,000 per year
SENIOR
✅ On-site

Used Tools & Technologies

Not specified

Required Skills & Competences

Go @ 7, Grafana @ 3, Kubernetes @ 4, Prometheus @ 3, Python @ 7, CI/CD @ 4, Datadog @ 3, Distributed Systems @ 8, Mathematics @ 4, Performance Optimization @ 4, Debugging @ 4, API @ 4, System Architecture @ 7, OpenTelemetry @ 3, GPU @ 4

Details

NVIDIA's Software Infrastructure Team is building and maintaining the core infrastructure that powers closed and open source AI models. This role focuses on the Inference as a Service platform: developing systems to manage GPU resources, ensure service stability, and deliver high-performance, low-latency inference at massive scale. The team is part of the NVIDIA AI Factory initiative and is based in Santa Clara, CA.

Responsibilities

  • Design and develop a scalable, robust, and reliable platform for serving AI models (Inference as a Service).
  • Develop and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads.
  • Build and maintain core infrastructure components, including load balancing and rate limiting, to ensure stability and high availability (see the rate-limiting sketch after this list).
  • Implement APIs for model deployment, monitoring, and management to provide a seamless user experience.
  • Integrate deployment, monitoring, and performance telemetry into CI/CD pipelines in collaboration with engineering teams.
  • Build tools and frameworks for real-time observability, performance profiling, and debugging of inference services.
  • Collaborate with architects to define and implement best practices for long-term platform evolution.
  • Contribute to NVIDIA's AI Factory initiative by building foundational platforms that support model serving needs.
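
To make the rate-limiting responsibility above concrete, here is a minimal sketch of an HTTP middleware that sheds excess load on an inference endpoint with a token bucket, using Go's golang.org/x/time/rate package. The endpoint path, limits, and handler are illustrative assumptions, not details from the posting; at the scale described, limits would more likely be distributed and enforced per tenant.

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"golang.org/x/time/rate"
)

// Illustrative limits: 100 requests/second steady state, bursts up to 200.
// A real platform would size these per model and per tenant.
var limiter = rate.NewLimiter(rate.Limit(100), 200)

// rateLimited rejects requests once the token bucket is empty,
// shedding load before it can destabilize the GPU backends.
func rateLimited(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !limiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	// Hypothetical inference endpoint; the real API surface is not specified.
	mux.HandleFunc("/v1/infer", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `{"status":"ok"}`)
	})
	log.Fatal(http.ListenAndServe(":8080", rateLimited(mux)))
}
```

A single in-process token bucket like this only protects one replica; shared limits across replicas would need a coordination layer such as a central counter store.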

Requirements

  • BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, another engineering discipline, or a related field (or equivalent experience).
  • 12+ years of software engineering experience with expertise in distributed systems or large-scale backend infrastructure.
  • Strong programming skills in Python, Go, or C++ with a track record of building production-grade, highly available systems.
  • Proven experience with container orchestration technologies such as Kubernetes.
  • Strong understanding of system architecture for high-performance, low-latency API services.
  • Experience designing, implementing, and optimizing systems for GPU resource management.
  • Familiarity with modern observability tools such as Datadog, Prometheus, Grafana, and OpenTelemetry (see the instrumentation sketch after this list).
  • Demonstrated experience with deployment strategies and CI/CD pipelines.
  • Excellent problem-solving skills and ability to work in a fast-paced, collaborative environment.
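
As a small illustration of the observability requirement above, the sketch below exposes a Prometheus latency histogram from a Go service using the official client_golang library. The metric name, buckets, and endpoint are assumptions made for the example, not anything specified in the posting.

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Hypothetical metric: end-to-end inference latency in seconds.
var inferenceLatency = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "inference_request_duration_seconds",
	Help:    "End-to-end latency of inference requests.",
	Buckets: prometheus.DefBuckets,
})

func handleInfer(w http.ResponseWriter, r *http.Request) {
	start := time.Now()
	defer func() { inferenceLatency.Observe(time.Since(start).Seconds()) }()
	w.Write([]byte("ok\n")) // placeholder for actual model serving
}

func main() {
	http.HandleFunc("/v1/infer", handleInfer)
	// Prometheus scrapes /metrics; Grafana dashboards and alerts sit on top.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Histograms like this feed the latency percentiles that an SLO for a low-latency inference service would be defined against.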

Preferred / Ways to Stand Out

  • Experience with specialized inference serving frameworks.
  • Open-source contributions to projects in AI/ML, distributed systems, or infrastructure.
  • Hands-on experience with performance optimization techniques for AI models, such as quantization or model compression.
  • Expertise in building platforms that support a wide variety of AI model architectures.
  • Strong understanding of the full lifecycle of an AI model, from training to deployment and serving.

Compensation & Benefits

  • Base salary ranges (location- and level-dependent):
    • Level 5: USD 200,000 - 322,000
    • Level 6: USD 248,000 - 391,000
  • You will also be eligible for equity and benefits.

Additional Information

  • Location: Santa Clara, CA, United States.
  • Employment type: Full time.
  • Applications for this job will be accepted at least until August 26, 2025.
  • NVIDIA is an equal opportunity employer and committed to fostering a diverse work environment.