Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such level.
Docker @ 4
Go @ 6
Kubernetes @ 4
Linux @ 3
Python @ 6
GCP @ 4
Distributed Systems @ 4
AWS @ 4
Azure @ 4
Communication @ 4
JavaScript @ 3
PostgreSQL @ 4
Next.js @ 3
React @ 3
Angular @ 3
Debugging @ 7
API @ 4
GPU @ 4
AI @ 4
Data Pipelines @ 4
Details
Join NVIDIA's DGX Cloud team to build foundational systems for high-performance GPU infrastructure. You'll play a technical lead role designing scalable cloud services that integrate GPU telemetry from datacenters and enable operational automation across global cloud operations.
Responsibilities
- Act as technical lead for a team of software engineers designing cloud services backed by databases and data warehouses.
- Design and develop RESTful APIs to ingest telemetry from AI datacenters.
- Build scalable cloud services for high-volume ingestion, processing, and storage of large datasets.
- Build and manage data pipelines for online and offline data storage.
- Collaborate across teams to codify business processes into scalable, self-measuring systems.
- Optimize the reliability and efficiency of cloud services and operations.
- Lead and ship impactful technical projects, ensuring quality and scalability at every stage.
Requirements
- 12+ years of industry experience with a Bachelor’s or Master’s degree (or equivalent experience); PhD preferred.
- Expertise in building scalable REST APIs backed by PostgreSQL-compatible data stores.
- Proficiency in programming languages such as Go or Python.
- Familiarity with modern JavaScript frameworks (for example, React, Angular, Next.js).
- Expertise in cloud infrastructure (AWS, GCP, Azure) and container technologies like Docker and Kubernetes.
- Expertise with high-scale distributed systems, including architectural patterns for APIs and data pipelines.
- Outstanding communication and collaboration skills focused on solving complex operational challenges.
- Familiarity with Linux operating systems.
Ways to Stand Out
- Track record of leading engineers to successfully deliver and operate high-performance cloud services at Internet scale.
- Experience operating NVIDIA datacenter GPUs.
- Strong debugging and problem-solving skills in distributed environments.
Compensation & Additional Information
- Base salary ranges by level: Level 5 — 224,000 USD to 356,500 USD; Level 6 — 272,000 USD to 431,250 USD.
- Eligible for equity and benefits.
- Applications accepted at least until April 24, 2026.
- NVIDIA uses AI tools in its recruiting processes and is an equal opportunity employer committed to diversity.