Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by the "@" symbol and a proficiency level.
About proficiency levels:
- 1-2 — basic awareness. Minimal hands-on experience, and a rudimentary understanding of the technology's purpose;
- 3-6 — daily use. Comfortable and regular usage, capable of handling common tasks and challenges related to the technology;
- 7-9 — you are an expert, you can teach others, you know all the pitfalls and tricks;
- 10 — exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Go @ 6
Python @ 6
Distributed Systems @ 7
Rust @ 6
Details
We are now looking for a Senior Software Engineer for Generative AI Research at NVIDIA. The Cosmos infrastructure team builds systems to train Cosmos, NVIDIA’s world foundation model for physical AI. This role focuses on infrastructure for large-scale model training, data pipelines, simulation-driven synthetic data, and real-time reasoning for robots and autonomous systems.
Responsibilities
- Design, build, and operate scalable infrastructure for training Cosmos and supporting large-scale data pipelines
- Develop high-throughput systems for data processing, retrieval, and workflow orchestration
- Collaborate across research, optimization, and platform teams to accelerate experiments and deployments
- Improve system reliability, performance, and observability across distributed compute environments
- Contribute to long-term infrastructure strategy for training, data management, and large-scale compute efficiency
Requirements
- Master’s degree in Computer Science, Computer Engineering, or a related STEM field, or equivalent experience
- 6 years of relevant work experience
- Strong engineering background in distributed systems, ML infrastructure, or large-scale compute/data platforms
- Proficiency in Python and at least one systems language, such as C++, Go, or Rust
- Experience with orchestration systems, scheduling, and scalable storage or data pipelines
- Ability to work across teams, drive technical clarity, and deliver robust solutions in complex environments
- Comfortable bridging research workflows and production-grade systems
Ways to Stand Out
- Experience building or optimizing infrastructure for large-scale model training
- Hands-on work with distributed compute environments or high-performance systems
- Familiarity with synthetic data, simulation pipelines, or large multimodal datasets
- Contributions to open-source infrastructure or large-scale internal tooling
Compensation & Benefits
- Base salary range by level:
  - Level 4: 184,000 USD - 287,500 USD
  - Level 5: 224,000 USD - 356,500 USD
- Eligible for equity and company benefits
Other Details
- Location: Santa Clara, CA, United States
- Employment type: Full time
- Applications accepted at least until December 27, 2025
Equal Opportunity
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. NVIDIA does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.