Used Tools & Technologies
Not specified
Required Skills & Competences
Software Development @ 8, Ceph @ 3, Chef @ 3, Docker @ 8, Elasticsearch @ 4, Kafka @ 3, Kubernetes @ 3, Linux @ 4, MySQL @ 4, Python @ 7, SQL @ 4, Java @ 7, NoSQL @ 4, Algorithms @ 4, Distributed Systems @ 4, Machine Learning @ 4, Git @ 3, MongoDB @ 4, OpenStack @ 3, Android @ 4, API @ 4, Hadoop @ 3, Puppet @ 3, Cassandra @ 4
Details
NVIDIA is seeking an AI Solutions Architect to join its Infrastructure Planning and Process (IPP) team to support extensive scale-up of AI solutions for NVIDIA's internal cloud infrastructure. IPP is a global organization that partners with teams across Graphics, Mobile, Deep Learning, AI, and Driverless Cars to meet infrastructure needs. The cloud services support nearly half a million automated jobs daily across thousands of servers and host a diverse mix of machines and hardware platforms (Windows/Linux/Android, NVIDIA GPUs, Tegra processors).
Responsibilities
- Serve as an architect developing internal AI systems used by thousands of NVIDIANs globally.
- Manage and improve the tools NVIDIANs use to deliver solutions quickly; identify gaps in tooling and determine buy vs. build options.
- Understand data movement across the platform, identify bottlenecks, define solutions, develop key components, write APIs, and own deployments.
- Collaborate with internal and external development teams to discover opportunities and solve complex problems.
- Guide and mentor engineers and sub-system leads; develop acceptance tests and review engineer work and test results.
- Identify performance bottlenecks and optimize speed and cost efficiency of AI development and testing systems.
- Drive planning of software/hardware capacity for internal and public cloud environments, balancing time and utilization.
- Introduce technologies enabling massively parallel systems to improve turnaround times by orders of magnitude.
- Collaborate with AI product vendors to gain industry insights and share them with internal leaders and developers.
Requirements
- BS in EE/CS or equivalent experience with 10+ years of systems software development and at least 1 year working with AI.
- Development experience with Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), LLM fine-tuning, agentic AI workflows, LangChain, LangGraph, and cascading models.
- Experience deploying in hybrid and multi-cloud architectures and working with edge computing.
- Extensive experience architecting and shipping large-scale distributed software systems and designing for high-performance, scalable systems.
- Strong ability to identify gaps and bottlenecks and develop solutions to optimize performance at large scale (petabytes of storage, millions of cores, massive CI/test workloads).
- Strong programming and software development skills in Java, Python, and shell scripting, plus a solid understanding of distributed systems and REST APIs.
- Experience with SQL/NoSQL database systems such as MySQL, Cassandra, MongoDB, or Elasticsearch.
- Excellent knowledge of, and hands-on experience with, Docker containers and virtual machines.
- Familiarity with cloud & platform technologies such as OpenStack, Docker, Kubernetes, Chef/Puppet, Hadoop/Ceph/SwiftStack, LXC, Git, Perforce, JFrog, and Kafka.
- Ability to work across organizational boundaries in a multi-national, multi-time-zone corporate environment.
Ways to stand out
- MS or PhD in EE/CS.
- Deeper expertise in AI, machine learning, and deep learning algorithms and techniques.
- Strong collaborative and interpersonal skills with a record of guiding and influencing others in dynamic environments.
- Experience developing large-scale service-oriented architectures under real-time performance requirements.
- Background in designing high-performance, cost-optimized software systems.
Benefits & Additional information
- Competitive base salary (range depends on level and location).
- Eligible for equity and benefits.
- Application window: at least until July 29, 2025.
Salary details
Base salary ranges provided by level:
- Level 4: 184,000 USD - 287,500 USD
- Level 5: 224,000 USD - 356,500 USD
NVIDIA determines base salary based on location, experience, and pay of employees in similar positions. You will also be eligible for equity and benefits.