Required Skills & Competences
Ansible @ 7, Go @ 4, Grafana @ 7, Kubernetes @ 4, Prometheus @ 7, DevOps @ 7, Terraform @ 7, Python @ 4, Machine Learning @ 4, Leadership @ 4, Networking @ 6, NLP @ 4, Splunk @ 7, LLM @ 4, Salt @ 7, GPU @ 4
Details
NVIDIA redefines what’s possible. NVIDIA has been reinventing computer graphics, PC gaming, and accelerated computing for 30 years. It is a unique legacy of innovation fueled by great technology and amazing people. Today, NVIDIA taps into the unlimited potential of AI to define the next era of computing — where GPUs act as the brains of computers, generative AI, robots, and self-driving cars that understand the world.
Our company is at the forefront of technological innovation, dedicated to driving efficiency and optimizing infrastructure performance both on-prem and in the cloud. Join us in this exciting endeavor!
Responsibilities
- Architect and implement infrastructure platforms tailored for AI/ML workloads, focusing on scaling private cloud environments for high-throughput training, inference, and agentic workflows and pipelines.
- Lead initiatives in Generative AI system design including Retrieval-Augmented Generation (RAG), LLM fine-tuning, semantic search, and multi-modal data processing.
- Build and optimize ML systems for document understanding, vector-based retrieval, and knowledge graph integration using advanced NLP and information retrieval techniques.
- Design and develop scalable services and tools supporting GPU-accelerated AI pipelines leveraging Kubernetes, Python/Go, and observability frameworks.
- Mentor and collaborate with a multidisciplinary team of network engineers, automation engineers, AI/ML scientists, product managers, and domain experts.
- Build and drive adoption of emerging AIOps technologies by integrating AI agents, RAG, and LLMs using MCP workflows for automation, performance tuning, and large-scale data insights.
Requirements
- 10+ years of engineering experience, with at least 5 years leading ML infrastructure, AI systems, or applied NLP/LLM development initiatives.
- 5+ years of experience in networking and infrastructure.
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Engineering, Machine Learning, or related field (or equivalent experience).
- Deep expertise with generative AI concepts such as embeddings, RAG, semantic search, and transformer-based LLMs.
- Experience with MCP workflows and the agentic ecosystem.
- Knowledge of vector databases (FAISS, Pinecone, Weaviate) and data pipelines.
- Programming proficiency in Python (preferred) and/or Go, with software engineering best practices.
- Experience deploying and tuning LLMs using LoRA, QLoRA, and instruction tuning.
- Strong understanding of infrastructure automation pipelines (Terraform, Ansible, Salt), monitoring (Prometheus, Grafana), and DevOps tools.
- Hands-on experience with petabyte-scale datasets, schema design, and distributed processing.
- Strong background in collecting and analyzing infrastructure data and network logs; ability to run simulations of network state with AI tools.
Ways to Stand Out
- Experience building multi-hop RAG systems with self-consistency and chain-of-thought prompting.
- Leadership in designing AI platforms for large-scale enterprise search, document intelligence, or recommendation systems.
- Contributions to open-source ML/AI tools or active participation in the AI research community.
- Familiarity with knowledge graph construction and reasoning systems; demonstrated ability to communicate complex ML concepts to stakeholders.
- Strong knowledge of automation pipeline and observability tools such as BigPanda, Splunk, Storm, NetBox/Nautobot, and open-source automation tools.
- Strong knowledge of network operating systems such as Arista EOS, Cumulus, Cisco NX-OS, SONiC, and SR Linux, and expertise in Infrastructure as Code or Network as Code automation frameworks.
NVIDIA is committed to a diverse work environment and is proud to be an equal opportunity employer. Creative and autonomous individuals are encouraged to apply.
Salary
Base salary range: $200,000 - $391,000 USD per year, depending on location, experience, and peer pay. Eligible for equity and benefits.
#LI-Hybrid