Used Tools & Technologies
Not specified
Required Skills & Competences
Ansible @ 7, Go @ 4, Grafana @ 7, Kubernetes @ 4, Prometheus @ 7, DevOps @ 7, Terraform @ 7, Python @ 4, Machine Learning @ 4, Leadership @ 4, Networking @ 6, NLP @ 4, Splunk @ 4, LLM @ 4, Salt @ 7, GPU @ 4
Details
NVIDIA is a leader in technological innovation, reinventing computer graphics, PC gaming, and accelerated computing for over 30 years. Currently, NVIDIA is focused on harnessing AI's unlimited potential to define the next era of computing, powering products such as generative AI, robots, and self-driving cars.
Responsibilities
- Architect and implement infrastructure platforms tailored for AI/ML workloads, focusing on scaling private cloud environments to support high-throughput training, inference, and agentic workflows and pipelines.
- Lead initiatives in Generative AI systems design, including Retrieval-Augmented Generation (RAG), LLM fine-tuning, semantic search, and multi-modal data processing.
- Build and optimize ML systems for document understanding, vector-based retrieval, and knowledge graph integration using advanced NLP and information retrieval techniques.
- Design and develop scalable services and tools to support GPU-accelerated AI pipelines, utilizing Kubernetes, Python/Go, and observability frameworks.
- Mentor and collaborate with a multidisciplinary team including network engineers, automation engineers, AI and ML scientists, product managers, and other domain experts.
- Build and drive adoption of emerging AIOps technologies, integrating AI agents, RAG, and LLMs via MCP workflows to streamline automation, performance tuning, and large-scale data insights.
Requirements
- Over 10 years of engineering experience with at least 5 years leading ML infrastructure, AI systems, or applied NLP/LLM development initiatives.
- More than 5 years in networking and infrastructure engineering.
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Engineering, Machine Learning, or related fields (or equivalent experience).
- Deep expertise in generative AI concepts such as embeddings, RAG, semantic search, transformer-based LLMs, MCP workflows, and the agentic ecosystem.
- Experience with vector databases such as FAISS, Pinecone, and Weaviate, and with building data pipelines.
- Programming proficiency in Python (preferred) and/or Go with software engineering best practices.
- Experience deploying and tuning LLMs using techniques like LoRA, QLoRA, and instruction tuning.
- Strong knowledge of infrastructure automation tools including Terraform, Ansible, Salt; monitoring tools like Prometheus, Grafana; and DevOps practices.
- Hands-on experience handling petabyte-scale datasets, schema design, and distributed processing.
- Proficiency working with infrastructure-related data collections and network logs, including running AI-based network-state simulations.
Desired Qualifications
- Experience building multi-hop RAG systems with self-consistency and chain-of-thought prompting.
- Leadership in designing AI platforms for enterprise search, document intelligence, or recommendation systems.
- Contributions to open-source ML/AI tools or active participation in the AI research community.
- Familiarity with knowledge graph construction, reasoning systems, and conveying complex ML concepts to executives and cross-functional teams.
- Knowledge of automation pipelines and observability tools such as BigPanda, Splunk, Storm, and Netbox/Nautobot, as well as other open-source automation tooling.
- Expertise with network operating systems such as Arista EOS, Cumulus, Cisco NX-OS, SONiC, and SR Linux, and excellence in infrastructure/network-as-code automation frameworks.
NVIDIA values creativity, autonomy, and a diverse, supportive work environment, encouraging applications from talented individuals eager to innovate in AI and networking.
Salary and Benefits
The base salary range for this role is $248,000 - $391,000 USD annually, dependent on location, experience, and pay of comparable employees. Equity and benefits are also offered.
#LI-Hybrid