Required Skills & Competences
Ansible @ 7, Go @ 4, Grafana @ 7, Kubernetes @ 4, Prometheus @ 7, DevOps @ 7, Terraform @ 7, Python @ 4, Machine Learning @ 4, Leadership @ 4, Networking @ 4, NLP @ 4, Splunk @ 7, LLM @ 4, Salt @ 7, GPU @ 4
Details
NVIDIA is seeking a Senior AI/ML Engineer to build the next generation of IT networking capabilities by applying AI to networking problems and leading a technology transformation to run AI on-prem. The role focuses on architecting and implementing infrastructure platforms for AI/ML workloads, integrating enterprise-ready platforms and automation, and driving adoption of AIOps technologies.
Responsibilities
- Architect and implement infrastructure platforms tailored for AI/ML workloads, focusing on scaling private cloud environments to support high-throughput training, inference, and agentic workflows and pipelines.
- Lead initiatives in generative AI systems design, including Retrieval-Augmented Generation (RAG), LLM fine-tuning, semantic search, and multi-modal data processing.
- Build and optimize ML systems for document understanding, vector-based retrieval, and knowledge graph integration using advanced NLP and information retrieval techniques.
- Design and develop scalable services and tools to support GPU-accelerated AI pipelines, leveraging Kubernetes, Python/Go, and observability frameworks.
- Mentor and collaborate with a multidisciplinary team of network engineers, automation engineers, AI/ML scientists, product managers, and domain experts.
- Build and drive adoption of emerging AIOps technologies, integrating AI agents, RAG, and LLMs through MCP workflows to streamline automation, performance tuning, and large-scale data insights.
Requirements
- 10+ years of engineering experience with at least 5 years leading initiatives in ML infrastructure, AI systems, or applied NLP/LLM development.
- 5+ years of experience in networking and infrastructure.
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Engineering, Machine Learning, or a related field (or equivalent experience).
- Deep expertise with:
  - Generative AI concepts: embeddings, RAG, semantic search, transformer-based LLMs.
  - MCP workflows and the agentic ecosystem.
  - Vector databases and data pipelines (e.g., FAISS, Pinecone, Weaviate).
  - Programming in Python (preferred) and/or Go, and software engineering best practices.
- Experience deploying and tuning LLMs using techniques such as LoRA, QLoRA, and instruction tuning.
- Strong understanding of infrastructure automation pipelines (Terraform, Ansible, Salt), monitoring (Prometheus, Grafana), and DevOps tools.
- Hands-on experience working with petabyte-scale datasets, schema design, and distributed processing.
- Strong background in infrastructure data collection and network logs, and the ability to run simulations of network state with AI tools.
Ways to Stand Out From the Crowd
- Experience building multi-hop RAG systems with self-consistency and chain-of-thought prompting.
- Prior leadership in designing AI platforms for large-scale enterprise search, document intelligence, or recommendation systems.
- Contributions to open-source ML/AI tools or active participation in the AI research community.
- Familiarity with knowledge graph construction and reasoning systems and demonstrated ability to communicate complex ML concepts to executive and cross-functional stakeholders.
- Strong knowledge of automation pipeline and observability tools such as BigPanda, Splunk, Storm, NetBox/Nautobot and other open-source automation tooling.
- Strong knowledge of network operating systems such as Arista EOS, Cumulus, Cisco NX-OS, SONiC, and SR Linux, and excellence in Infrastructure/Network as Code automation frameworks.
Compensation & Benefits
- Base salary ranges by level:
  - Level 5: 200,000 USD - 322,000 USD
  - Level 6: 248,000 USD - 391,000 USD
- You will also be eligible for equity and benefits (see NVIDIA benefits).
Additional Information
- Location: Santa Clara, CA, United States
- Employment type: Full time
- Office policy: Hybrid
- Applications accepted at least until July 29, 2025
- NVIDIA is an equal opportunity employer committed to diversity and inclusion.