Used Tools & Technologies
Not specified
Required Skills & Competences
Each tag name is followed by an "@" symbol and a proficiency level.
About proficiency levels:
- 1-2: basic awareness. Minimal hands-on experience and a rudimentary understanding of the technology's purpose;
- 3-6: daily use. Comfortable, regular usage; capable of handling common tasks and challenges related to the technology;
- 7-9: expert. You can teach others and you know the pitfalls and tricks;
- 10: exceptional knowledge, comprehensive understanding, and adeptness in all aspects of the technology, including advanced problem-solving. Think twice before claiming or demanding such a level.
Distributed Systems @ 7
Hiring @ 4
Leadership @ 8
Communication @ 4
Networking @ 4
Debugging @ 6
Technical Leadership @ 8
CUDA @ 4
AI @ 4
InfiniBand @ 4
NCCL @ 4
Details
NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. The company is focused on AI, computer graphics, and accelerated computing, and is building networking solutions for AI deployments. As a principal-level engineer, you will lead the transformation of AI networking systems, manage complex customer engagements, and influence product and architecture direction for embedded networking products and their ecosystem.
Responsibilities
- Lead the technical strategy for AI Factory networking deployments at strategic customers, including architecture reviews, risk assessments, and multi-phase execution plans.
- Serve as the principal-level technical authority for embedded networking products such as BlueField and ConnectX and the surrounding technology ecosystem (DOCA, RDMA, RoCE, InfiniBand).
- Lead deep technical engagements with hyperscalers and AI Factory customers: design-in, coding, bring-up, performance tuning, failure analysis, and production hardening.
- Partner with internal engineering, product, and architecture teams to convert customer needs into product features, reference architectures, tooling, and guidelines.
- Drive performance, reliability, and debuggability improvements across customer stacks and translate findings into actionable product, firmware, and software roadmap items.
Requirements
- BS/MS/PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
- 15+ years of relevant industry experience, including technical leadership across complex systems.
- Deep knowledge of networking protocols and distributed systems; strong understanding of RoCE/InfiniBand, L1–L4 fundamentals, and performance/latency tradeoffs.
- Proven low-level software expertise with proficiency in C/C++ and comfort debugging across firmware, driver, and user space.
- Demonstrated experience in high-performance networking and system-level debugging (packet drops, retransmissions, congestion, QoS, ordering, buffer management).
- Excellent interpersonal and communication skills; ability to explain complex topics to engineers, PMs, and customers and align cross-organizational teams toward decisions.
Ways To Stand Out
- Prior customer-facing technical leadership experience at hyperscalers/CSPs/AI factories or similarly complex production environments.
- Hands-on expertise with DPDK, DOCA, RDMA verbs, NCCL, CUDA-aware networking, congestion control, and performance tuning at scale.
- Experience building internal tools, telemetry, and automation to improve triage speed and operational excellence.
- Demonstrated innovation (patents, publications, rapid prototyping, shipping new architecture/features end-to-end).
- Experience leading multi-team initiatives across geographies and proactively using AI-powered tools to accelerate debugging and engineering efficiency while maintaining strong engineering judgment.
Compensation & Benefits
- Base salary range: 272,000 USD - 431,250 USD (base determined by location, experience, and pay of employees in similar positions).
- Eligibility for equity and a benefits package (link to NVIDIA benefits referenced in the posting).
Additional Information
- Applications for this job will be accepted at least until March 9, 2026.
- This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.
- NVIDIA is an equal opportunity employer and values diversity in hiring and promotion practices.