Required Skills & Competences
Algorithms @ 4, Data Structures @ 4, Communication @ 4, Parallel Programming @ 4, Prioritization @ 4, CUDA @ 4, GPU @ 6
Details
We are seeking a Principal Developer Technology Engineer to research and develop techniques to accelerate large application workloads on advanced computer architectures. The role focuses on investigating and eliminating system bottlenecks to achieve optimal performance on state-of-the-art CPU and GPU hardware, partnering with the developer community and internal teams to influence next-generation programming models, software, and architectures.
Responsibilities
- Research and develop techniques to accelerate top cloud service provider (CSP) workloads on NVIDIA's computing platform, including advanced CPUs, GPUs, and interconnects.
- Work directly with key customers to perform in-depth analysis and optimization of complex workloads to ensure the best possible performance on current and next-generation hardware.
- Collaborate with libraries, tools, system software architecture, hardware, and research teams to influence the design of next-generation programming models, software, and architectures.
- Investigate performance of customer applications, design parallel algorithms, and implement optimizations in a GPU-accelerated computing environment.
- Publish findings in developer blogs or present at relevant conferences and workshops as a recognized expert and representative of NVIDIA.
Requirements
- Master’s degree in Computer Science, Computer Engineering, or a related computationally focused science (or equivalent experience).
- 10+ years of relevant work or research experience.
- Programming proficiency in C/C++ with a deep understanding of software design, programming techniques, and algorithms.
- Background in parallel programming, ideally CUDA C/C++.
- Hands-on experience performing low-level performance optimizations.
- In-depth expertise with CPU and GPU architecture fundamentals.
- Strong math skills, including linear algebra, for problem-solving and performance modeling.
- Good communication, organization, and prioritization skills.
Preferred / Ways to Stand Out
- Designed highly optimized parallel algorithms and data structures for applications with a high bytes-to-compute ratio (for example, processing directly on compressed data, or kernel fusion).
- Optimized end-to-end performance of applications spanning many layers of software, from OS to high-level frameworks.
- Influenced hardware feature design by applying application and domain knowledge.
Compensation & Benefits
- Base salary range: 272,000 USD - 425,500 USD (base salary will be determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits (link provided in original posting).
Additional Information
- Application window open at least until July 29, 2025.
- NVIDIA emphasizes diversity and is an equal opportunity employer.
- Work arrangement: hybrid (#LI-Hybrid).