Senior GPU Supercomputer Scheduler Engineer
at Nvidia
📍 Santa Clara, United States
$148,000-276,000 per year
Required Skills & Competences
Docker @ 4, Go @ 4, Linux @ 4, Python @ 4, TensorFlow @ 4, Bash @ 4, Communication @ 4, PyTorch @ 4
Details
NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to solve, that only we can take on, and that matter to the world. This is our life’s work, to amplify human imagination and intelligence. Join us today!
Responsibilities
- Design and develop enhancements to the HPC batch scheduler(s)
- Work extensively with the HPC scheduler vendor on bug fixes and feature releases
- Provide support to staff and end users to resolve batch scheduler issues
- Build and improve our ecosystem around GPU-accelerated computing
- Analyze and optimize the performance of deep learning workflows
- Develop large scale automation solutions
- Perform root cause analysis and suggest corrective actions for problems at both large and small scales
- Find and fix problems before they occur
Requirements
- Bachelor’s degree in Computer Science, Electrical Engineering, or a related field, or equivalent experience
- 5+ years of work experience
- Strong understanding of HPC batch schedulers, such as Slurm or LSF, and of HPC workflows that use MPI
- Significant experience programming in C/C++ and advanced scripting in languages such as Python, Go, and Bash
- Established experience with the Linux operating system, environment, and tools
- Accomplished in computer architecture and operating systems
- Experience analyzing and tuning performance for a variety of HPC workloads
- In-depth understanding of container technologies such as Docker, Singularity, and Podman
- Flexibility/adaptability for working in a dynamic environment with different frameworks and requirements
- Excellent communication, interpersonal and customer collaboration skills
Ways to stand out from the crowd:
- Knowledge of MPI and high-performance computing
- Background in RDMA technology
- Open-source software contributions
- Experience with deep learning frameworks like PyTorch and TensorFlow
- Passion for software development processes