Used Tools & Technologies
Not specified
Required Skills & Competences
Kubernetes @ 3, Prometheus @ 3, Python @ 7, Statistics @ 4, Algorithms @ 4, Distributed Systems @ 3, Machine Learning @ 4, TensorFlow @ 4, AWS @ 3, Communication @ 4, Mathematics @ 4, Networking @ 4, PyTorch @ 4, GPU @ 7
Details
As a Senior Machine Learning Engineer at NVIDIA, you will build the machine learning brain that keeps NVIDIA’s global DGX Cloud healthy, efficient and ready for the next waves of AI breakthroughs. DGX Cloud fuses NVIDIA GPUs, NVLink networking and the full AI software stack into elastic infrastructure powering large language models, drug discovery, autonomous driving and climate science. Your models will turn billions of telemetry signals into predictive insight, enabling customers to innovate while the platform runs smarter.
Responsibilities
- Research and develop innovative machine learning algorithms and models that propel AI products.
- Build production models for anomaly detection, predictive maintenance and usage optimization.
- Develop tools surfacing real-time telemetry, efficiency metrics and long-term trends.
- Develop forecasting and simulation models for global scale planning.
- Analyze complex datasets to determine the best approach for model training and optimization.
- Translate findings into clear engineering actions with infrastructure, operations and product teams.
- Participate in cross-functional projects to integrate machine learning capabilities into various NVIDIA products.
Requirements
- Master’s degree or PhD in Mathematics, Statistics, Machine Learning or related quantitative field (or equivalent experience).
- 8+ years of experience applying machine learning to operational systems.
- Proven track record of building and deploying machine learning models in production environments.
- Experience with time series analysis and optimization algorithms.
- Familiarity with distributed systems and cloud platforms such as AWS and Kubernetes.
- Strong software engineering skills and proficiency in Python.
- Effective verbal/written communication and technical presentation skills.
- Experience with machine learning frameworks such as TensorFlow, PyTorch, or similar.
- A track record of delivering high-impact projects in a fast-paced environment.
Preferred / Ways to stand out
- Experience solving capacity planning problems.
- Deep understanding of GPU performance metrics.
- Familiarity with Prometheus and PromQL.
Compensation & Benefits
- Base salary range: 184,000 USD - 287,500 USD (will be determined based on location, experience, and pay of employees in similar positions).
- Eligible for equity and benefits (see https://www.nvidia.com/en-us/benefits/).
- Applications accepted at least until September 27, 2025.
Additional
NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. The company does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.