Vacancy is archived. Applications are no longer accepted.

Principal Data Scientist, Accelerated Apache Spark

at Nvidia

📍 Santa Clara, United States

SENIOR

✅ On-site

Required Skills & Competences

Software Development @ 3, Python @ 4, Spark @ 4, Statistics @ 4, Algorithms @ 4, Machine Learning @ 4, Data Science @ 4, scikit-learn @ 4, TensorFlow @ 4, LLM @ 4, PyTorch @ 4, XGBoost @ 4, Agile @ 3, Pandas @ 4

Details

NVIDIA is looking for a Principal Data Scientist to join the GPU-accelerated Apache Spark team. Data scientists spend a considerable amount of time exploring data and iterating over machine learning (ML) experiments. Apache Spark is the most popular data processing engine for data science in data centers. It is used throughout interactive data science, from data preparation to running ML experiments and deploying ML applications. You will work with the open source community to accelerate Apache Spark with GPUs, and you will apply the latest ML/AI methods to help enterprises migrate Spark workloads onto GPUs at scale. Come join NVIDIA and apply data science to help grow the adoption of GPU-accelerated Spark.

Responsibilities

Develop ML models to predict the performance of GPU-accelerated Apache Spark on existing workloads.
Develop ML models to tune GPU-accelerated Apache Spark configurations for optimal performance on specific workloads.
Work on systems that continuously adapt and improve these ML models.
Work on ML/AI agents that help fix and optimize GPU-accelerated Apache Spark applications.
Work on new functionality for GPU-accelerated Apache Spark that facilitates large-scale ML model training and inference.
Create examples showing how best to use GPU-accelerated Apache Spark and Spark MLlib for large-scale ML and deep learning (DL) training and inference.
Work with NVIDIA partners and customers to deploy GPU-accelerated Spark ML algorithms in the cloud or on premises.
Keep up with published advances in ML systems and algorithms.
Provide technical mentorship in data science and ML to a team of engineers.

Requirements

BS, MS, or PhD in Data Science, Statistics, Computer Science, Computer Engineering, or closely related field (or equivalent experience).
12+ years of work or research experience in ML model development, including 5+ years as a technical lead.
2+ years of hands-on experience with Apache Spark.
Proven technical skills in crafting, implementing, and productionizing high-quality ML solutions.
Proven ability to use modern techniques and tools for all aspects of ML model development, deployment, and maintenance.
Excellent programming skills in Python and Python data science libraries such as NumPy, pandas, scikit-learn, SciPy, PyTorch, and TensorFlow.
Experience developing solutions based on boosted tree models, using libraries such as XGBoost.
Background in developing LLM/GenAI-based solutions.
Experience in feature engineering and feature importance assessment.
Familiarity with agile software development practices.

Benefits

The base salary range is 272,000 USD to 419,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits.