3.1b Scalable Machine Learning

Source repo: sdsc-summer-institute-2021 | Branch: main | Last synced: 2026-04-24 10:27:17.425 UTC

  • Mai Nguyen, Lead for Data Analytics, SDSC
  • Paul Rodriguez, Research Analyst, SDSC

Machine learning is an integral part of knowledge discovery in a wide variety of applications. From scientific domains to social media analytics, the data to be analyzed has become massive and complex. This session introduces approaches for performing machine learning at scale. It presents tools and procedures for running machine learning techniques on HPC systems and also covers Spark. In particular, we will use Spark’s machine learning library, MLlib, to demonstrate how distributed computing can provide scalable machine learning. Please note: Knowledge of fundamental machine learning algorithms and techniques is required. (See the description for Machine Learning Overview.)