Source repo: ciml-summer-institute-2026 | Branch: main | Last synced: 2026-04-24 10:27:17.425 UTC

DRAFT: CIML AGENDA

Source: https://na.eventscloud.com/website/92687/agenda-ciml/


ciml-summer-institute-2025

Repository for the CIML 2025 Summer Institute training materials.

Website: https://na.eventscloud.com/website/83697/home-ciml/

Interactive Videos

  • A full catalog of all our trainings at SDSC can be found here.

Contents:

Description:

This repository contains all the presentations and training material used for the CIML Summer Institute. To work with the material, remember to CLONE this repo.


## Agenda

Agenda is subject to change. All times are in Pacific time.

### Tuesday, June 16 - Preparation Day (virtual)

| Time | Topic Title | Speaker(s) |
|---|---|---|
| 9:00 am - 9:30 am | 1.1 Welcome & Orientation | Cindy Wong, Events Specialist; Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute |
| 9:30 am - 10:00 am | 1.2 Accounts, Login, Environment, Running Jobs and Logging into Expanse User Portal | Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute |

### Tuesday, June 25 - HPC/Parallel Concepts (in person)

| Time | Topic Title | Speaker(s) |
|---|---|---|
| 8:00 am - 8:30 am | Light Breakfast & Check-in | |
| 8:30 am - 9:30 am | 2.1 Welcome and Introductions | Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute |
| 9:30 am - 9:45 am | Break | |
| 9:45 am - 10:45 am | 2.2 Parallel Computing Concepts <br> We will cover supercomputer architectures, the differences between threads and processes, implementations of parallelism (e.g., OpenMP and MPI), strong and weak scaling, limitations on scalability (Amdahl's and Gustafson's Laws), and benchmarking. | Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute |
| 10:45 am - 11:45 am | 2.3 Getting Started with Batch Job Scheduling <br> Batch job schedulers are used to manage and fairly distribute the shared resources of high-performance computing (HPC) systems. Learning how to interact with them and compose your work into batch jobs is essential to becoming an effective HPC user. | Marty Kandes, Computational and Data Science Research Specialist |
| 11:45 am - 1:00 pm | Lunch Break | |
| 1:00 pm - 2:15 pm | 2.4 Data Management and File Systems <br> Managing data efficiently on a supercomputer is important from both the user's and the system's perspectives. We will cover a few basic data management techniques and I/O best practices in the context of the Expanse system at SDSC. | Marty Kandes, Computational and Data Science Research Specialist |
| 2:15 pm - 3:45 pm | 2.5 GPU Computing - Hardware architecture and software infrastructure <br> A brief overview of the massively parallel GPU architecture that enables large-scale deep learning applications, and of how to access and use GPUs on SDSC Expanse for ML applications. | Andreas Goetz, Research Scientist & Principal Investigator |
| 3:45 pm - 4:00 pm | Break | |
| 4:00 pm - 5:30 pm | 2.6 Software Containers for Scientific and High-Performance Computing <br> Singularity is an open-source container engine designed to bring operating system-level virtualization to scientific and high-performance computing. With Singularity you can package complex computational workflows (software applications, libraries, and data) in a simple, portable, and reproducible way, which can then be run almost anywhere. | Marty Kandes, Computational and Data Science Research Specialist |
| 5:30 pm - 5:45 pm | Q&A, Wrap-up | |
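
As a preview of the scaling limits named in session 2.2, here is a minimal sketch of the textbook forms of Amdahl's and Gustafson's Laws. It is not taken from the session materials; the function names and the 95% parallel fraction are assumptions chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's Law: speedup for a fixed problem size (strong scaling)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def gustafson_speedup(parallel_fraction: float, workers: int) -> float:
    """Gustafson's Law: scaled speedup when the problem grows with the worker count (weak scaling)."""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * workers


if __name__ == "__main__":
    # Hypothetical workload: 95% of the runtime is parallelizable.
    for n in (1, 8, 64, 512):
        print(f"workers={n:4d}  "
              f"amdahl={amdahl_speedup(0.95, n):7.2f}  "
              f"gustafson={gustafson_speedup(0.95, n):7.2f}")
```

Under Amdahl's Law the speedup can never exceed the reciprocal of the serial fraction, however many workers are added. Under Gustafson's Law it keeps growing roughly linearly because the problem size is assumed to scale with the resources.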

### Wednesday, June 26 - Deep Learning (in person)

| Time | Topic Title | Speaker(s) |
|---|---|---|
| 8:00 am - 8:30 am | Light Breakfast | |
| 8:30 am - 10:00 am | 3.1 Introduction to Neural Networks and Convolutional Neural Networks <br> An overview of the main concepts of neural networks and feature discovery, and the basic convolutional neural network for digit recognition. | Paul Rodriguez, Computational Data Scientist |
| 10:00 am - 10:15 am | Break | |
| 10:15 am - 11:30 am | 3.2 Practical Guidelines for Training Deep Learning on HPC <br> Guidelines on running deep networks on Expanse, such as using notebooks and batch jobs, with some discussion of multinode execution. | Paul Rodriguez, Computational Data Scientist |
| 11:30 am - 12:00 pm | 3.3 Experiment Tracking <br> We will cover tools for tracking and organizing ML and DL experiments. | Mai Nguyen, Lead for Data Analytics |
| 12:00 pm - 1:30 pm | Lunch & Group Photo | |
| 1:30 pm - 2:15 pm | 3.4 Deep Learning Layers and Architectures <br> Overview of deep learning concepts, including layers, architectures, applications, and libraries. | Mai Nguyen, Lead for Data Analytics |
| 2:15 pm - 3:45 pm | 3.5 Deep Learning Transfer Learning <br> Tutorial and hands-on exercises on the use of transfer learning and fine-tuning for efficient training of deep learning models. | Mai Nguyen, Lead for Data Analytics |
| 3:45 pm - 4:00 pm | Break | |
| 4:00 pm - 5:30 pm | 3.6 Deep Learning – Special Connections and Transformers <br> The architectures of many networks use paths and connections in flexible ways; we will review gate, skip, and residual connections and get some intuition about transformers. | Paul Rodriguez, Computational Data Scientist |

### Thursday, June 27 - Scalable ML & Large Language Models (in person)

| Time | Topic Title | Speaker(s) |
|---|---|---|
| 8:00 am - 8:30 am | Light Breakfast | |
| 8:30 am - 10:00 am | 4.1 CONDA Environments and Jupyter Notebook on Expanse <br> Set up reproducible and transferable software environments and scale up calculations to large datasets using parallel computing. | Marty Kandes, Computational and Data Science Research Specialist |
| 10:00 am - 10:15 am | Break | |
| 10:15 am - 11:00 am | 4.2 Spark <br> Introduction to performing machine learning at scale, with hands-on exercises using Spark. | Mai Nguyen, Lead for Data Analytics |
| 11:00 am - 11:45 am | 4.3 Tools for Scaling <br> Overview of tools to distribute and parallelize processes. | Paul Rodriguez, Computational Data Scientist |
| 11:45 am - 12:15 pm | SDSC Data Center Tour | |
| 12:15 pm - 1:45 pm | Lunch | |
| 1:45 pm - 3:00 pm | 4.4 LLM Overview <br> Introduction to Large Language Models and how they can be used to support research. | Mai Nguyen, Paul Rodriguez, Mary Thomas |
| 3:00 pm - 3:15 pm | Break | |
| 3:15 pm - 4:45 pm | 4.4 LLM Overview - Continued | Mai Nguyen and Paul Rodriguez |
| 4:45 pm - 5:15 pm | NAIRR Introduction | Mary Thomas |
| 5:15 pm - 5:30 pm | Closing Remarks | Mary Thomas, Computational Data Scientist & Director of the CIML Summer Institute |

CIML Instructors

| Name | Title | Email |
|---|---|---|
| Andreas Goetz, Ph.D. | Research Scientist, Principal Investigator | awgoetz at ucsd.edu |
| Mai Nguyen, Ph.D. | Lead for Data Analytics | mhnguyen at ucsd.edu |
| Marty Kandes, Ph.D. | Computational and Data Science Research Specialist | mkandes at ucsd.edu |
| Mary Thomas, Ph.D. | Director of CIML, Computational Data Scientist, HPC Trainer | mpthomas at ucsd.edu |
| Paul Rodriguez, Ph.D. | Computational Data Scientist | p4rodriguez at ucsd.edu |
| Robert Sinkovits, Ph.D. | Expanse co-PI & Project Manager | rssinkovits at ucsd.edu |

Contact Us: For inquiries, feel free to contact events@sdsc.edu.