4.1a GPU Computing And Programming

Source repo: sdsc-summer-institute-2021 | Branch: main | Last synced: 2026-04-24 10:27:17.425 UTC

4.1a. GPU Computing and Programming

Andreas Goetz, Research Scientist and Principal Investigator, SDSC (agoetz@sdsc.edu)

This directory contains the slides and exercises for the SDSC 2021 Summer Institute online workshop on GPU computing and programming.

This session introduces massively parallel computing with graphics processing units (GPUs). Over the last decade the use of GPUs has become increasingly popular across all scientific domains since GPUs can significantly accelerate time to solution for many computational tasks. Participants will be introduced to essential background of the GPU chip architecture and will learn how to program GPUs via the use of libraries, OpenACC compiler directives, and the CUDA programming language. The session will incorporate hands-on exercises for participants to acquire the foundational skills to use and develop GPU aware applications.

Accessing and using GPU nodes on SDSC Expanse

This information has been covered in various places but is repeated here for convenience.

Obtain interactive shared GPU node on SDSC Expanse

Your .bashrc file should contain an alias get-gpu that will give you access to a single GPU on SDSC Expanse shared GPU nodes for 2h.

# Execute the following command                                                getgpu

This will launch following command, which you could also type instead:

srun --reservation=SI2021RES --partition gpu-shared --qos=gpu-shared-si2021 \
     --nodes=1 --ntasks-per-node=1 --cpus-per-task=10 --mem=90G --gpus=1 \
     --time=2:00:00 --pty --wait 0 /bin/bash

After a short while you should get logged into a GPU node with Nvidia V100 GPUs. Each node has 40 CPU cores and 4 GPUs. You can use up to 10 CPU cores and a single GPU.

Load CUDA and PGI compiler modules

In order to use the CUDA tool chain and the PGI compilers, you have to load the corresponding modules. Here we will load the CUDA 10.2 Toolkit and the PGI compiler.

module purge
module reset
module load cuda10.2/toolkit
module load pgi

Check the Nvidia CUDA compiler version:

user@expanse:~>nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

Check the PGI compiler version:

user@expanse:~> pgcc --version 

pgcc (aka pgcc18) 20.4-0 LLVM 64-bit target on x86-64 Linux -tp skylake 
PGI Compilers and Tools
Copyright (c) 2020, NVIDIA CORPORATION.  All rights reserved.

Check installed GPUs with NVIDIA System Management Interface (nvidia-smi)

The Nvidia system management interface (nvidia-smi) can be used to gather information about the available GPUs. It will also show any currently running jobs on GPUs.

# Execute the following command to get information about GPUs in the system
user@expanse:~> nvidia-smi

Check visible devices (should be set to free GPU)

user@comet:~> echo $CUDA_VISIBLE_DEVICES

This environment variable should be set by the queuing system to the ID(s) of the free GPU. Do not change it.

NVIDIA CUDA Toolkit code samples

The CUDA Toolkit comes with a set of code samples. It is a good idea to take a look at these code samples as they are a very instructive resource. Much can be learned by running the samples and inspecting the source code. Some samples are also useful tools (e.g. deviceQuery).

Copy the CUDA code samples into the current directory:

cp /cm/shared/apps/cuda-latest/sdk/current/ CUDA-samples

Compile the samples:

cd CUDA-sample
make -j 10

Or compile only samples of interest, e.g. deviceQuery:

cd 1_Utilities/deviceQuery
make

Run deviceQuery to query information on available GPUs

cd 1_Utilities/deviceQuery/
./deviceQuery
...
... lots of information on available GPUs will be printed
...

Simple code samples accompanying slides

See directory cuda-samples for CUDA sample codes. Compile with

nvcc example.cu -o example.x

See directory openacc-samples for OpenACC sample codes. Compile with

pgcc example.c -o example.x -acc -Minfo=accel

Read the README.md files for additional information. In particular it is instructive to compile the laplace-2d example OpenMP version and the OpenACC version and compare timings on multiple CPU cores (e.g. 1 to 10 CPU cores) to timing on the GPU (V100 on Expanse nodes).