Source repo: sdsc-summer-institute-2023 | Branch:
main| Last synced: 2026-04-24 10:27:17.425 UTC
OpenACC examples
SDSC Summer Institute 2023
Material by Andreas Goetz (agoetz@sdsc.edu) partly based on work by Mark Harris - NVIDIA corporation.
Directory content as explained below.
Vector addition (saxpy)
See subdirectory ./saxpy.
saxpy.c
C program that performs single precision vector addition (y = a*x + y).
Can be compiled with standard C compiler or PGI pgcc compiler with
accelerator directives.
pgcc saxpy.c -o saxpy-cpu.x pgcc saxpy.c -acc -Minfo=accel -o saxpy-gpu.x
Jacobi solver of 2D Laplace equation
See subdirectory ./laplace-2d.
C and Fortran versions of a Jacobi solver for 2D Laplace equation including OpenMP and OpenACC versions. See below how to compile and benchmarks on SDSC Comet and a pre-Expanse testbed system.
Benchmarks on SDSC Comet node, using K80 GPU
module load pgi
pgf90 jacobi.f90 -fast -o jacobi-pgf90.x ./jacobi-pgf90.x # completes in 50 seconds
pgcc jacobi.c -fast -o jacobi-pgcc.x ./jacobi-pgcc.x # completes in 71 seconds
pgf90 jacobi-omp.f90 -fast -mp -Minfo=mp -o jacobi-pgf90-omp.x export OMP_NUM_THREADS=2 ./jacobi-pgf90-omp.x # completes in 33 seconds export OMP_NUM_THREADS=4 ./jacobi-pgf90-omp.x # completes in 25 seconds export OMP_NUM_THREADS=6 ./jacobi-pgf90-omp.x # completes in 24 seconds export OMP_NUM_THREADS=12 ./jacobi-pgf90-omp.x # completes in 23 seconds export OMP_NUM_THREADS=24 ./jacobi-pgf90-omp.x # completes in 22 seconds
pgcc jacobi-omp.c -fast -mp -Minfo=mp -o jacobi-pgfcc-omp.x export OMP_NUM_THREADS=2 ./jacobi-pgcc-omp.x # completes in 41 seconds export OMP_NUM_THREADS=4 ./jacobi-pgcc-omp.x # completes in 26 seconds export OMP_NUM_THREADS=6 ./jacobi-pgcc-omp.x # completes in 24 seconds export OMP_NUM_THREADS=12 ./jacobi-pgcc-omp.x # completes in 23 seconds export OMP_NUM_THREADS=24 ./jacobi-pgcc-omp.x # completes in 23 seconds
pgf90 jacobi-acc.f90 -acc -Minfo=accel -o jacobi-pgf90-acc.x ./jacobi-pgf90-acc.x # completes in 5 seconds
pgcc jacobi-acc.c -acc -Minfo=accel -o jacobi-pgcc-acc.x ./jacobi-pgcc-acc.x # completes in 6 seconds
Benchmarks on SDSC V100 test node
Hardware:
- 2 x 16-core Intel Xeon Gold 6130 CPU @ 2.10 GHz
- Nvidia V100-SXM2 GPU
Compile and benchmark:
pgicc --version pgcc 18.10-1 64-bit target on x86-64 Linux -tp skylake
pgf90 jacobi.f90 -fast -o jacobi-pgf90.x ./jacobi-pgf90.x # completes in 44.3 seconds
pgcc jacobi.c -fast -o jacobi-pgcc.x ./jacobi-pgcc.x # completes in 48.2 seconds
pgf90 jacobi-omp.f90 -fast -mp -Minfo=mp -o jacobi-pgf90-omp.x export OMP_NUM_THREADS=2 ./jacobi-pgf90-omp.x # completes in 24 seconds export OMP_NUM_THREADS=4 ./jacobi-pgf90-omp.x # completes in 13 seconds export OMP_NUM_THREADS=8 ./jacobi-pgf90-omp.x # completes in 8 seconds export OMP_NUM_THREADS=16 ./jacobi-pgf90-omp.x # completes in 6 seconds export OMP_NUM_THREADS=32 ./jacobi-pgf90-omp.x # completes in 6 seconds
pgcc jacobi-omp.c -fast -mp -Minfo=mp -o jacobi-pgfcc-omp.x export OMP_NUM_THREADS=2 ./jacobi-pgcc-omp.x # completes in 25 seconds export OMP_NUM_THREADS=4 ./jacobi-pgcc-omp.x # completes in 13 seconds export OMP_NUM_THREADS=6 ./jacobi-pgcc-omp.x # completes in 8 seconds export OMP_NUM_THREADS=12 ./jacobi-pgcc-omp.x # completes in 6 seconds export OMP_NUM_THREADS=24 ./jacobi-pgcc-omp.x # completes in 6 seconds
pgf90 jacobi-acc.f90 -acc -Minfo=accel -o jacobi-pgf90-acc.x ./jacobi-pgf90-acc.x # completes in 1.3 seconds
pgcc jacobi-acc.c -acc -Minfo=accel -o jacobi-pgcc-acc.x ./jacobi-pgcc-acc.x # completes in 1.3 seconds