Source Information
Created by: Bob Sinkovits
Updated: October 25, 2024, by Gloria Seo
Resources: https://github.com/sinkovit/PythonSeries
Goal
This Jupyter notebook demonstrates how to use the Dask library for parallel processing. It focuses on visualizing the task graphs (DAGs) that Dask builds to track dependencies between operations and to schedule computation on chunked data.
Dask Graphs
At the core of Dask is a task graph. Each operation on a Dask collection, such as a Dask array, lazily adds tasks to a Directed Acyclic Graph (DAG) that describes the work to be done on each chunk of data. Dask's scheduler then executes this graph to compute the final result.
Because the DAG records the dependencies between all the steps of a computation, the scheduler knows which tasks must wait for others and which can run in parallel.
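Here is a toy sketch that makes the idea concrete. It is not Dask's actual scheduler. It runs a hand-written graph, loosely modeled on Dask's dict-based graph format, in dependency order using the standard library's graphlib (Python 3.9+, newer than the Python 3.8 environment that appears later). The names `dsk`, `inc`, `dependencies`, and `run` are illustrative only. The tasks `inc-a` and `inc-b` do not depend on each other, so they are submitted to the thread pool together. `total` waits for both.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from operator import add


def inc(x):
    return x + 1


# Toy task graph loosely modeled on Dask's dict-based format (illustrative):
# each key maps to a literal value or to a (function, *args) tuple, and any
# string argument that names another key is a dependency.
dsk = {
    "a": 1,
    "b": 2,
    "inc-a": (inc, "a"),
    "inc-b": (inc, "b"),
    "total": (add, "inc-a", "inc-b"),
}


def dependencies(task, graph):
    """Return the set of keys that a task depends on."""
    if isinstance(task, tuple):
        return {arg for arg in task[1:] if isinstance(arg, str) and arg in graph}
    return set()


def run(graph):
    """Execute the graph, running every task whose inputs are ready in parallel."""
    sorter = TopologicalSorter(
        {key: dependencies(task, graph) for key, task in graph.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorter.get_ready()  # all tasks whose dependencies are finished
            futures = {}
            for key in ready:
                task = graph[key]
                if isinstance(task, tuple):
                    func, *args = task
                    # Replace key references with the results they point to.
                    args = [results[a] if isinstance(a, str) and a in graph else a
                            for a in args]
                    futures[key] = pool.submit(func, *args)
                else:
                    results[key] = task  # literal value: nothing to compute
            for key, future in futures.items():
                results[key] = future.result()
            sorter.done(*ready)
    return results


print(run(dsk)["total"])  # 5
```

Dask's real schedulers do far more, including memory management, data locality, and distributed execution. The core principle is the same, though: a task runs as soon as all of its inputs are available.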
Required Modules for the Jupyter Notebook
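Before running the notebook, make sure the following modules are installed or loaded.
Modules: dask, cupy (requires an NVIDIA GPU with CUDA), graphviz (the Python package also needs the Graphviz system executables to render graphs)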
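Before running any cells, you can quickly check what is available in the current environment. The sketch below uses only the standard library. `check_requirements` is a hypothetical helper written for this notebook, not part of Dask or Graphviz. It assumes that rendering needs the Graphviz `dot` executable on the PATH, in addition to the `graphviz` Python package.

```python
import importlib.util
import shutil


def check_requirements(modules=("dask", "cupy", "graphviz")):
    """Hypothetical helper: report which modules are importable and whether `dot` is on PATH."""
    # find_spec returns None for a top-level module that is not installed.
    status = {name: importlib.util.find_spec(name) is not None for name in modules}
    # The graphviz Python package only wraps Graphviz; rendering calls the `dot` executable.
    status["dot"] = shutil.which("dot") is not None
    return status


for name, ok in check_requirements().items():
    print(f"{name:10s} {'found' if ok else 'MISSING'}")
```

On a node without a GPU or CUDA, expect `cupy` to show up as missing. The installation log below shows the same situation.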
import dask.array as da  # chunked, lazily evaluated arrays
import cupy as cp        # NumPy-like arrays on an NVIDIA GPU (requires CUDA)
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-b212f55aa777> in <module>
      1 import dask.array as da
----> 2 import cupy as cp
      3 from dask import array as da

ModuleNotFoundError: No module named 'cupy'
Because I needed the graphviz package, I installed it with pip. (To render graphs, the Graphviz system executables, such as dot, must also be on the PATH.)
%pip install graphviz
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: graphviz in /home/amehrotra1/.local/lib/python3.8/site-packages (0.20.3)
Note: you may need to restart the kernel to use updated packages.
import graphviz
%pip install cupy
Defaulting to user installation because normal site-packages is not writeable
Collecting cupy
Downloading cupy-12.3.0.tar.gz (1.8 MB)
|████████████████████████████████| 1.8 MB 16.2 MB/s eta 0:00:01
Collecting numpy<1.29,>=1.20
Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
|████████████████████████████████| 17.3 MB 131.3 MB/s eta 0:00:01
Collecting fastrlock>=0.5
Using cached fastrlock-0.8.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_24_x86_64.whl (51 kB)
Building wheels for collected packages: cupy
Building wheel for cupy (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /cm/shared/apps/spack/cpu/opt/spack/linux-centos8-zen/gcc-8.3.1/anaconda3-2020.11-da3i7hmt6bdqbmuzq6pyt7kbm47wyrjp/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py'"'"'; __file__='"'"'/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /scratch/amehrotra1/job_38027192/pip-wheel-gldkog6o
cwd: /scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/
Complete output (62 lines):
Clearing directory: /scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/cupy/.data
-------- Configuring Module: cuda --------
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/scratch/amehrotra1/job_38027192/tmpccbcj891/a.cpp:1:10: fatal error: cublas_v2.h: No such file or directory
#include <cublas_v2.h>
^~~~~~~~~~~~~
compilation terminated.
command 'gcc' failed with exit status 1
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/scratch/amehrotra1/job_38027192/tmp18yn909f/a.cpp:2:18: fatal error: cuda_runtime_api.h: No such file or directory
#include <cuda_runtime_api.h>
^~~~~~~~~~~~~~~~~~~~
compilation terminated.
**************************************************
*** WARNING: Cannot check compute capability
command 'gcc' failed with exit status 1
**************************************************
************************************************************
* CuPy Configuration Summary *
************************************************************
Build Environment:
Include directories: ['/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/cupy/_core/include/cupy/cub', '/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/cupy/_core/include']
Library directories: []
nvcc command : (not found)
hipcc command : (not found)
Environment Variables:
CFLAGS : (none)
LDFLAGS : (none)
LIBRARY_PATH : /cm/shared/apps/spack/cpu/opt/spack/linux-centos8-zen/gcc-8.3.1/anaconda3-2020.11-da3i7hmt6bdqbmuzq6pyt7kbm47wyrjp/lib:/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64
CUDA_PATH : (none)
NVCC : (none)
HIPCC : (none)
ROCM_HOME : (none)
Modules:
cuda : No
-> Include files not found: ['cublas_v2.h', 'cuda.h', 'cuda_profiler_api.h', 'cuda_runtime.h', 'cufft.h', 'curand.h', 'cusparse.h', 'nvrtc.h']
-> Check your CFLAGS environment variable.
ERROR: CUDA could not be found on your system.
HINT: You are trying to build CuPy from source, which is NOT recommended for general use.
Please consider using binary packages instead.
Please refer to the Installation Guide for details:
https://docs.cupy.dev/en/stable/install.html
************************************************************
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py", line 88, in <module>
ext_modules = cupy_setup_build.get_ext_modules(True, ctx)
File "/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/install/cupy_builder/cupy_setup_build.py", line 449, in get_ext_modules
extensions = make_extensions(ctx, compiler, use_cython)
File "/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/install/cupy_builder/cupy_setup_build.py", line 305, in make_extensions
raise Exception('Your CUDA environment is invalid. '
Exception: Your CUDA environment is invalid. Please check above error log.
----------------------------------------
ERROR: Failed building wheel for cupy
Running setup.py clean for cupy
ERROR: Failed cleaning build dir for cupy
Failed to build cupy
Installing collected packages: numpy, fastrlock, cupy
Running setup.py install for cupy ... error
ERROR: Command errored out with exit status 1: /cm/shared/apps/spack/cpu/opt/spack/linux-centos8-zen/gcc-8.3.1/anaconda3-2020.11-da3i7hmt6bdqbmuzq6pyt7kbm47wyrjp/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py'"'"'; __file__='"'"'/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /scratch/amehrotra1/job_38027192/pip-record-xjmcbx0r/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/amehrotra1/.local/include/python3.8/cupy Check the logs for full command output.
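Note: you may need to restart the kernel to use updated packages.
The CuPy installation fails. The plain cupy package on PyPI is a source distribution, so pip tries to compile it, and the build stops because this node has no CUDA toolkit (nvcc and CUDA headers such as cublas_v2.h are not found). CuPy requires an NVIDIA GPU with CUDA. To use it, run the notebook on a GPU node and install a prebuilt wheel that matches that node's CUDA version (for example, cupy-cuda12x for CUDA 12.x), as the CuPy installation guide linked in the log recommends.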
Creating a Dask Array
x = da.from_array(cp.ones(15), chunks=(5,))
This command creates a Dask array with 15 elements divided into 3 chunks of 5 elements each. Each chunk can be processed independently, so operations on the chunks can run in parallel.
The call cp.ones(15) creates a CuPy array of ones in GPU memory. Because da.from_array slices this array into chunks, each chunk of x is itself a CuPy array, and per-chunk operations run on the GPU.
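The sketch below inspects the chunk layout. NumPy stands in for CuPy here (an assumption made so the example runs without a GPU). With CuPy installed, `cp.ones(15)` gives the same chunk structure, but each chunk is a CuPy array.

```python
import numpy as np
import dask.array as da

# NumPy substitutes for CuPy so this runs on a CPU-only machine.
x = da.from_array(np.ones(15), chunks=(5,))

print(x.chunks)               # ((5, 5, 5),): three chunks of five elements
print(x.numblocks)            # (3,)
print(x.blocks[1].compute())  # the middle chunk, materialized as a NumPy array
```

`x.blocks[i]` is itself a small lazy Dask array. Calling `compute()` on it evaluates only the tasks needed for that one chunk.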
Visualizing the Computation Graph
The visualize() method uses Graphviz to draw the task graph. Nodes represent operations and the chunks they produce, and edges show which results each task depends on.
x.visualize()
Next, let's create a new Dask array by adding 1 to each element of x. Because this operation is element-wise, the graph gains one independent task per chunk.
(x+1).visualize()
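Calling sum() on x + 1 adds a reduction: each chunk is summed on its own, and the partial sums are then combined into a single value. As in the previous cells, visualize() only draws the graph; nothing is computed until compute() is called.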
(x+1).sum().visualize()
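If Graphviz is not available, you can still inspect how the graph grows as operations are chained. This sketch uses NumPy in place of CuPy (an assumption, so it runs without a GPU). It relies on Dask's collection protocol method `__dask_graph__()`, which returns the high-level graph. Exact task counts and layer names vary between Dask versions, so only the trend matters here.

```python
import numpy as np
import dask.array as da

x = da.from_array(np.ones(15), chunks=(5,))
y = x + 1          # lazy: one new task per chunk, nothing computed yet
total = y.sum()    # lazy: per-chunk partial sums plus a final aggregation

# Each step builds on the previous graph, so the task count only grows.
for label, arr in [("x", x), ("x + 1", y), ("(x + 1).sum()", total)]:
    print(f"{label:15s} {len(arr.__dask_graph__())} tasks")

# Layer-level view of the same DAG: which layer feeds which.
for layer, deps in total.__dask_graph__().dependencies.items():
    print(layer, "<-", sorted(deps))

print(total.compute())  # 30.0: fifteen elements, each equal to 1 + 1
```

Nothing is evaluated until the final `compute()` call. Everything before it only builds the graph.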
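Let's try a more complex example: a 15 × 15 two-dimensional array split into 5 × 5 chunks, which gives a 3 × 3 grid of blocks. Note that da.ones creates NumPy-backed chunks, so this part runs on the CPU and does not require CuPy.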
m = da.ones((15, 15), chunks=(5,5))
(m.T + 1).visualize()
(m.T + m).visualize()
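The transpose explains why the graphs for m.T + 1 and m.T + m have edges that cross between blocks: block (i, j) of m.T is built from block (j, i) of m. The sketch below checks this with the `blocks` accessor. It uses an array of distinct values instead of `da.ones` (an assumption), because with all ones the transpose would be invisible.

```python
import numpy as np
import dask.array as da

# Distinct values so that transposition is observable.
m = da.from_array(np.arange(225.0).reshape(15, 15), chunks=(5, 5))
print(m.numblocks)  # (3, 3)

# Block (0, 1) of m.T comes from block (1, 0) of m, transposed.
left = m.T.blocks[0, 1].compute()
right = m.blocks[1, 0].compute().T
print(np.array_equal(left, right))  # True

# Hence block (i, j) of m.T + m depends on blocks (j, i) and (i, j) of m,
# and the sum is symmetric.
sym = (m.T + m).compute()
print(np.array_equal(sym, sym.T))  # True
```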
(m.dot(m.T + 1) - m.mean(axis=0)).visualize()
(m.dot(m.T + 1) - m.mean(axis=0)).compute()
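One way to check the chunked computation is to evaluate the same expression eagerly with plain NumPy and compare the results. Because m is all ones, every entry of m.dot(m.T + 1) is 15 × (1 × 2) = 30. Subtracting the column means (all 1) leaves 29 everywhere.

```python
import numpy as np
import dask.array as da

m = da.ones((15, 15), chunks=(5, 5))
result = (m.dot(m.T + 1) - m.mean(axis=0)).compute()

# Same expression evaluated eagerly with NumPy, for comparison.
M = np.ones((15, 15))
expected = M.dot(M.T + 1) - M.mean(axis=0)

print(result.shape)                   # (15, 15)
print(np.allclose(result, expected))  # True
print(result[0, 0])                   # 29.0
```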
Submit Ticket
If you find anything that needs to be changed, edited, or if you would like to provide feedback or contribute to the notebook, please submit a ticket by contacting us at:
Email: consult@sdsc.edu
We appreciate your input and will review your suggestions promptly!