Source Information¶


Created by: Bob Sinkovits

Updated: October 25, 2024 by Gloria Seo

Resources: https://github.com/sinkovit/PythonSeries


Goal¶

This Jupyter notebook demonstrates how to use the Dask library for parallel processing, specifically focusing on visualizing the task graphs (DAGs) that Dask creates to efficiently manage dependencies and computation on chunked data.

Dask Graphs¶

The key element of Dask is the task graph: every operation on a Dask collection adds nodes to a Directed Acyclic Graph (DAG) describing what has to be executed on each chunk of data to compute the final result. The scheduler then executes that graph.

The DAG is what lets Dask see the dependencies between all the steps in a set of computations and run independent steps in parallel.
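As a minimal sketch of this idea, the example below uses dask.delayed instead of arrays. The functions inc and add are illustrative names that do not appear in the original notebook. The two inc calls do not depend on each other, so the scheduler is free to run them in parallel, while add has to wait for both.

```python
from dask import delayed


@delayed
def inc(value):
    """Return value + 1 (lazy: nothing runs until .compute() is called)."""
    return value + 1


@delayed
def add(left, right):
    """Return left + right; this task depends on both of its inputs."""
    return left + right


a = inc(1)         # graph node with no dependencies
b = inc(2)         # graph node with no dependencies, independent of a
total = add(a, b)  # graph node that depends on a and b

# Only now does the scheduler walk the DAG and execute the three tasks.
result = total.compute()
print(result)
```

Calling total.visualize() instead of total.compute() would draw this three-node graph, provided Graphviz is installed.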

Required Modules for the Jupyter Notebook¶

Before running the notebook, make sure to load the following modules.

Module: dask, cupy, graphviz
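One way to check up front which of these modules are importable is sketched below using only the standard library. The helper name check_modules is hypothetical, and the check only confirms that a module can be found, not that it works (CuPy, for instance, still needs a usable GPU at run time).

```python
import importlib.util


def check_modules(names):
    """Map each module name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_modules(["dask", "cupy", "graphviz"])
for name, available in status.items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```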

In [1]:
import dask.array as da
import cupy as cp 
from dask import array as da
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-b212f55aa777> in <module>
      1 import dask.array as da
----> 2 import cupy as cp
      3 from dask import array as da

ModuleNotFoundError: No module named 'cupy'

visualize() needs the Graphviz package, so install it with pip. Note that pip installs only the Python bindings; the Graphviz executables themselves must also be available on the system.

In [2]:
pip install graphviz
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: graphviz in /home/amehrotra1/.local/lib/python3.8/site-packages (0.20.3)
Note: you may need to restart the kernel to use updated packages.
In [3]:
import graphviz
In [4]:
pip install cupy
Defaulting to user installation because normal site-packages is not writeable
Collecting cupy
  Downloading cupy-12.3.0.tar.gz (1.8 MB)
     |████████████████████████████████| 1.8 MB 16.2 MB/s eta 0:00:01
Collecting numpy<1.29,>=1.20
  Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     |████████████████████████████████| 17.3 MB 131.3 MB/s eta 0:00:01
Collecting fastrlock>=0.5
  Using cached fastrlock-0.8.3-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_24_x86_64.whl (51 kB)
Building wheels for collected packages: cupy
  Building wheel for cupy (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /cm/shared/apps/spack/cpu/opt/spack/linux-centos8-zen/gcc-8.3.1/anaconda3-2020.11-da3i7hmt6bdqbmuzq6pyt7kbm47wyrjp/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py'"'"'; __file__='"'"'/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /scratch/amehrotra1/job_38027192/pip-wheel-gldkog6o
       cwd: /scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/
  Complete output (62 lines):
  Clearing directory: /scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/cupy/.data
  
  -------- Configuring Module: cuda --------
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  /scratch/amehrotra1/job_38027192/tmpccbcj891/a.cpp:1:10: fatal error: cublas_v2.h: No such file or directory
   #include <cublas_v2.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  command 'gcc' failed with exit status 1
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  /scratch/amehrotra1/job_38027192/tmp18yn909f/a.cpp:2:18: fatal error: cuda_runtime_api.h: No such file or directory
           #include <cuda_runtime_api.h>
                    ^~~~~~~~~~~~~~~~~~~~
  compilation terminated.
  **************************************************
  *** WARNING: Cannot check compute capability
  command 'gcc' failed with exit status 1
  **************************************************
  
  ************************************************************
  * CuPy Configuration Summary                               *
  ************************************************************
  
  Build Environment:
    Include directories: ['/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/cupy/_core/include/cupy/cub', '/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/cupy/_core/include']
    Library directories: []
    nvcc command       : (not found)
    hipcc command      : (not found)
  
  Environment Variables:
    CFLAGS          : (none)
    LDFLAGS         : (none)
    LIBRARY_PATH    : /cm/shared/apps/spack/cpu/opt/spack/linux-centos8-zen/gcc-8.3.1/anaconda3-2020.11-da3i7hmt6bdqbmuzq6pyt7kbm47wyrjp/lib:/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64
    CUDA_PATH       : (none)
    NVCC            : (none)
    HIPCC           : (none)
    ROCM_HOME       : (none)
  
  Modules:
    cuda      : No
      -> Include files not found: ['cublas_v2.h', 'cuda.h', 'cuda_profiler_api.h', 'cuda_runtime.h', 'cufft.h', 'curand.h', 'cusparse.h', 'nvrtc.h']
      -> Check your CFLAGS environment variable.
  
  ERROR: CUDA could not be found on your system.
  
  HINT: You are trying to build CuPy from source, which is NOT recommended for general use.
        Please consider using binary packages instead.
  
  Please refer to the Installation Guide for details:
  https://docs.cupy.dev/en/stable/install.html
  
  ************************************************************
  
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/setup.py", line 88, in <module>
      ext_modules = cupy_setup_build.get_ext_modules(True, ctx)
    File "/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/install/cupy_builder/cupy_setup_build.py", line 449, in get_ext_modules
      extensions = make_extensions(ctx, compiler, use_cython)
    File "/scratch/amehrotra1/job_38027192/pip-install-azgnxc31/cupy/install/cupy_builder/cupy_setup_build.py", line 305, in make_extensions
      raise Exception('Your CUDA environment is invalid. '
  Exception: Your CUDA environment is invalid. Please check above error log.
  ----------------------------------------
  ERROR: Failed building wheel for cupy
Failed to build cupy
Note: you may need to restart the kernel to use updated packages.
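The install above fails because pip fell back to building CuPy from source and found neither nvcc nor the CUDA headers. A quick pre-check along the following lines reports the same two facts before attempting the install. It is a sketch using only the standard library, and the helper name cuda_toolkit_hints is hypothetical. If a CUDA toolkit is present, the CuPy installation guide linked in the log recommends a prebuilt binary package matching the CUDA version (for example cupy-cuda12x) rather than the source package.

```python
import os
import shutil


def cuda_toolkit_hints():
    """Report two of the signals the CuPy build log checks for."""
    return {
        # Is the CUDA compiler on PATH? (the log shows "nvcc command: (not found)")
        "nvcc_on_path": shutil.which("nvcc") is not None,
        # Is CUDA_PATH set? (the log shows "CUDA_PATH: (none)")
        "cuda_path_set": bool(os.environ.get("CUDA_PATH")),
    }


hints = cuda_toolkit_hints()
for name, present in hints.items():
    print(f"{name}: {present}")

if not any(hints.values()):
    print("No CUDA toolkit detected; a source build of CuPy is likely to fail.")
```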

Creating a Dask Array¶

In [ ]:
x = da.from_array(cp.ones(15), chunks=(5,))

This command creates a Dask array with 15 elements divided into 3 chunks of size 5. Each chunk can be processed in parallel for efficient computation.

cp.ones(15) creates a CuPy array of ones in GPU memory, so every chunk of the Dask array is backed by GPU data and chunk-level operations run on the GPU. This requires a working CuPy installation and a CUDA-capable GPU.
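On a machine without CUDA, such as the one where the CuPy install above failed, the same cell can fall back to NumPy. The sketch below is one possible way to do that and is not part of the original notebook; the variable backend is introduced only for illustration. The chunking, and therefore the shape of the task graph, is the same in both cases.

```python
import numpy as np
import dask.array as da

try:
    import cupy as cp
    ones = cp.ones(15)   # can also fail here if no usable GPU is present
    backend = "cupy"
except Exception:        # ImportError, or a CUDA runtime error
    ones = np.ones(15)
    backend = "numpy"

# 15 elements split into 3 chunks of 5, regardless of the backend.
x = da.from_array(ones, chunks=(5,))
print(backend, x.chunks, x.npartitions)
```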

Visualizing the Computation Graph:¶

visualize() calls Graphviz to render the task graph as an image. It only draws the graph; it does not run the computation.

In [ ]:
x.visualize()

Next, let's create a new Dask array by adding 1 to each element of the Dask array x.

In [ ]:
(x+1).visualize()

After adding 1 to each element of x, the sum() method is called to compute the sum of all elements in the resulting Dask array.

In [ ]:
(x+1).sum().visualize()
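If Graphviz is not available, the growth of the graph across these three steps can still be observed by counting tasks. The following is a rough sketch: task_count is a hypothetical helper, the array is a NumPy-backed stand-in for the CuPy one, and the exact counts may differ between Dask versions, so none are stated here.

```python
import dask.array as da


def task_count(arr):
    """Number of tasks in the unoptimized graph behind a Dask collection."""
    return len(dict(arr.__dask_graph__()))


x = da.ones(15, chunks=(5,))  # NumPy-backed stand-in for the CuPy array

steps = {
    "x": x,
    "x + 1": x + 1,
    "(x + 1).sum()": (x + 1).sum(),
}
for label, arr in steps.items():
    print(f"{label:>15}: {task_count(arr)} tasks")

# Nothing has been computed so far; compute() executes the graph.
total = (x + 1).sum().compute()
print(total)
```

Each step keeps the tasks of the previous one and adds its own: one addition per chunk for x + 1, then per-chunk partial sums plus a final aggregation for sum().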

Let's try a more complex example.

In [ ]:
m = da.ones((15, 15), chunks=(5,5))
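For orientation before looking at the graphs, the chunk layout of m can be inspected directly. This is a small sketch using standard Dask array attributes: a 15 x 15 array in 5 x 5 chunks is a 3 x 3 grid of blocks, so each elementwise operation on m contributes one task per block.

```python
import dask.array as da

m = da.ones((15, 15), chunks=(5, 5))

print(m.chunks)       # chunk sizes along each axis
print(m.numblocks)    # number of chunks along each axis
print(m.npartitions)  # total number of chunks
print(m.T.chunks)     # the transpose is chunked block by block as well
```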
In [ ]:
(m.T + 1).visualize()
In [ ]:
(m.T + m).visualize()
In [ ]:
(m.dot(m.T + 1) - m.mean(axis=0)).visualize()
In [ ]:
(m.dot(m.T + 1) - m.mean(axis=0)).compute()
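Unlike visualize(), compute() actually executes the graph and returns a concrete array. As a sanity check, sketched here under the assumption that NumPy is available, the result can be compared with the same expression evaluated eagerly. The names expr, result, and expected are introduced for illustration only.

```python
import numpy as np
import dask.array as da

m = da.ones((15, 15), chunks=(5, 5))

# Building the expression is lazy: this only constructs the task graph.
expr = m.dot(m.T + 1) - m.mean(axis=0)

# compute() runs the graph and returns an in-memory NumPy array.
result = expr.compute()

# The same calculation done eagerly with NumPy, for comparison.
n = np.ones((15, 15))
expected = n.dot(n.T + 1) - n.mean(axis=0)

print(result.shape)
print(np.allclose(result, expected))
```

Every entry of m.T + 1 is 2, so each entry of the dot product is 15 * 2 = 30; the column means are all 1 and broadcast across rows, which leaves 29 in every position.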

Submit Ticket¶

If you find anything that needs to be changed or corrected, or if you would like to give feedback or contribute to the notebook, please submit a ticket by contacting us at:

Email: consult@sdsc.edu

We appreciate your input and will review your suggestions promptly!