Source Information


Author: Bob Sinkovits

Last Updated Date: October 01, 2024

Resources: https://github.com/sinkovit/PythonSeries


Goal

This Jupyter notebook demonstrates how to use the Dask library for parallel processing, specifically focusing on visualizing the task graphs (DAGs) that Dask creates to efficiently manage dependencies and computation on chunked data.

Dask Graphs

Dask represents every computation as a Directed Acyclic Graph (DAG) of tasks, with the operations broken down per chunk of data. A scheduler then executes the tasks in that graph to compute the final result.

The DAG records which steps depend on which other steps. This is what lets Dask identify the steps that are independent of one another and run them in parallel.
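To make the idea of a task graph concrete, here is a minimal sketch that writes a tiny graph by hand as a plain dictionary and runs it with Dask's synchronous scheduler, dask.get. The key names ("a", "b", "c", "d") and the arithmetic are made up for illustration and are not part of the notebook. Dask arrays build graphs like this automatically, so you rarely need to write one yourself.

```python
from operator import add, mul

import dask

# A hand-written task graph: each key maps to a value or to a task.
# A task is a tuple of (function, argument, argument, ...), and an
# argument that matches another key is treated as a dependency.
dsk = {
    "a": 1,
    "b": 2,
    "c": (add, "a", "b"),   # depends on "a" and "b"
    "d": (mul, "c", 10),    # depends on "c"
}

# dask.get walks the graph in dependency order and returns the value of "d".
result = dask.get(dsk, "d")
print(result)  # 30
```

Because "d" depends on "c", and "c" depends on "a" and "b", the scheduler has to evaluate them in that order. Tasks with no dependency between them could run at the same time under a parallel scheduler.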

Required Modules for the Jupyter Notebook

Before running the notebook, make sure to load the following modules.

Module: dask

In [1]:
import dask.array as da

Creating a Dask Array

In [2]:
x = da.ones(15, chunks=(5,))

The command above creates a Dask array of 15 ones, divided into chunks of 5 elements, so the array is split into 3 chunks. Each chunk can be processed independently, which is what makes parallel computation possible.
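As a quick check of the chunking described above, the following sketch inspects the array's chunk layout through its chunks, numblocks, and npartitions attributes. Nothing is computed here; these attributes only describe how the array is partitioned.

```python
import dask.array as da

x = da.ones(15, chunks=(5,))

# Chunk sizes along each dimension: three chunks of 5 elements each.
print(x.chunks)       # ((5, 5, 5),)

# Number of chunks along each dimension, and the total number of chunks.
print(x.numblocks)    # (3,)
print(x.npartitions)  # 3
```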

The visualize() method calls Graphviz to draw a graphical representation of the task graph.

Visualizing the Computation Graph

In [3]:
x.visualize()
Out[3]:
[Image: task graph for x]
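Outside a notebook, it can be convenient to write the graph to a file instead of displaying it inline. The sketch below assumes Graphviz is installed and falls back to a printed message if rendering fails. The filename ones_graph.svg is a hypothetical choice, and rankdir="LR" is an optional layout setting that draws the graph left to right.

```python
import dask.array as da

x = da.ones(15, chunks=(5,))

try:
    # Write the task graph to disk; the file extension selects the format.
    x.visualize(filename="ones_graph.svg", rankdir="LR")
    rendered = True
except Exception as err:
    # Typically raised when Graphviz (the Python package or the
    # system executable) is not available in the environment.
    rendered = False
    print(f"Could not render the task graph: {err}")
```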

Next, let's create a new Dask array by adding 1 to each element of x.

In [4]:
(x+1).visualize()
Out[4]:
[Image: task graph for x + 1]

After adding 1 to each element of x, calling sum() adds a reduction over all elements of the resulting Dask array to the graph. Nothing is actually computed until compute() is called.

In [5]:
(x+1).sum().visualize()
Out[5]:
[Image: task graph for (x + 1).sum()]
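The graphs above only describe work to be done. As a small illustration, the sketch below builds the same expression and then calls compute() to execute the graph. The variable names total and value are chosen here for illustration.

```python
import dask.array as da

x = da.ones(15, chunks=(5,))

# Building the expression is lazy: this only constructs the task graph.
total = (x + 1).sum()
print(total)  # a Dask array description, not a number

# compute() hands the graph to the scheduler and returns the result.
value = total.compute()
print(value)  # 30.0
```

Each of the 15 elements is 1 + 1 = 2, so the sum is 30.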

Let's try a more complex example: a 15 x 15 array of ones split into 5 x 5 chunks.

In [6]:
m = da.ones((15, 15), chunks=(5,5))
In [7]:
(m.T + 1).visualize()
Out[7]:
[Image: task graph for m.T + 1]
In [8]:
(m.T + m).visualize()
Out[8]:
[Image: task graph for m.T + m]
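The two-dimensional graphs are larger because the array now has a 3 x 3 grid of chunks. The following sketch, offered as an illustration rather than as part of the original notebook, inspects that layout and confirms the value of m.T + m. The variable name t is chosen here for illustration.

```python
import dask.array as da

m = da.ones((15, 15), chunks=(5, 5))

# Three chunks along each axis, nine chunks in total.
print(m.numblocks)    # (3, 3)
print(m.npartitions)  # 9

# Adding the transpose combines chunk (i, j) of m with the transposed
# chunk (j, i), so the tasks for different chunks become interdependent.
t = m.T + m
print(t.shape)            # (15, 15)
print(t.compute().sum())  # 450.0
```

Every element of t is 1 + 1 = 2, and there are 225 elements, which gives 450.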
In [9]:
(m.dot(m.T + 1) - m.mean(axis=0)).visualize()
Out[9]:
[Image: task graph for m.dot(m.T + 1) - m.mean(axis=0)]
In [10]:
(m.dot(m.T + 1) - m.mean(axis=0)).compute()
Out[10]:
array([[29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.],
       [29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29., 29.,
        29., 29.]])
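One way to sanity-check the result is to repeat the calculation with plain NumPy on an unchunked array and compare. This is a minimal sketch; the variable names n, dask_result, and numpy_result are chosen here for illustration.

```python
import numpy as np
import dask.array as da

# Chunked Dask version, as in the notebook.
m = da.ones((15, 15), chunks=(5, 5))
dask_result = (m.dot(m.T + 1) - m.mean(axis=0)).compute()

# The same expression on an ordinary NumPy array.
n = np.ones((15, 15))
numpy_result = n.dot(n.T + 1) - n.mean(axis=0)

print(np.allclose(dask_result, numpy_result))  # True
print(dask_result[0, 0])                       # 29.0
```

The value 29 follows from the arithmetic: each entry of the dot product sums 15 products of 1 * 2, giving 30, and the column mean of an array of ones is 1, so 30 - 1 = 29.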

Submit Ticket

If you find anything that needs to be changed, edited, or if you would like to provide feedback or contribute to the notebook, please submit a ticket by contacting us at:

Email: consult@sdsc.edu

We appreciate your input and will review your suggestions promptly!
