Source Information¶


Created by:

Updated: October 25, 2024 by Gloria Seo

Resources: https://matplotlib.org/


Goal¶

The purpose of this notebook is to learn how to use Matplotlib, a Python package for data visualization and graphical plotting. We will cover basic syntax, scatter plots, line plots, bar charts, and histograms.

Matplotlib¶

There are multiple ways to do plotting and charting in Python. Here we'll focus on Matplotlib and several basic types of charts: scatter plots, line plots, bar plots and histograms. Matplotlib has three layers: a backend layer that renders a plot to a screen or file, an artist layer that defines containers and primitives (e.g. figures, axes, subplots, Line2D, Rectangle) and a scripting layer that we use to interact with the backend and artist layers. For the scripting layer, we'll be using pyplot.
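As a rough illustration of the three layers, the sketch below creates a figure through pyplot and then inspects the objects that come back. The data values are placeholders chosen only for this example.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure
from matplotlib.lines import Line2D

# Scripting layer: pyplot builds the containers for us
fig, ax = plt.subplots()

# Artist layer: plotting calls return Artist objects (here a single Line2D)
(line,) = ax.plot([0, 1, 2], [0, 1, 4])  # placeholder data

print(isinstance(fig, Figure), isinstance(ax, Axes), isinstance(line, Line2D))

# Backend layer: the name of the renderer currently in use
print(mpl.get_backend())
```

The backend name printed at the end depends on where the code runs, so it will usually differ between a notebook and a plain script.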

Note on syntax¶

Matplotlib provides two ways to generate figures. The more basic or elementary approach involves calling the matplotlib.pyplot.figure method, which creates a new figure.

The more advanced approach starts with a call to the matplotlib.pyplot.subplots method, which returns a Figure object and one or more Axes objects.

Throughout this tutorial we'll use the latter, with the more basic approach commented out for reference. In my opinion this is more intuitive, makes it easier to access the advanced features and eases the transition to generating complex figures containing more than one subgraph. Many of the questions answered on resources such as StackOverflow are based on the advanced interface.
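To make the difference concrete, here is a minimal side-by-side sketch of the two approaches. The data and titles are placeholders for illustration only.

```python
import matplotlib.pyplot as plt

xs = [1, 2, 3, 4]            # placeholder data
ys = [v**2 for v in xs]

# Basic approach: pyplot keeps track of the "current" figure and axes
fig_basic = plt.figure()
plt.scatter(xs, ys)
plt.title('Basic interface')

# Advanced approach: hold explicit references to the Figure and Axes objects
fig_oo, ax = plt.subplots()
ax.scatter(xs, ys)
ax.set_title('Advanced interface')
```

Both figures contain the same scatter plot. The practical difference is that the second form names the Axes object, so later calls say exactly which subplot they modify.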

Required Modules for the Jupyter Notebook¶

Before running the Notebook, make sure to install and import the following modules.

Modules: matplotlib, numpy

In [1]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
In [2]:
mpl.get_backend()
Out[2]:
'module://ipykernel.pylab.backend_inline'
In [3]:
# Define a few lists that we'll use in our examples
x = [x for x in range(1,11)]
w = [x**1.5 for x in range(1,11)]
y = [x**1.75 for x in range(1,11)]
z = [x**2 for x in range(1,11)]

Scatter plots¶

In the next few cells, we'll build some simple scatter plots step-by-step. We start by plotting a single set of paired data. In this example and the examples that follow, we're primarily calling methods of the Axes class. See https://matplotlib.org/api/axes_api.html for more details.

In [4]:
w
Out[4]:
[1.0,
 2.8284271247461903,
 5.196152422706632,
 8.0,
 11.180339887498949,
 14.696938456699069,
 18.520259177452136,
 22.627416997969522,
 27.0,
 31.622776601683793]
In [5]:
#plt.figure()
#plt.scatter(x, w)
#plt.show()

f, ax = plt.subplots()
ax.scatter(x,w)
plt.show()

Borders of the plot can be turned off through the spines attribute of the Axes object. In the example below, we turn off the top and right borders. Code to turn off the left and bottom borders is commented out.

In [6]:
#plt.figure()
#plt.scatter(x, w)
#plt.show()

f, ax = plt.subplots()
ax.scatter(x,w)

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['left'].set_visible(False)
#ax.spines['bottom'].set_visible(False)

plt.show()

We can plot multiple data sets on the same graph with multiple calls to scatter().

In [7]:
#plt.figure()
#plt.scatter(x, w)
#plt.scatter(x, y)
#plt.scatter(x, z)
#plt.show()

f, ax = plt.subplots()
ax.scatter(x, w)
ax.scatter(x, y)
ax.scatter(x, z)
plt.show()

Data points are assigned default sizes, shapes (filled circles) and colors. We can override these defaults to customize our graphs.

In [8]:
#plt.figure()
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()

f, ax = plt.subplots()
ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()

In the previous example, we assigned the same color and marker size to all points in each data set. We can also use lists to assign a different size and color to each point.

In [9]:
#plt.figure()
#colors = ['red'] * 5 + ['black'] * 5
#sizes = [25 + x*10 for x in range(10)]
#plt.scatter(x, w, c=colors, s=sizes, marker='o')
#plt.show()

f, ax = plt.subplots()
# First five points red, last five black
colors = ['red'] * 5 + ['black'] * 5
sizes = [25 + x*10 for x in range(10)]
ax.scatter(x, w, c=colors, s=sizes, marker='o')
plt.show()

We can make the figure more useful by adding a title and axis labels.

In [10]:
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()

f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()

Reasonable bounds for the axes are chosen based on the range of data values, but we can set them manually using set_xlim and set_ylim (plt.xlim and plt.ylim in the basic interface).

In [11]:
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.xlim(-0.5, 15)
#plt.ylim(-5, 150)
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()

f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
#ax.set_xlim(-0.5, 15)
ax.set_ylim(-5, 150)
ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()

By default, we get linear scales for the x and y axes. We can also choose log axes, with an optional base (10 by default), using the set_xscale and set_yscale methods (plt.xscale and plt.yscale in the basic interface).

In [12]:
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.yscale('log')
#plt.xscale('log', base=2)
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()

# NOTE - syntax has changed for log axes - newer Matplotlib versions use base instead of basex/basey

f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
ax.set_yscale('log')


ax.set_xscale('log', base=2)

ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()

Let's go back to our linear scales and default axis limits and add a legend to the plot. Note that we had to modify the calls to scatter to set labels for the data.

By default, the legend is placed in the "best" location so as not to interfere with the plot, no title is given, and a frame is drawn around the legend. All of the arguments used below are optional, so legend can also be called without any arguments.

In [13]:
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.scatter(x, w, c='red', s=25, marker='s', label='x^1.5')
#plt.scatter(x, y, c='blue', s=50, marker='^', label='x^1.75')
#plt.scatter(x, z, c='purple', s=100, marker='+', label='x^2')
#plt.legend(title='Plot legend', frameon=False)
#plt.show()

f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
ax.scatter(x, w, c='red', s=25, marker='s', label='x^1.5')
ax.scatter(x, y, c='blue', s=50, marker='^', label='x^1.75')
ax.scatter(x, z, c='purple', s=100, marker='+', label='x^2')
ax.legend(title='Plot legend', frameon=False)
plt.show()
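For comparison, here is a minimal sketch of the no-argument form, which relies entirely on the defaults. The data are regenerated locally so the cell stands on its own.

```python
import matplotlib.pyplot as plt

xs = list(range(1, 11))

fig, ax = plt.subplots()
ax.scatter(xs, [v**1.5 for v in xs], label='x^1.5')
ax.scatter(xs, [v**2 for v in xs], label='x^2')

# No arguments: entries come from the label= values given to scatter,
# and the location, title and frame all fall back to their defaults
legend = ax.legend()

print([text.get_text() for text in legend.get_texts()])
```

Any artist plotted without a label is simply left out of the legend.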

Line plots using the plt.plot method¶

Although they have many similarities, scatter plots and line plots have some important differences. The former allow greater control over the marker properties, while the latter can contain lines that join the data points. In its simplest form, plt.plot takes a pair of x,y sequences.

In [14]:
#plt.figure()
#plt.plot(x, w)
#plt.show()

f, ax = plt.subplots()
ax.plot(x, w)
plt.show()

Any number of x,y pairs can be displayed on the same plot.

In [15]:
#plt.figure()
#plt.plot(x, w, x, y, x, z)
#plt.show()

f, ax = plt.subplots()
ax.plot(x, w, x, y, x, z)
plt.show()

The line color, marker shape and line style can be set by passing an additional string argument for each x,y pair of the form "{color abbreviation}{marker abbreviation}{line style}". A full list of abbreviations and styles can be found at the plt.plot documentation.

In [16]:
#plt.figure()
#plt.plot(x, w, 'ro-', x, y, 'bs:', x, z, 'g^--')
#plt.show()

f, ax = plt.subplots()
ax.plot(x, w, 'ro-', x, y, 'bs:', x, z, 'g^--')
plt.show()
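The format string is shorthand for keyword arguments. The sketch below draws the same line twice, once with the string 'ro-' and once with the color, marker and linestyle keywords spelled out. The data are regenerated locally for illustration.

```python
import matplotlib.pyplot as plt

xs = list(range(1, 11))
ws = [v**1.5 for v in xs]

fig, ax = plt.subplots()

# Shorthand: r = red, o = circle marker, - = solid line
(line_fmt,) = ax.plot(xs, ws, 'ro-')

# Same styling written with explicit keyword arguments
(line_kw,) = ax.plot(xs, ws, color='red', marker='o', linestyle='-')

print(line_fmt.get_marker(), line_kw.get_marker())
print(line_fmt.get_linestyle(), line_kw.get_linestyle())
```

The keyword form is longer but accepts any color specification, whereas the format string is limited to the single-letter abbreviations.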

A brief digression - LaTeX formatting¶

Matplotlib recognizes a subset of LaTeX syntax, so you can include Greek letters, superscripts, subscripts and other math formatting features in the plot labels. Switch to math mode by placing the expression between dollar signs.

In [17]:
#plt.figure()
#plt.xlabel(r'X axis ($\Omega$)')
#plt.ylabel(r'Y axis ($\lambda$)')
#plt.title('Plot of three data sets')
#plt.scatter(x, w, c='red', s=25, marker='s', label='$x^{1.5}$')
#plt.scatter(x, y, c='blue', s=50, marker='^', label='$x^{1.75}$')
#plt.scatter(x, z, c='purple', s=100, marker='+', label='$x^2$')
#plt.legend(title='Plot legend', frameon=False)
#plt.show()

f, ax = plt.subplots()
# Raw strings (r'...') keep Python from treating the backslashes as escape sequences
ax.set_xlabel(r'X axis ($\Omega$)')
ax.set_ylabel(r'Y axis ($\lambda$)')
ax.set_title('Plot of three data sets')
ax.scatter(x, w, c='red', s=25, marker='s', label='$x^{1.5}$')
ax.scatter(x, y, c='blue', s=50, marker='^', label='$x^{1.75}$')
ax.scatter(x, z, c='purple', s=100, marker='+', label='$x^2$')
ax.legend(title='Plot legend', frameon=False)
plt.show()

Bar charts¶

The Matplotlib bar chart method behaves much like the other plotting methods we've seen so far. In the example below, we show the results of a hot dog eating contest. We pass a list of colors to differentiate nationalities (USA vs. Costa Rica).

In [18]:
people = ['Bob', 'Jorge', 'Esteban', 'Mariano', 'Mahidhar']
x_pos = np.arange(len(people))
hot_dogs = [10, 12, 17, 8, 14]
colors = ['blue', 'red', 'red', 'red', 'blue']

#plt.figure()
#plt.bar(x_pos, hot_dogs, align='center', color=colors)
#plt.xticks(x_pos, people)
#plt.ylabel('# hot dogs eaten')
#plt.title('Hot dog eating contest results')
#plt.show()

f, ax = plt.subplots()
ax.bar(x_pos, hot_dogs, align='center', color=colors)
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
ax.set_ylabel('# hot dogs eaten')
ax.set_title('Hot dog eating contest results')
plt.show()

Plotting multiple data series on the same figure is similar to what we did for scatter plots using multiple calls to the plt.bar method.

In this new plot, we add one more feature and set an edge color for the bars. Due to a bug in older versions of Matplotlib, the edge was only drawn for the first bar unless a list of values was passed to edgecolor, which is why the code below passes a list.

In [19]:
people = ['Bob', 'Jorge', 'Esteban', 'Mariano', 'Mahidhar']
x_pos = np.arange(len(people))
hot_dogs = [10, 12, 17, 8, 14]
pies = [2, 7, 3, 5, 8]

#plt.figure()
#plt.bar(x_pos, hot_dogs, align='center', 
#        color='red', label='hot dogs', edgecolor=['black']*len(people))
#plt.bar(x_pos, pies,     align='center', 
#        color='blue', label='pies', edgecolor=['black']*len(people))
#plt.xticks(x_pos, people)
#plt.ylabel('# eaten')
#plt.title('Eating contest results')
#plt.legend(frameon=False, loc='upper left')
#plt.show()

f, ax = plt.subplots()
ax.bar(x_pos, hot_dogs, align='center', 
       color='red', label='hot dogs', edgecolor=['black']*len(people))
ax.bar(x_pos, pies,     align='center', 
       color='blue', label='pies', edgecolor=['black']*len(people))
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
ax.set_ylabel('# eaten')
ax.set_title('Eating contest results')
ax.legend(frameon=False, loc='upper left')
plt.show()

The previous figure turned out fine since every contestant ate more hot dogs (first set plotted) than pies (second set plotted). If one of the contestants had eaten more pies than hot dogs, the "hot dogs" bar would be completely obscured by the "pies" bar. To avoid this problem, we can change the widths of the bars and offset their locations.

In [20]:
people = ['Bob', 'Jorge', 'Esteban', 'Mariano', 'Mahidhar']
x_pos = np.arange(len(people))
hot_dogs = [10, 12, 17, 8, 14]
pies = [2, 7, 3, 5, 8]

#plt.figure()
#plt.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center', 
#        color='red', label='hot dogs', edgecolor=['black']*len(people))
#plt.bar(x_pos + 0.2, pies,     width=0.35, align='center', 
#        color='blue', label='pies', edgecolor=['black']*len(people))
#plt.xticks(x_pos, people)
#plt.ylabel('# eaten')
#plt.title('Eating contest results')
#plt.legend(frameon=False, loc='upper left')
#plt.show()

f, ax = plt.subplots()
ax.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center', 
       color='red', label='hot dogs', edgecolor=['black']*len(people))
ax.bar(x_pos + 0.2, pies,     width=0.35, align='center', 
       color='blue', label='pies', edgecolor=['black']*len(people))
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
ax.set_ylabel('# eaten')
ax.set_title('Eating contest results')
ax.legend(frameon=False, loc='upper left')
plt.show()

Let's introduce one more advanced feature. Imagine that the tick labels are so long that they would overrun each other if rendered horizontally. To avoid that, we can rotate each of the tick labels by 45 degrees before generating the figure.

Note that to do this using the basic interface, we need to get the current axes using the plt.gca function. In my opinion, it's easier to just work directly with the Axes class from the beginning.

In [21]:
people = ['Bob San Diego', 'Jorge Costa Rica', 'Esteban Costa Rica', 
          'Mariano Costa Rica', 'Mahidhar San Diego']

#plt.figure()
#plt.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center', 
#        color='red', label='hot dogs', edgecolor='black')
#plt.bar(x_pos + 0.2, pies,     width=0.35, align='center', 
#        color='blue', label='pies', edgecolor='black')
#plt.xticks(x_pos, people)
#plt.gca().set_xticklabels(people)
#for tick in plt.gca().get_xticklabels():
#    tick.set_rotation(45)
#plt.ylabel('# eaten')
#plt.title('Eating contest results')
#plt.legend(frameon=False, loc='upper left')
#plt.show()

f, ax = plt.subplots()
ax.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center', 
       color='red', label='hot dogs', edgecolor=['black']*len(people))
ax.bar(x_pos + 0.2, pies,     width=0.35, align='center', 
       color='blue', label='pies', edgecolor=['black']*len(people))
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
for tick in ax.get_xticklabels():
    tick.set_rotation(45)
ax.set_ylabel('# eaten')
ax.set_title('Eating contest results')
ax.legend(frameon=False, loc='upper left')
plt.show()
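As an alternative to the loop over tick labels, plt.setp can apply the same properties to every label in one call. This is a sketch with placeholder labels and values rather than the contest data. The ha='right' alignment and the tight_layout call are additions for this example, not part of the cell above.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder labels and values for illustration
labels = ['First long label', 'Second long label', 'Third long label']
values = [3, 7, 5]
positions = np.arange(len(labels))

fig, ax = plt.subplots()
ax.bar(positions, values)
ax.set_xticks(positions)
ax.set_xticklabels(labels)

# Rotate all x tick labels at once; ha='right' anchors the end of each
# label at its tick so the rotated text lines up with the bar
plt.setp(ax.get_xticklabels(), rotation=45, ha='right')

# Adjust the margins so the rotated labels are not clipped
fig.tight_layout()
```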

Histograms¶

If you followed my NumPy tutorial, you'll recall that we generated a histogram using NumPy's histogram function and then plotted it as a line graph. A better way is to use pyplot's hist function, which both generates the histogram data from the input data and renders the plot.

In [22]:
# Build two vectors of 10000 and 5000 normal deviates, respectively, with variance 0.5^2 and mean 2
#import numpy as np
mu, sigma = 2, 0.5
np.random.seed(1234)
v1 = np.random.normal(mu,sigma,10000)
v2 = np.random.normal(mu,sigma,5000)
In [23]:
#plt.figure()
#plt.hist(v1, bins=50, cumulative=True, edgecolor='black', color='gray')
#plt.show()

f, ax = plt.subplots()
ax.hist(v1, bins=50, cumulative=True, edgecolor='black', color='gray')
plt.show()
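To compare the two approaches mentioned at the start of this section, the sketch below bins one sample with np.histogram and plots the counts as a line, then passes the same sample to hist. The sample is generated locally with a hypothetical seed and is independent of v1 and v2.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative sample, separate from v1 and v2
rng = np.random.default_rng(1234)
sample = rng.normal(2, 0.5, 10000)

fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 4))

# Two-step approach: bin with NumPy, then draw the counts as a line
counts, edges = np.histogram(sample, bins=50)
centers = 0.5 * (edges[:-1] + edges[1:])
ax_line.plot(centers, counts)
ax_line.set_title('np.histogram + plot')

# One-step approach: hist bins the data and draws the bars,
# returning the counts, the bin edges and the bar patches
n, bins, patches = ax_hist.hist(sample, bins=50, edgecolor='black', color='gray')
ax_hist.set_title('hist')
```

With the same bins argument, both routes should yield the same counts and bin edges, so hist mainly saves the intermediate step.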

We can display multiple histograms on the same figure. Instead of passing a single data set, we use a list of data sets. Colors and other attributes can also be passed as lists of the same length. In the following example, we plot two data sets.

In [24]:
#plt.figure()
#plt.hist([v1,v2], color=['green', 'purple'], bins=15)
#plt.show()

f, ax = plt.subplots()
ax.hist([v1,v2], color=['green', 'purple'], bins=15, edgecolor='black')
plt.show()

Like most pyplot functions, plt.hist provides many options, including stacked bars (height is the sum of the data sets) and cumulative plotting (each bin is the running sum of the previous bins).

In [25]:
#plt.figure()
#plt.hist([v1,v2], color=['blue', 'purple'], cumulative=True, histtype='barstacked', bins=50)
#plt.show()

f, ax = plt.subplots()
ax.hist([v1,v2], color=['blue', 'purple'], cumulative=True, histtype='barstacked', bins=50)
plt.show()
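One way to see what stacking and cumulative plotting do is to inspect the counts that hist returns. In this sketch, the two samples are small placeholders generated with a hypothetical seed. The last cumulative bin of the first data set should equal its size, and the last bin of the stacked second data set should equal the combined size.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(2, 0.5, 1000)   # placeholder sample
b = rng.normal(2, 0.5, 500)    # placeholder sample

fig, ax = plt.subplots()
n, bins, _ = ax.hist([a, b], bins=20, histtype='barstacked', cumulative=True)

# n[0] holds the running totals for a alone;
# n[1] holds the running totals for a and b stacked together
print(n[0][-1], len(a))
print(n[1][-1], len(a) + len(b))
```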

A brief digression - colors¶

Until now, we've worked with the basic colors (red, blue, green, purple, etc.), but Matplotlib allows colors to be specified in a variety of formats, including hex RGB strings (e.g. #c79fef, #ffd1df) or the xkcd color survey format (e.g. 'xkcd:dark purple' and 'xkcd:aquamarine'). For more details see https://matplotlib.org/api/colors_api.html and https://xkcd.com/color/rgb/

In [26]:
#plt.figure()
#plt.hist([v1,v2], color=['#c79fef', '#ffd1df'], cumulative=True, histtype='barstacked', bins=50)
#plt.show()

f, ax = plt.subplots()
ax.hist([v1,v2], color=['#c79fef', '#ffd1df'], cumulative=True, histtype='barstacked', bins=50)
plt.show()
In [27]:
#plt.figure()
#plt.hist([v1,v2], color=['xkcd:dark purple', 'xkcd:aquamarine'], cumulative=True, histtype='barstacked', bins=50)
#plt.show()

f, ax = plt.subplots()
ax.hist([v1,v2], color=['xkcd:dark purple', 'xkcd:aquamarine'], cumulative=True, histtype='barstacked', bins=50)
plt.show()
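All of these formats resolve to the same underlying RGB values. As a small sketch, the matplotlib.colors module can convert any accepted specification to a hex string, which is a handy way to check what a named color actually is.

```python
import matplotlib.colors as mcolors

specs = ['purple', '#c79fef', 'xkcd:dark purple', 'xkcd:aquamarine']

for spec in specs:
    # is_color_like reports whether Matplotlib accepts the specification;
    # to_hex converts it to a lowercase '#rrggbb' string
    print(spec, mcolors.is_color_like(spec), mcolors.to_hex(spec))
```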

Other chart types¶

We've touched on some of the most important figure types, but we've only scratched the surface. We show an example of a pie chart below and refer you to the Matplotlib gallery for more options: https://matplotlib.org/gallery.html

In [28]:
populations = [39, 27, 20, 19, 12]
states = ['California', 'Texas', 'Florida', 'New York', 'Illinois']
explodes = [0.1, 0, 0, 0, 0]
colors = ['blue', 'yellow', 'orange', 'green', 'brown']

#plt.figure()
#plt.pie(populations, labels=states, explode=explodes, colors=colors, shadow=True)
#plt.gca().axis('equal')
#plt.show()

f, ax = plt.subplots()
ax.pie(populations, labels=states, explode=explodes, colors=colors, shadow=True)
ax.axis('equal')
plt.show()

A pie chart can be turned into a donut chart by adding a white circle to the center of the plot. As a finishing touch, we add a second circle to provide a border.

In [29]:
populations = [39, 27, 20, 19, 12]
states = ['California', 'Texas', 'Florida', 'New York', 'Illinois']
colors = ['blue', 'yellow', 'orange', 'green', 'brown']

f, ax = plt.subplots()
ax.pie(populations, labels=states, colors=colors, shadow=False)
ax.axis('equal')

# Draw a white circle at the center of the pie to make it look like a donut
centre_circle = plt.Circle((0,0), 0.6, color='black', fc='white', linewidth=1.0)
ax.add_artist(centre_circle)

# Draw an unfilled circle at the edge of the pie to provide a border
outer_circle = plt.Circle((0,0), 1.0, color='black', fill=False, linewidth=1.0)
ax.add_artist(outer_circle)

plt.show()
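A different route to a similar result, offered here as a sketch rather than a replacement for the circles above, is the wedgeprops argument of pie. Giving each wedge a width smaller than the radius leaves a hole in the middle, and an edge color outlines every wedge. Note that this also draws the lines between wedges, so the look is not identical to the version with two circles.

```python
import matplotlib.pyplot as plt

populations = [39, 27, 20, 19, 12]
states = ['California', 'Texas', 'Florida', 'New York', 'Illinois']
colors = ['blue', 'yellow', 'orange', 'green', 'brown']

fig, ax = plt.subplots()

# width=0.4 keeps only the outer 40% of each wedge (the pie radius is 1),
# which produces the donut hole without an extra circle
ax.pie(populations, labels=states, colors=colors,
       wedgeprops={'width': 0.4, 'edgecolor': 'black', 'linewidth': 1.0})
ax.axis('equal')
```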

Subplots¶

One of the most useful features of Matplotlib is the ability to create complex figures containing multiple subplots. There are several ways to manage subplots, but I think that the syntax shown below is the most straightforward.

We start with a call to the plt.subplots function, which accepts the number of rows and columns defining the grid and returns a figure object and an axes object or an array of axes objects.
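The shape of the returned axes depends on the grid, which is worth checking once before unpacking. The sketch below shows the cases; the variable names are chosen for this example, and squeeze=False is an optional argument that forces a 2-D array regardless of grid size.

```python
import numpy as np
import matplotlib.pyplot as plt

fig1, ax = plt.subplots()                 # single Axes object
fig2, axes_row = plt.subplots(1, 3)       # 1-D array of three Axes
fig3, axes_grid = plt.subplots(2, 2)      # 2-D array, indexed [row, column]
fig4, axes_2d = plt.subplots(1, 1, squeeze=False)  # always a 2-D array

print(isinstance(ax, np.ndarray))
print(axes_row.shape, axes_grid.shape, axes_2d.shape)

# Looping over .flat visits every subplot regardless of the grid shape
for index, axis in enumerate(axes_grid.flat):
    axis.set_title(f'subplot {index}')
```

Because the arrays are ordinary NumPy arrays, they can be unpacked directly, as in f, (ax1, ax2) = plt.subplots(1, 2).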

In [30]:
# Define/redefine some data sets
x = [x for x in range(1,11)]
v = [x**1.25 for x in range(1,11)]
w = [x**1.5 for x in range(1,11)]
y = [x**1.75 for x in range(1,11)]
z = [x**2 for x in range(1,11)]

In the first example, we'll plot a pair of figures side by side. Note that each subfigure has its own scale.

In [31]:
f, (ax1, ax2) = plt.subplots(1,2)
ax1.scatter(x, v, color='blue', marker='s')
ax2.scatter(x, w, color='red',  marker='o')
plt.show()

We can pass additional arguments to plt.subplots to control the figure size and to share the y-axis, so that the same scale is used for both subplots and only the left one shows the tick labels. We'll also label the y-axis of the left plot and add legends to the two subplots.

In [32]:
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(6,5))

ax1.scatter(x, v, color='blue', marker='s', label='$x^{1.25}$')
ax1.set_ylabel('y axis')
ax1.legend(loc='upper left', frameon=False)

ax2.scatter(x, w, color='red',  marker='o', label='$x^{1.5}$')
ax2.legend(loc='upper left', frameon=False)

plt.show()
In [33]:
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(6,6), sharex=True, sharey=True)
ax1.scatter(x, v, label='$x^{1.25}$')
ax1.legend(loc='upper left', frameon=False)

ax2.scatter(x, w, label='$x^{1.5}$')
ax2.legend(loc='upper left', frameon=False)

ax3.scatter(x, y, label='$x^{1.75}$')
ax3.legend(loc='upper left', frameon=False)

ax4.scatter(x, z, label='$x^{2.0}$')
ax4.legend(loc='upper left', frameon=False)

f.subplots_adjust(hspace=0.1, wspace=0.1)
f.suptitle('2x2 array of subplots')

plt.show()
In [34]:
plt.Circle?
In [35]:
plt.pie?

Submit Ticket¶

If you find anything that needs to be changed, edited, or if you would like to provide feedback or contribute to the notebook, please submit a ticket by contacting us at:

Email: consult@sdsc.edu

We appreciate your input and will review your suggestions promptly!
