Source Information¶
Created by:
Updated by: October 25, 2024 by Gloria Seo
Resources: https://matplotlib.org/
Goal¶
The purpose of this notebook is to learn how to use the Matplotlib Python package, a data visualization and graphical plotting library. We will cover basic syntax, scatter plots, line plots, bar charts, and histograms.
Matplotlib¶
There are multiple ways to do plotting and charting in Python. Here we'll focus on Matplotlib and several basic types of charts: scatter plots, line plots, bar plots and histograms. Matplotlib has three layers: a backend layer that renders a plot to a screen or file, an artist layer that defines containers and primitives (e.g. Figure, Axes, subplots, Line2D, Rectangle), and a scripting layer that we use to interact with the backend and artist layers. For the scripting layer, we'll be using pyplot.
Note on syntax¶
Matplotlib provides two ways to generate figures. The more basic or elementary approach involves calling the matplotlib.pyplot.figure method, which creates a new figure.
The more advanced approach starts with a call to the matplotlib.pyplot.subplots method, which returns a Figure object and one or more Axes objects.
Throughout this tutorial we'll use the latter, with the more basic approach commented out for reference. In my opinion the subplots approach is more intuitive, makes it easier to access the advanced features, and eases the transition to generating complex figures containing more than one subgraph. Many of the questions answered on resources such as StackOverflow are based on the advanced interface.
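As a rough side-by-side sketch of the two approaches just described: the three-point data set is made up for illustration, and the Agg backend line is only an assumption so that the snippet also runs as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

xs, ys = [1, 2, 3], [1, 4, 9]  # illustrative data, not from the notebook

# Basic approach: plt.figure() creates a figure that pyplot tracks as "current"
fig_basic = plt.figure()
plt.scatter(xs, ys)       # draws on the current Axes, which is created implicitly
ax_basic = plt.gca()      # the implicit Axes can still be retrieved afterwards

# Advanced approach: plt.subplots() returns the Figure and Axes explicitly
fig_adv, ax_adv = plt.subplots()
ax_adv.scatter(xs, ys)

print(type(fig_adv).__name__, type(ax_adv).__name__)
```

Both paths end up with a Figure holding one Axes; the difference is whether you hold a reference to the Axes from the start.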
Required Modules for the Jupyter Notebook¶
Before running the Notebook, make sure to install and import the following modules.
Module: matplotlib, numpy
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
mpl.get_backend()
'module://ipykernel.pylab.backend_inline'
# Define a few lists that we'll use in our examples
x = [x for x in range(1,11)]
w = [x**1.5 for x in range(1,11)]
y = [x**1.75 for x in range(1,11)]
z = [x**2 for x in range(1,11)]
Scatter plots¶
In the next few cells, we'll build some simple scatter plots step-by-step. We start by plotting a single set of paired data. In this example and the examples that follow, we're primarily calling methods of the Axes class. See https://matplotlib.org/api/axes_api.html for more details.
w
[1.0, 2.8284271247461903, 5.196152422706632, 8.0, 11.180339887498949, 14.696938456699069, 18.520259177452136, 22.627416997969522, 27.0, 31.622776601683793]
#plt.figure()
#plt.scatter(x, w)
#plt.show()
f, ax = plt.subplots()
ax.scatter(x,w)
plt.show()
Borders of the plot can be turned off through the Axes spines attribute. In the example below, we turn off the top and right borders. Code to turn off the left and bottom borders is commented out.
#plt.figure()
#plt.scatter(x, w)
#plt.show()
f, ax = plt.subplots()
ax.scatter(x,w)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
#ax.spines['left'].set_visible(False)
#ax.spines['bottom'].set_visible(False)
plt.show()
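When several borders need the same treatment, a loop over the spine names keeps the code short. This is a minimal sketch with made-up data; it assumes only that ax.spines can be indexed and iterated like a dictionary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.scatter([1, 2, 3], [1, 4, 9])  # illustrative data

# ax.spines is keyed by 'left', 'right', 'top' and 'bottom'
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Collect the visibility of every spine to confirm what was changed
visible = {side: spine.get_visible() for side, spine in ax.spines.items()}
print(visible)
```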
We can plot multiple data sets on the same graph with multiple calls to scatter().
#plt.figure()
#plt.scatter(x, w)
#plt.scatter(x, y)
#plt.scatter(x, z)
#plt.show()
f, ax = plt.subplots()
ax.scatter(x, w)
ax.scatter(x, y)
ax.scatter(x, z)
plt.show()
Data points are assigned default sizes, shapes (filled circles) and colors. We can override these defaults to customize our graphs.
#plt.figure()
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()
f, ax = plt.subplots()
ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()
In the previous example, we assigned the same color and marker size to all points in each data set. We can also use lists to assign a different size and color to each point.
#plt.figure()
#colors = ['red'] * 5 + ['black'] * 5
#sizes = [25 + x*10 for x in range(10)]
#plt.scatter(x, w, c=colors, s=sizes, marker='o')
#plt.show()
f, ax = plt.subplots()
# First five points red, last five black (one color per data point)
colors = ['red'] * 5 + ['black'] * 5
sizes = [25 + x*10 for x in range(10)]
ax.scatter(x, w, c=colors, s=sizes, marker='o')
plt.show()
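Per-point lists can also be derived from the data itself. The sketch below colors points by whether they fall under a cut-off; the threshold value of 10 is a hypothetical choice for illustration, not something from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

x = list(range(1, 11))
w = [v**1.5 for v in x]

threshold = 10  # hypothetical cut-off chosen for illustration
colors = ["red" if v < threshold else "black" for v in w]
sizes = [25 + 10 * i for i in range(len(x))]  # marker area grows with position

fig, ax = plt.subplots()
points = ax.scatter(x, w, c=colors, s=sizes, marker="o")
```

The only requirement is that the color and size lists have one entry per data point.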
We can make the figure more useful by adding a title and axis labels.
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()
f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()
Reasonable bounds for the axes are chosen based on the range of data values, but we can set them manually using xlim and ylim (set_xlim and set_ylim on an Axes object).
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.xlim(-0.5, 15)
#plt.ylim(-5, 150)
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()
f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
#ax.set_xlim(-0.5, 15)
ax.set_ylim(-5, 150)
ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()
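To see what manual limits change, it can help to read the automatic limits first. A minimal sketch, reusing the same limit values as the cell above:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

x = list(range(1, 11))
z = [v**2 for v in x]

fig, ax = plt.subplots()
ax.scatter(x, z)

# Limits chosen automatically from the data range (plus a small margin)
auto_xlim, auto_ylim = ax.get_xlim(), ax.get_ylim()

# Override them explicitly
ax.set_xlim(-0.5, 15)
ax.set_ylim(-5, 150)

print("automatic:", auto_xlim, auto_ylim)
print("manual:   ", ax.get_xlim(), ax.get_ylim())
```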
By default, we get linear scales for the x and y axes. We can also choose log axes, with an optional base (10 by default) using the xscale and yscale methods.
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.yscale('log')
#plt.xscale('log', base=2)
#plt.scatter(x, w, c='red', s=25, marker='s')
#plt.scatter(x, y, c='blue', s=50, marker='^')
#plt.scatter(x, z, c='purple', s=100, marker='+')
#plt.show()
# NOTE - syntax has changed for log axes in newer Matplotlib releases - use base instead of basex/basey
f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
ax.set_yscale('log')
ax.set_xscale('log', base=2)
ax.scatter(x, w, c='red', s=25, marker='s')
ax.scatter(x, y, c='blue', s=50, marker='^')
ax.scatter(x, z, c='purple', s=100, marker='+')
plt.show()
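A quick way to confirm which scale an axis is using is to query it after setting it. The sketch below assumes a Matplotlib version recent enough to accept the base keyword.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

x = list(range(1, 11))
z = [v**2 for v in x]

fig, ax = plt.subplots()
ax.scatter(x, z)
ax.set_xscale("log", base=2)  # tick marks at powers of 2
ax.set_yscale("log")          # base 10 by default

print(ax.get_xscale(), ax.get_yscale())
```

On log-log axes a power law such as x^2 falls on a straight line, which is one common reason to switch scales.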
Let's go back to our linear scales and default axis limits and add a legend to the plot. Note that we had to modify the calls to scatter to set labels for the data.
By default, the legend is placed in the "best" location so as not to interfere with the plot, no title is given, and a frame is drawn around the legend. The title and frameon arguments used below are optional; legend() can also be called without any arguments.
#plt.figure()
#plt.xlabel('X axis')
#plt.ylabel('Y axis')
#plt.title('Plot of three data sets')
#plt.scatter(x, w, c='red', s=25, marker='s', label='x^1.5')
#plt.scatter(x, y, c='blue', s=50, marker='^', label='x^1.75')
#plt.scatter(x, z, c='purple', s=100, marker='+', label='x^2')
#plt.legend(title='Plot legend', frameon=False)
#plt.show()
f, ax = plt.subplots()
ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_title('Plot of three data sets')
ax.scatter(x, w, c='red', s=25, marker='s', label='x^1.5')
ax.scatter(x, y, c='blue', s=50, marker='^', label='x^1.75')
ax.scatter(x, z, c='purple', s=100, marker='+', label='x^2')
ax.legend(title='Plot legend', frameon=False)
plt.show()
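The legend options can be combined freely. As a small sketch with a hypothetical title and placement, the returned Legend object can be inspected to check what will be drawn:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

x = list(range(1, 11))
w = [v**1.5 for v in x]
z = [v**2 for v in x]

fig, ax = plt.subplots()
ax.scatter(x, w, label="x^1.5")
ax.scatter(x, z, label="x^2")

# Fixed location, a title, and the frame left on (title text is illustrative)
legend = ax.legend(loc="lower right", title="Exponent", frameon=True)

labels = [text.get_text() for text in legend.get_texts()]
print(labels, legend.get_title().get_text())
```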
Line plots using the plt.plot method¶
Although they have many similarities, scatter plots and line plots have some important differences. The former allow greater control over the marker properties, while the latter can contain lines that join the data points. In its simplest form, plt.plot takes a pair of x,y sequences.
#plt.figure()
#plt.plot(x, w)
#plt.show()
f, ax = plt.subplots()
ax.plot(x, w)
plt.show()
Any number of x,y pairs can be displayed on the same plot.
#plt.figure()
#plt.plot(x, w, x, y, x, z)
#plt.show()
f, ax = plt.subplots()
ax.plot(x, w, x, y, x, z)
plt.show()
The line color, marker shape and line style can be set by passing an additional string argument for each x,y pair of the form "{color abbreviation}{marker abbreviation}{line style}". A full list of abbreviations and styles can be found at the plt.plot documentation.
#plt.figure()
#plt.plot(x, w, 'ro-', x, y, 'bs:', x, z, 'g^--')
#plt.show()
f, ax = plt.subplots()
ax.plot(x, w, 'ro-', x, y, 'bs:', x, z, 'g^--')
plt.show()
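The compact format string is shorthand for separate keyword arguments. The sketch below draws one line each way and compares the resulting properties; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

x = list(range(1, 11))
w = [v**1.5 for v in x]
z = [v**2 for v in x]

fig, ax = plt.subplots()

# ax.plot returns a list of Line2D objects, one per x,y pair
(short_line,) = ax.plot(x, w, "ro-")
(long_line,) = ax.plot(x, z, color="red", marker="o", linestyle="-")

same_color = mcolors.to_rgba(short_line.get_color()) == mcolors.to_rgba(long_line.get_color())
print(same_color, short_line.get_marker(), short_line.get_linestyle())
```

The keyword form is more verbose but accepts any color specification, not just the single-letter abbreviations.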
A brief digression - LaTeX formatting¶
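Matplotlib recognizes a subset of LaTeX syntax, so you can include Greek letters, superscripts, subscripts and other math formatting features in the plot labels. Switch to LaTeX mode by placing the expression between dollar signs. Raw strings (prefixed with r) keep Python from interpreting the backslashes as escape sequences.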
#plt.figure()
#plt.xlabel(r'X axis ($\Omega$)')
#plt.ylabel(r'Y axis ($\lambda$)')
#plt.title('Plot of three data sets')
#plt.scatter(x, w, c='red', s=25, marker='s', label='$x^{1.5}$')
#plt.scatter(x, y, c='blue', s=50, marker='^', label='$x^{1.75}$')
#plt.scatter(x, z, c='purple', s=100, marker='+', label='$x^2$')
#plt.legend(title='Plot legend', frameon=False)
#plt.show()
f, ax = plt.subplots()
ax.set_xlabel(r'X axis ($\Omega$)')
ax.set_ylabel(r'Y axis ($\lambda$)')
ax.set_title('Plot of three data sets')
ax.scatter(x, w, c='red', s=25, marker='s', label='$x^{1.5}$')
ax.scatter(x, y, c='blue', s=50, marker='^', label='$x^{1.75}$')
ax.scatter(x, z, c='purple', s=100, marker='+', label='$x^2$')
ax.legend(title='Plot legend', frameon=False)
plt.show()
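Because LaTeX commands begin with a backslash, raw strings are the safest way to write these labels in Python. A minimal sketch with hypothetical label text:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

x = list(range(1, 11))
w = [v**1.5 for v in x]

fig, ax = plt.subplots()
# The r prefix stops Python from treating \O or \l as string escape sequences
ax.set_xlabel(r"Resistance ($\Omega$)")
ax.set_ylabel(r"Wavelength ($\lambda$)")
ax.set_title(r"$y = x^{1.5}$")
ax.plot(x, w)

fig.canvas.draw()  # rendering forces the math text to be parsed
```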
Bar charts¶
The Matplotlib bar chart method behaves much like the other plotting methods we've seen so far. In the example below, we show the results of a hot dog eating contest. We pass a list of colors to differentiate nationalities (USA vs. Costa Rica).
people = ['Bob', 'Jorge', 'Esteban', 'Mariano', 'Mahidhar']
x_pos = np.arange(len(people))
hot_dogs = [10, 12, 17, 8, 14]
colors = ['blue', 'red', 'red', 'red', 'blue']
#plt.figure()
#plt.bar(x_pos, hot_dogs, align='center', color=colors)
#plt.xticks(x_pos, people)
#plt.ylabel('# hot dogs eaten')
#plt.title('Hot dog eating contest results')
#plt.show()
f, ax = plt.subplots()
ax.bar(x_pos, hot_dogs, align='center', color=colors)
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
ax.set_ylabel('# hot dogs eaten')
ax.set_title('Hot dog eating contest results')
plt.show()
Plotting multiple data series on the same figure is similar to what we did for scatter plots using multiple calls to the plt.bar method.
In this new plot, we added one more feature and set an edge color for the bars. Due to a bug in older versions of matplotlib, by default the edge was only drawn for the first bar unless we pass a list of values to edgecolor.
people = ['Bob', 'Jorge', 'Esteban', 'Mariano', 'Mahidhar']
x_pos = np.arange(len(people))
hot_dogs = [10, 12, 17, 8, 14]
pies = [2, 7, 3, 5, 8]
#plt.figure()
#plt.bar(x_pos, hot_dogs, align='center',
# color='red', label='hot dogs', edgecolor=['black']*len(people))
#plt.bar(x_pos, pies, align='center',
# color='blue', label='pies', edgecolor=['black']*len(people))
#plt.xticks(x_pos, people)
#plt.ylabel('# eaten')
#plt.title('Eating contest results')
#plt.legend(frameon=False, loc='upper left')
#plt.show()
f, ax = plt.subplots()
ax.bar(x_pos, hot_dogs, align='center',
color='red', label='hot dogs', edgecolor=['black']*len(people))
ax.bar(x_pos, pies, align='center',
color='blue', label='pies', edgecolor=['black']*len(people))
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
ax.set_ylabel('# eaten')
ax.set_title('Eating contest results')
ax.legend(frameon=False, loc='upper left')
plt.show()
The previous figure turned out fine since every contestant ate more hot dogs (first set plotted) than pies (second set plotted). If one of the contestants had eaten more pies than hot dogs, the "hot dogs" bar would be completely obscured by the "pies" bar. To avoid this problem, we can change the widths of the bars and offset their locations.
people = ['Bob', 'Jorge', 'Esteban', 'Mariano', 'Mahidhar']
x_pos = np.arange(len(people))
hot_dogs = [10, 12, 17, 8, 14]
pies = [2, 7, 3, 5, 8]
#plt.figure()
#plt.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center',
# color='red', label='hot dogs', edgecolor=['black']*len(people))
#plt.bar(x_pos + 0.2, pies, width=0.35, align='center',
# color='blue', label='pies', edgecolor=['black']*len(people))
#plt.xticks(x_pos, people)
#plt.ylabel('# eaten')
#plt.title('Eating contest results')
#plt.legend(frameon=False, loc='upper left')
#plt.show()
f, ax = plt.subplots()
ax.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center',
color='red', label='hot dogs', edgecolor=['black']*len(people))
ax.bar(x_pos + 0.2, pies, width=0.35, align='center',
color='blue', label='pies', edgecolor=['black']*len(people))
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
ax.set_ylabel('# eaten')
ax.set_title('Eating contest results')
ax.legend(frameon=False, loc='upper left')
plt.show()
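The offsets can be computed rather than hard-coded, which generalizes to more than two series. This is a sketch under the assumption that bars within a group should sit edge to edge; the series dictionary is just a convenient container for the notebook's data.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

people = ["Bob", "Jorge", "Esteban", "Mariano", "Mahidhar"]
series = {"hot dogs": [10, 12, 17, 8, 14], "pies": [2, 7, 3, 5, 8]}

x_pos = np.arange(len(people))
width = 0.35
n = len(series)

fig, ax = plt.subplots()
containers = {}
for i, (label, values) in enumerate(series.items()):
    # Center the group of n bars on each tick position
    offset = (i - (n - 1) / 2) * width
    containers[label] = ax.bar(x_pos + offset, values, width=width,
                               label=label, edgecolor="black")

ax.set_xticks(x_pos)
ax.set_xticklabels(people)
ax.legend(frameon=False, loc="upper left")
```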
Let's introduce one more advanced feature. Imagine that the tick labels are so long that they would overrun each other if rendered horizontally. To avoid that, we can rotate each of the tick labels by 45 degrees before generating the figure.
Note that to do this using the basic interface, we need to get the current axes using the plt.gca method. In my opinion, it's easier to just work directly with the Axes class from the beginning.
people = ['Bob San Diego', 'Jorge Costa Rica', 'Esteban Costa Rica',
'Mariano Costa Rica', 'Mahidhar San Diego']
#plt.figure()
#plt.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center',
# color='red', label='hot dogs', edgecolor='black')
#plt.bar(x_pos + 0.2, pies, width=0.35, align='center',
# color='blue', label='pies', edgecolor='black')
#plt.xticks(x_pos, people)
#plt.gca().set_xticklabels(people)
#for tick in plt.gca().get_xticklabels():
# tick.set_rotation(45)
#plt.ylabel('# eaten')
#plt.title('Eating contest results')
#plt.legend(frameon=False, loc='upper left')
#plt.show()
f, ax = plt.subplots()
ax.bar(x_pos - 0.2, hot_dogs, width=0.35, align='center',
color='red', label='hot dogs', edgecolor=['black']*len(people))
ax.bar(x_pos + 0.2, pies, width=0.35, align='center',
color='blue', label='pies', edgecolor=['black']*len(people))
ax.set_xticks(x_pos)
ax.set_xticklabels(people)
for tick in ax.get_xticklabels():
    tick.set_rotation(45)
ax.set_ylabel('# eaten')
ax.set_title('Eating contest results')
ax.legend(frameon=False, loc='upper left')
plt.show()
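As an alternative to the explicit loop, plt.setp can apply the same property to every label in one call. The sketch below also right-aligns the labels, which is an optional choice of mine rather than part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

people = ["Bob San Diego", "Jorge Costa Rica", "Esteban Costa Rica",
          "Mariano Costa Rica", "Mahidhar San Diego"]
hot_dogs = [10, 12, 17, 8, 14]
x_pos = np.arange(len(people))

fig, ax = plt.subplots()
ax.bar(x_pos, hot_dogs, color="red", edgecolor="black")
ax.set_xticks(x_pos)
ax.set_xticklabels(people)

# Set rotation and horizontal alignment on all tick labels at once
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")

rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)
```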
Histograms¶
If you followed my NumPy tutorial, you'll recall that we generated a histogram using NumPy's histogram function and then plotted it as a line graph. A better way is to use pyplot's hist method, which both generates the histogram data from the input data and renders it.
# Build two vectors of 10000 and 5000 normal deviates, respectively, with variance 0.5^2 and mean 2
#import numpy as np
mu, sigma = 2, 0.5
np.random.seed(1234)
v1 = np.random.normal(mu,sigma,10000)
v2 = np.random.normal(mu,sigma,5000)
#plt.figure()
#plt.hist(v1, bins=50, cumulative=True, edgecolor='black', color='gray')
#plt.show()
f, ax = plt.subplots()
ax.hist(v1, bins=50, cumulative=True, edgecolor='black', color='gray')
plt.show()
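The hist method also returns the data it computed, so nothing is lost compared with calling NumPy directly. A minimal sketch comparing the two, using a freshly generated sample (the generator and seed here are my own choices):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1234)
sample = rng.normal(2, 0.5, 10000)  # mean 2, standard deviation 0.5

fig, ax = plt.subplots()
# hist returns the bin counts, the bin edges and the drawn bars
counts, edges, patches = ax.hist(sample, bins=50, edgecolor="black", color="gray")

# The same binning computed without plotting
np_counts, np_edges = np.histogram(sample, bins=50)

print(len(counts), len(edges), int(counts.sum()))
```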
We can display multiple histograms on the same figure. Instead of passing a single data set, we use a list of data sets. Colors and other attributes can also be passed as lists of the same length. In the following example, we plot two data sets.
#plt.figure()
#plt.hist([v1,v2], color=['green', 'purple'], bins=15)
#plt.show()
f, ax = plt.subplots()
ax.hist([v1,v2], color=['green', 'purple'], bins=15, edgecolor='black')
plt.show()
Like most pyplot methods, plt.hist provides many options, including stacked bars (height is the sum of data sets) and cumulative plotting (each bin is the running sum of that bin and all previous bins).
#plt.figure()
#plt.hist([v1,v2], color=['blue', 'purple'], cumulative=True, histtype='barstacked', bins=50)
#plt.show()
f, ax = plt.subplots()
ax.hist([v1,v2], color=['blue', 'purple'], cumulative=True, histtype='barstacked', bins=50)
plt.show()
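To make the cumulative option concrete for a single data set, the sketch below checks that the values it returns match a running sum of the ordinary bin counts. The sample is generated here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1234)
sample = rng.normal(2, 0.5, 10000)

# Ordinary (non-cumulative) counts per bin
plain_counts, _ = np.histogram(sample, bins=50)

fig, ax = plt.subplots()
cumulative_counts, _, _ = ax.hist(sample, bins=50, cumulative=True)

# Each cumulative bin equals the sum of the plain counts up to and including it
matches = np.allclose(cumulative_counts, np.cumsum(plain_counts))
print(matches, int(cumulative_counts[-1]))
```

The last cumulative bin always equals the total number of samples.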
A brief digression - colors¶
Until now, we've worked with the basic colors (red, blue, green, purple, etc.), but Matplotlib allows colors to be specified in a variety of formats, including hex RGB strings (e.g. #c79fef, #ffd1df) and the xkcd color survey format (e.g. 'xkcd:dark purple' and 'xkcd:aquamarine'). For more details see https://matplotlib.org/api/colors_api.html and https://xkcd.com/color/rgb/
#plt.figure()
#plt.hist([v1,v2], color=['#c79fef', '#ffd1df'], cumulative=True, histtype='barstacked', bins=50)
#plt.show()
f, ax = plt.subplots()
ax.hist([v1,v2], color=['#c79fef', '#ffd1df'], cumulative=True, histtype='barstacked', bins=50)
plt.show()
#plt.figure()
#plt.hist([v1,v2], color=['xkcd:dark purple', 'xkcd:aquamarine'], cumulative=True, histtype='barstacked', bins=50)
#plt.show()
f, ax = plt.subplots()
ax.hist([v1,v2], color=['xkcd:dark purple', 'xkcd:aquamarine'], cumulative=True, histtype='barstacked', bins=50)
plt.show()
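The matplotlib.colors module can check whether a string is a valid color and convert it to a common form, which is handy when a plot fails on a mistyped color. A small sketch; the string 'not-a-color' is a deliberately invalid placeholder.

```python
import matplotlib.colors as mcolors

specs = ["purple", "#c79fef", "xkcd:dark purple", "not-a-color"]

for spec in specs:
    if mcolors.is_color_like(spec):
        # to_hex normalizes any valid specification to a '#rrggbb' string
        print(spec, "->", mcolors.to_hex(spec))
    else:
        print(spec, "is not a valid Matplotlib color")
```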
Other chart types¶
We've touched on some of the most important figure types, but we've only scratched the surface. We show an example of a pie chart below and refer you to the Matplotlib gallery for more options: https://matplotlib.org/gallery.html
populations = [39, 27, 20, 19, 12]
states = ['California', 'Texas', 'Florida', 'New York', 'Illinois']
explodes = [0.1, 0, 0, 0, 0]
colors = ['blue', 'yellow', 'orange', 'green', 'brown']
#plt.figure()
#plt.pie(populations, labels=states, explode=explodes, colors=colors, shadow=True)
#plt.gca().axis('equal')
#plt.show()
f, ax = plt.subplots()
ax.pie(populations, labels=states, explode=explodes, colors=colors, shadow=True)
ax.axis('equal')
plt.show()
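Pie wedges can also be annotated with their share of the total through the autopct argument. This is an optional extra, sketched here with the same population figures; the format string and start angle are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

populations = [39, 27, 20, 19, 12]
states = ["California", "Texas", "Florida", "New York", "Illinois"]

fig, ax = plt.subplots()
# With autopct set, pie returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = ax.pie(populations, labels=states,
                                  autopct="%1.1f%%", startangle=90)
ax.axis("equal")  # keep the pie circular

print([t.get_text() for t in autotexts])
```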
A pie chart can be turned into a donut chart by adding a white circle to the center of the plot. As a finishing touch, we add a second circle to provide a border.
populations = [39, 27, 20, 19, 12]
states = ['California', 'Texas', 'Florida', 'New York', 'Illinois']
colors = ['blue', 'yellow', 'orange', 'green', 'brown']
f, ax = plt.subplots()
ax.pie(populations, labels=states, colors=colors, shadow=False)
ax.axis('equal')
# Draw a white circle at the center of the pie to make it look like a donut
centre_circle = plt.Circle((0,0), 0.6, color='black', fc='white', linewidth=1.0)
ax.add_artist(centre_circle)
# Draw a circle at the edge of the pie to provide a border (I don't see a way to do this with the pie function)
outer_circle = plt.Circle((0,0), 1.0, color='black', fill=False, linewidth=1.0)
ax.add_artist(outer_circle)
plt.show()
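A different route to a similar donut shape, offered as a sketch rather than a replacement for the circle technique, is to give the wedges a finite width through the wedgeprops argument. With the default pie radius of 1, a width of 0.4 leaves a hole of radius 0.6.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

populations = [39, 27, 20, 19, 12]
states = ["California", "Texas", "Florida", "New York", "Illinois"]
colors = ["blue", "yellow", "orange", "green", "brown"]

fig, ax = plt.subplots()
# width makes each wedge a ring segment; edgecolor outlines every wedge
wedges, texts = ax.pie(populations, labels=states, colors=colors,
                       wedgeprops={"width": 0.4, "edgecolor": "black"})
ax.axis("equal")
```

Unlike the overlaid circle, this leaves the center truly empty, so whatever lies behind the axes shows through.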
Subplots¶
One of the most useful features of Matplotlib is the ability to create complex figures containing multiple subplots. There are several ways to manage subplots, but I think that the syntax shown below is the most straightforward.
We start with a call to the plt.subplots function, which accepts the number of rows and columns defining the grid and returns a figure object and an axes object or an array of axes objects.
# Define/redefine some data sets
x = [x for x in range(1,11)]
v = [x**1.25 for x in range(1,11)]
w = [x**1.5 for x in range(1,11)]
y = [x**1.75 for x in range(1,11)]
z = [x**2 for x in range(1,11)]
In the first example, we'll plot a pair of figures side by side. Note that each subplot has its own scale.
f, (ax1, ax2) = plt.subplots(1,2)
ax1.scatter(x, v, color='blue', marker='s')
ax2.scatter(x, w, color='red', marker='o')
plt.show()
We can pass additional arguments to plt.subplots to control the figure size and share the y-axis, so that only the left subplot shows the scale and the same scale is used for both. We'll also label the y-axis of the left plot and add legends for the two subplots.
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(6,5))
ax1.scatter(x, v, color='blue', marker='s', label='$x^{1.25}$')
ax1.set_ylabel('y axis')
ax1.legend(loc='upper left', frameon=False)
ax2.scatter(x, w, color='red', marker='o', label='$x^{1.5}$')
ax2.legend(loc='upper left', frameon=False)
plt.show()
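The effect of sharey can be checked directly by comparing axis limits with and without it. A minimal sketch using the same two data sets:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

x = list(range(1, 11))
v = [t**1.25 for t in x]
w = [t**1.5 for t in x]

# Independent y-axes: each subplot scales to its own data
fig_a, (a1, a2) = plt.subplots(1, 2)
a1.scatter(x, v)
a2.scatter(x, w)

# Shared y-axis: both subplots use one common range
fig_b, (b1, b2) = plt.subplots(1, 2, sharey=True)
b1.scatter(x, v)
b2.scatter(x, w)

print("independent:", a1.get_ylim(), a2.get_ylim())
print("shared:     ", b1.get_ylim(), b2.get_ylim())
```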
f, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(6,6), sharex=True, sharey=True)
ax1.scatter(x, v, label='$x^{1.25}$')
ax1.legend(loc='upper left', frameon=False)
ax2.scatter(x, w, label='$x^{1.5}$')
ax2.legend(loc='upper left', frameon=False)
ax3.scatter(x, y, label='$x^{1.75}$')
ax3.legend(loc='upper left', frameon=False)
ax4.scatter(x, z, label='$x^{2.0}$')
ax4.legend(loc='upper left', frameon=False)
f.subplots_adjust(hspace=0.1, wspace=0.1)
f.suptitle('2x2 array of subplots')
plt.show()
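For larger grids, unpacking every Axes by name gets unwieldy. Since plt.subplots returns a NumPy array of Axes when the grid has more than one cell, the same 2x2 figure can be sketched with a loop; the exponents list is my own way of organizing the four data sets.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this also runs as a script; omit inside Jupyter
import matplotlib.pyplot as plt

x = list(range(1, 11))
exponents = [1.25, 1.5, 1.75, 2.0]

fig, axes = plt.subplots(2, 2, figsize=(6, 6), sharex=True, sharey=True)

# axes.flat walks the 2x2 array row by row
for ax, p in zip(axes.flat, exponents):
    ax.scatter(x, [t**p for t in x], label=f"$x^{{{p}}}$")
    ax.legend(loc="upper left", frameon=False)

fig.subplots_adjust(hspace=0.1, wspace=0.1)
fig.suptitle("2x2 array of subplots")
```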
plt.Circle?
plt.pie?
Submit Ticket¶
If you find anything that needs to be changed, edited, or if you would like to provide feedback or contribute to the notebook, please submit a ticket by contacting us at:
Email: consult@sdsc.edu
We appreciate your input and will review your suggestions promptly!