Scatter Plots with the scatter Method
In the previous article we used the plot method to create scatter plots. But there’s a specialized and more powerful method for that, scatter, and it’s the one we’re going to use in this article. Let’s start with the example from the previous article:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 40)
y = np.sin(x)
fig, ax = plt.subplots()
ax.plot(x, y, 'o')
Now let’s use the scatter method instead of the plot method. Here we must pass the marker by keyword, because the third positional argument of scatter is the marker size, not a format string as in plot:
fig, ax = plt.subplots()
ax.scatter(x, y, marker='o')
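To make the difference in argument order concrete, here is a minimal sketch that reuses the x and y from above. The size values 100 and 20 and the vertical offset are arbitrary choices for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 40)
y = np.sin(x)

fig, ax = plt.subplots()
# The third positional argument of scatter is s (the marker size),
# so this call draws markers of size 100 with the default marker style.
big = ax.scatter(x, y, 100)
# The marker style has to be passed by keyword. The second curve is
# shifted up by 1 only so that the two sets of points don't overlap.
small = ax.scatter(x, y + 1, 20, marker='o')
```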
You can also use other markers:
fig, ax = plt.subplots()
ax.scatter(x, y, marker='s')
Unlike the plot method, scatter lets us control the size and color of each point individually.
# Here's how we can control the sizes of the points. Let's define a list
# of sizes from smallest to greatest, one size per point.
sizes = list(range(40))
# And now let's use the sizes to plot the function.
fig, ax = plt.subplots()
ax.scatter(x, y, s=sizes)
Here’s another example:
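# Let's randomize the sizes. We need one size per point (40 here), and sizes
# should not be negative, so we draw them uniformly from [0, 100).
# This code will produce a different output each time you execute it.
sizes = 100 * np.random.rand(40)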
fig, ax = plt.subplots()
ax.scatter(x, y, s=sizes, marker='<')
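One detail worth keeping in mind is that s is an area, given in points squared, not a width. If you think in terms of marker widths, you can square them first. The following is a small sketch of that idea; the name diameters and the range from 2 to 20 points are assumptions made for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 40)
y = np.sin(x)

# Desired marker widths in points, growing from left to right.
diameters = np.linspace(2, 20, 40)
# s is measured in points squared, so we square the widths.
sizes = diameters ** 2

fig, ax = plt.subplots()
pts = ax.scatter(x, y, s=sizes)
```

So a marker that should look twice as wide needs an s value four times as large.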
We can also control the colors of the markers:
# Let's create a list of colors and use it in the scatter method.
# Here we're using 4 colors. As there are 40 points, we multiply
# the list by 10 to repeat its elements 10 times, thus getting
# 40 elements altogether.
colors = 10 * ['b', 'r', 'g', 'y']
fig, ax = plt.subplots()
ax.scatter(x, y, c=colors)
Here’s an example with random colors:
# When c is a sequence of numbers, the values are mapped to colors through a colormap.
# This code will produce a different output each time you execute it.
colors = np.random.randn(40)
fig, ax = plt.subplots()
ax.scatter(x, y, c=colors)
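Numeric colors become more useful when they encode something meaningful. As a sketch, we could color each point by its y value, choose the colormap explicitly with the cmap parameter, and add a colorbar as a legend for the mapping. The choice of 'viridis' and the label text are just examples:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 40)
y = np.sin(x)

fig, ax = plt.subplots()
# Color each point by its y value using the viridis colormap.
sc = ax.scatter(x, y, c=y, cmap='viridis')
# The colorbar shows which color corresponds to which value.
cbar = fig.colorbar(sc, ax=ax, label='sin(x)')
```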
You can also set the alpha value, which controls how transparent the markers are:
# Here the points are pretty big, so you can watch them overlap.
fig, ax = plt.subplots()
ax.scatter(x, y, marker='o', s=1000, alpha=.5)
If you can achieve your goal with the plot method, which is the case when all points share the same size and color, you should choose it over scatter because it’s more efficient. This is especially important for large datasets.
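For instance, the semi-transparent example uses a single size and a single color for all points, so it could be drawn with plot as well. This is only a sketch: markersize in plot is given in points, while s in scatter is an area in points squared, so taking the square root of 1000 should give markers of roughly the same size:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 40)
y = np.sin(x)

fig, ax = plt.subplots()
# One marker style, size and alpha for all points, so plot is enough.
lines = ax.plot(x, y, 'o', markersize=np.sqrt(1000), alpha=.5)
```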