Pandas makes it easy to operate on data. With unary operations, the index and column labels are preserved. With binary operations, the indices of the two operands are aligned.
Let’s have a look at a unary operation first:
import numpy as np
import pandas as pd
# Let's create a Series object and a DataFrame object to work on.
numSeries = pd.Series([3, 6, 5, 10],
                      index=['A', 'B', 'C', 'D'])
numDataFrame = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                            columns=['A', 'B', 'C', 'D'])
# Let's print out the Series...
print(numSeries)
# and the DataFrame.
print(numDataFrame)
# Now let's apply the unary sin function to the Series.
np.sin(numSeries * np.pi)
# And let's do the same to the DataFrame.
np.sin(numDataFrame * np.pi)
As you can see, in both the Series and the DataFrame the indices have been preserved. So have the column labels in the DataFrame.
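The label-preserving behaviour can also be checked programmatically. The following is a minimal sketch with made-up data (the names `s`, `df`, `s_sin` and `df_sin` and all labels are illustrative and not part of the example above):

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the labels are chosen only for illustration.
s = pd.Series([0.0, 0.5, 1.0], index=['x', 'y', 'z'])
df = pd.DataFrame([[0.0, 0.5], [1.0, 1.5]],
                  index=['r1', 'r2'], columns=['A', 'B'])

# Apply a NumPy ufunc; the result is again a Series / DataFrame.
s_sin = np.sin(s * np.pi)
df_sin = np.sin(df * np.pi)

# Compare the labels before and after the operation.
print(s_sin.index.equals(s.index))
print(df_sin.index.equals(df.index))
print(df_sin.columns.equals(df.columns))
```

Note that with integer inputs, as in the Series above, sin(k·π) is mathematically zero, so the printed values are tiny floating-point numbers rather than exact zeros.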
And now let’s have a look at binary operations between two Series objects and two DataFrame objects.
# First let's create two Series objects.
ser1 = pd.Series([5, 3, 1, 6],
                 index=['A', 'B', 'C', 'D'])
ser2 = pd.Series([6, 8, 2, 6],
                 index=['A', 'B', 'C', 'D'])
# Here's ser1.
print(ser1)
# And here's ser2.
print(ser2)
# An example of a binary operation is multiplication. Let's check it out.
ser1 * ser2
This works as expected: the indices are aligned and the values are multiplied label by label. But what if the indices in the two Series objects overlap only partially?
# Let's create two other Series objects so that the indices are slightly different.
ser3 = pd.Series([5, 3, 1, 6],
                 index=['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6],
                 index=['B', 'D', 'F', 'G'])
# Now let's perform another common binary operation, addition.
ser3 + ser4
As you can see, this time we get the union of the indices. The labels B and D, which appear in both objects, have been aligned and their values added. For all other labels we get NaN, which stands for ‘Not a Number’ and is how pandas marks missing data.
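If NaN is not what you want for labels that exist on only one side, one option is the method form of the operator, `Series.add`, with its `fill_value` parameter. This is a small sketch that reuses the same values as `ser3` and `ser4`; the names `plain` and `filled` are just for illustration:

```python
import pandas as pd

ser3 = pd.Series([5, 3, 1, 6], index=['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6], index=['B', 'D', 'F', 'G'])

# Plain addition: labels present in only one Series become NaN.
plain = ser3 + ser4

# Method form with fill_value: a label missing on one side is treated as 0.
filled = ser3.add(ser4, fill_value=0)

print(plain)
print(filled)
```

In `plain` only B and D carry numbers (9 and 14), while in `filled` every label of the union has a value.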
And now let’s see how binary operations work with DataFrames.
# Again, let's start by creating two DataFrame objects with matching indices and column names.
df1 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randint(2, 20, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
# Here's df1.
print(df1)
# And here's df2.
print(df2)
# This time let's use the true division binary operation.
df1 / df2
So, again, the indices and column labels are aligned. What about DataFrames where only some of the indices and column names overlap?
# Let's define two such DataFrames.
df3 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
df4 = pd.DataFrame(np.random.randint(2, 20, (4, 3)),
                   columns=['A', 'B', 'E'])
# Here's df3.
print(df3)
# And here's df4.
print(df4)
# And now let's perform the subtraction binary operation on them.
df3 - df4
Again, we have the union of indices and column labels and NaN values for any missing data.
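Because the DataFrames above are filled with random numbers, the pattern is easier to see with fixed values. The sketch below uses the same shapes and column names but made-up contents, and stores the result in `diff` (an illustrative name):

```python
import numpy as np
import pandas as pd

# Fixed values instead of random ones so the result is reproducible.
df3 = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])
df4 = pd.DataFrame(np.ones((4, 3), dtype=int), columns=['A', 'B', 'E'])

diff = df3 - df4

print(diff.shape)             # union of row labels x union of column labels
print(diff.columns.tolist())
print(diff.notna())           # True only where both frames have the cell
```

A cell gets a number only if its row label and its column label exist in both DataFrames. Here that is rows 0 to 2 in columns A and B; everything else is NaN.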
Finally, let’s see how operations between Series objects and DataFrame objects work.
# Let's create a Series object and a DataFrame object to work on.
ser5 = pd.Series([2, 4, 7, 8],
                 index=['A', 'B', 'C', 'D'])
df5 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
# Here's the Series.
print(ser5)
# And here's the DataFrame.
print(df5)
# Let's see how the subtraction operation works.
ser5 - df5
# Naturally, with subtraction the order matters.
df5 - ser5
As you can see, the result is a DataFrame: the index of the Series is matched against the column labels of the DataFrame, and the operation is performed row by row. And now let’s use objects where the labels only partially overlap.
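Before that, the row-by-row behaviour can be checked on fixed numbers. The sketch below uses made-up data and also shows, as an aside, how the method form `DataFrame.sub` with `axis=0` matches the Series against the row index instead; all names (`df`, `row_like`, `col_like`, `by_row`, `by_col`) are illustrative only:

```python
import pandas as pd

df = pd.DataFrame([[10, 20], [30, 40]], columns=['A', 'B'])
row_like = pd.Series([1, 2], index=['A', 'B'])   # labels match the columns
col_like = pd.Series([1, 2], index=[0, 1])       # labels match the row index

# Default: the Series index is matched against the columns,
# and the same values are subtracted from every row.
by_row = df - row_like

# Method form with axis=0: the Series index is matched against the row index,
# so one value per row is subtracted across all columns.
by_col = df.sub(col_like, axis=0)

print(by_row)
print(by_col)
```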
# Let's define another Series and another DataFrame.
ser6 = pd.Series([2, 4, 7, 8],
                 index=['A', 'B', 'C', 'D'])
df6 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'C', 'D', 'E'])
# Here's ser6.
print(ser6)
# And here's df6.
print(df6)
# Let's subtract the DataFrame from the Series now:
ser6 - df6
# And the other way around:
df6 - ser6
So, as expected, the missing data is replaced by NaN values and the indices and column names are aligned.
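To make the pattern reproducible, here is a sketch with the same labels as `ser6` and `df6` but with every DataFrame cell set to 10 (an arbitrary value chosen for illustration); `result` and `reverse` are illustrative names:

```python
import numpy as np
import pandas as pd

ser6 = pd.Series([2, 4, 7, 8], index=['A', 'B', 'C', 'D'])
# Constant values instead of random ones, purely for illustration.
df6 = pd.DataFrame(np.full((3, 4), 10), columns=['A', 'C', 'D', 'E'])

result = df6 - ser6    # DataFrame minus Series
reverse = ser6 - df6   # Series minus DataFrame

print(result)
print(reverse)
```

The columns of both results are the union A, B, C, D, E. Column B exists only in the Series and column E only in the DataFrame, so both are filled with NaN, while A, C and D hold the differences.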