Pandas makes it easy to operate on data. Unary operations preserve the index and column labels of the object they are applied to. Binary operations align the two operands on their index and column labels before computing the result.
UNARY OPERATIONS
Let’s have a look at a unary operation first:
import numpy as np
import pandas as pd
# Let's create a Series object and a DataFrame object to work on.
numSeries = pd.Series([3, 6, 5, 10],
                      index=['A', 'B', 'C', 'D'])
numDataFrame = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                            columns=['A', 'B', 'C', 'D'])
# Let's print out the Series...
numSeries
# and the DataFrame.
numDataFrame
# Now let's apply the unary sin function to the Series.
np.sin(numSeries * np.pi)
# And let's do the same to the DataFrame.
np.sin(numDataFrame * np.pi)
As you can see, in both the Series and the DataFrame the indices have been preserved. So have the column labels in the DataFrame.
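Because every value in the example above is an integer multiple of π, the printed sines are all tiny floating-point numbers close to zero, which makes it hard to see anything but the labels. Here's a minimal sketch with fixed, made-up values (the names angles, table, sines and roots are only for illustration) that checks the label preservation explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical example data with fixed values, so the result is reproducible.
angles = pd.Series([0.0, 0.5, 1.0, 1.5], index=['A', 'B', 'C', 'D'])
table = pd.DataFrame([[1, 4], [9, 16]], index=['x', 'y'], columns=['A', 'B'])

# NumPy ufuncs applied to pandas objects return pandas objects of the same kind.
sines = np.sin(angles * np.pi)
roots = np.sqrt(table)

# The labels of the results are the same as the labels of the inputs.
print(sines.index.equals(angles.index))
print(roots.index.equals(table.index), roots.columns.equals(table.columns))
```

Any other NumPy universal function, such as np.exp or np.abs, should behave the same way.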
BINARY OPERATIONS
And now let’s have a look at binary operations between two Series objects and two DataFrame objects.
# First let's create two Series objects.
ser1 = pd.Series([5, 3, 1, 6],
                 index=['A', 'B', 'C', 'D'])
ser2 = pd.Series([6, 8, 2, 6],
                 index=['A', 'B', 'C', 'D'])
# Here's ser1.
ser1
# And here's ser2.
ser2
# An example of a binary operation is multiplication. Let's check it out.
ser1 * ser2
This works as expected: the indices are aligned, so values with the same label are multiplied together. But what if the indices in the two Series objects overlap only partially?
# Let's create two other Series objects so that the indices are slightly different.
ser3 = pd.Series([5, 3, 1, 6],
                 index=['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6],
                 index=['B', 'D', 'F', 'G'])
# Now let's perform another common binary operation, addition.
ser3 + ser4
As you can see, this time the result contains the union of the two indices. The labels B and D, which appear in both objects, have been aligned and their values added. For the labels that appear in only one of the objects (A, C, F and G) we get NaN, which stands for ‘Not a Number’ and is how pandas marks missing data.
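If you'd rather not get NaN for the labels that are missing on one side, the add method takes a fill_value argument that is used in place of the missing operand. Here's a small sketch using the same two Series as above (the names plain and filled are only for illustration):

```python
import pandas as pd

ser3 = pd.Series([5, 3, 1, 6], index=['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6], index=['B', 'D', 'F', 'G'])

# The + operator leaves NaN wherever a label exists in only one Series.
plain = ser3 + ser4

# The method form treats the missing operand as 0 instead.
filled = ser3.add(ser4, fill_value=0)

print(plain)
print(filled)
```

The other arithmetic operators have method counterparts too, for example sub, mul and div, and they accept fill_value in the same way.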
And now let’s see how binary operations work with DataFrames.
# Again, let's start by creating two DataFrame objects with identical indices and column names.
df1 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randint(2, 20, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
# Here's df1
df1
# And here's df2
df2
# This time let's use the true division binary operation.
df1 / df2
So, again the indices and column labels are aligned. What about DataFrames where only some of the indices and column names overlap?
# Let's define two such DataFrames.
df3 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
df4 = pd.DataFrame(np.random.randint(2, 20, (4, 3)),
                   columns=['A', 'B', 'E'])
# Here's df3.
df3
# And here's df4.
df4
# And now let's perform the subtraction binary operation on them.
df3 - df4
Again, we have the union of the indices and of the column labels, with NaN wherever a row or column label is present in only one of the two DataFrames.
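To make the alignment easier to follow than with random numbers, here's a sketch with fixed, made-up data of the same shapes (the names left, right, diff and filled are only for illustration). It also shows the sub method with fill_value, which fills in a missing operand only when the other DataFrame does have a value at that position:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a 3x4 frame with columns A-D and a 4x3 frame with columns A, B, E.
left = pd.DataFrame(np.arange(1, 13).reshape(3, 4), columns=['A', 'B', 'C', 'D'])
right = pd.DataFrame(np.ones((4, 3), dtype=int), columns=['A', 'B', 'E'])

# The result covers rows 0-3 and columns A-E; only rows 0-2 of A and B hold numbers.
diff = left - right

# With fill_value=0 a missing operand counts as 0, but positions missing
# from both frames (row 3 of columns C and D) still come out as NaN.
filled = left.sub(right, fill_value=0)

print(diff)
print(filled)
```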
OPERATIONS BETWEEN SERIES AND DATAFRAME OBJECTS
Finally, let’s see how operations between Series objects and DataFrame objects work.
# Let's create a Series object and a DataFrame object to work on.
ser5 = pd.Series([2, 4, 7, 8],
                 index=['A', 'B', 'C', 'D'])
df5 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'B', 'C', 'D'])
# Here's the Series.
ser5
# And here's the DataFrame.
df5
# Let's see how the subtraction operation works.
ser5 - df5
# Naturally, with subtraction the order matters.
df5 - ser5
As you can see, the result is a DataFrame: the index of the Series is aligned with the column labels of the DataFrame, and the operation is then performed row by row. And now let’s use objects where the labels only partially overlap.
# Let's define another Series and another DataFrame.
ser6 = pd.Series([2, 4, 7, 8],
                 index=['A', 'B', 'C', 'D'])
df6 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                   columns=['A', 'C', 'D', 'E'])
# Here's ser6.
ser6
# And here's df6.
df6
# Let's subtract the DataFrame from the Series now:
ser6 - df6
# And the other way around:
df6 - ser6
So, as expected, the index of the Series and the column names of the DataFrame are aligned, and any label found in only one of the two objects (here B and E) produces a column of NaN values.
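In all the Series and DataFrame examples above the Series index was matched against the column labels. If you want to match it against the row labels instead, the method forms of the operators take an axis argument. Here's a minimal sketch with made-up data (the names frame, per_column and per_row are only for illustration):

```python
import pandas as pd

frame = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['A', 'B'])
per_column = pd.Series([10, 20], index=['A', 'B'])       # one value per column
per_row = pd.Series([100, 200, 300], index=[0, 1, 2])    # one value per row

# Default behaviour: the Series index is aligned with the column labels,
# so this is the same as frame - per_column.
by_columns = frame.sub(per_column)

# With axis=0 the Series index is aligned with the row labels instead.
by_rows = frame.sub(per_row, axis=0)

print(by_columns)
print(by_rows)
```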