pandas Part 10 – Operations on Data

Pandas makes it easy to operate on data. With unary operations, the index and column labels are preserved. With binary operations, the indices of the two operands are aligned.

UNARY OPERATIONS

Let’s have a look at a unary operation first:

In [2]:
import numpy as np
import pandas as pd

# Let's create a Series object and a DataFrame object to work on.
numSeries = pd.Series([3, 6, 5, 10],
                     index = ['A', 'B', 'C', 'D'])

numDataFrame = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                           columns = ['A', 'B', 'C', 'D'])

# Let's print out the Series...
numSeries
Out[2]:
A     3
B     6
C     5
D    10
dtype: int64
In [3]:
# and the DataFrame.
numDataFrame
Out[3]:
   A  B  C  D
0  2  4  9  7
1  2  4  9  3
2  3  7  5  9
In [6]:
# Now let's apply the unary sin function to the Series.
np.sin(numSeries * np.pi)
Out[6]:
A    3.673940e-16
B   -7.347881e-16
C    6.123234e-16
D   -1.224647e-15
dtype: float64
In [7]:
# And let's do the same to the DataFrame.
np.sin(numDataFrame * np.pi)
Out[7]:
              A             B             C             D
0 -2.449294e-16 -4.898587e-16  1.102182e-15  8.572528e-16
1 -2.449294e-16 -4.898587e-16  1.102182e-15  3.673940e-16
2  3.673940e-16  8.572528e-16  6.123234e-16  1.102182e-15

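As you can see, in both the Series and the DataFrame the index has been preserved, and so have the column labels in the DataFrame. The values themselves are all within floating-point rounding error of zero, because the sine of any integer multiple of π is 0.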
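
If you'd like to check this rather than eyeball it, here's a minimal sketch. The frame and its labels are made up for illustration. It applies a NumPy function and the unary minus, then compares the labels of the results with those of the original.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with non-default row labels.
frame = pd.DataFrame([[1, 4], [9, 16]],
                     index=['x', 'y'],
                     columns=['A', 'B'])

# A NumPy ufunc and the unary minus each return a new DataFrame.
roots = np.sqrt(frame)
negated = -frame

# Compare the labels of the results with the labels of the original.
print(roots.index.equals(frame.index))
print(negated.columns.equals(frame.columns))
```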

BINARY OPERATIONS

And now let’s have a look at binary operations between two Series objects and two DataFrame objects.

In [9]:
# First let's create two Series objects.
ser1 = pd.Series([5, 3, 1, 6],
                index = ['A', 'B', 'C', 'D'])
ser2 = pd.Series([6, 8, 2, 6],
                index = ['A', 'B', 'C', 'D'])

# Here's ser1.
ser1
Out[9]:
A    5
B    3
C    1
D    6
dtype: int64
In [10]:
# And here's ser2.
ser2
Out[10]:
A    6
B    8
C    2
D    6
dtype: int64
In [11]:
# An example of a binary operation is multiplication. Let's check it out.
ser1 * ser2
Out[11]:
A    30
B    24
C     2
D    36
dtype: int64

This works as expected: the indices are aligned, so values with the same label are multiplied together. But what if the indices of the two Series objects overlap only partially?

In [12]:
# Let's create two other Series objects so that the indices are slightly different.
ser3 = pd.Series([5, 3, 1, 6],
                index = ['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6],
                index = ['B', 'D', 'F', 'G'])

# Now let's perform another common binary operation, addition.
ser3 + ser4
Out[12]:
A     NaN
B     9.0
C     NaN
D    14.0
F     NaN
G     NaN
dtype: float64

As you can see, this time the result has the union of the two indices. The labels B and D, which appear in both objects, have been aligned and their values added. For the other labels we get NaN, which stands for ‘Not a Number’ and is how pandas marks missing data. Because NaN is a floating-point value, the dtype of the result changes from int64 to float64.
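
If NaN isn't what you want for labels that exist in only one Series, the method form of the operator may help. Here's a small sketch using the same data as above: Series.add takes a fill_value that stands in for the missing entry. Whether 0 is a sensible stand-in depends on your data, so treat it as an assumption.

```python
import pandas as pd

# The same data as ser3 and ser4 above.
ser3 = pd.Series([5, 3, 1, 6], index=['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6], index=['B', 'D', 'F', 'G'])

# The + operator leaves NaN wherever a label exists in only one Series.
plain_sum = ser3 + ser4

# add() with fill_value=0 treats the missing entry as 0 instead.
filled_sum = ser3.add(ser4, fill_value=0)
print(filled_sum)
```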

And now let’s see how binary operations work with DataFrames.

In [13]:
# Again, let's start by creating two DataFrame objects with identical indices and column names.
df1 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randint(2, 20, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])

# Here's df1
df1
Out[13]:
   A  B  C  D
0  2  3  2  1
1  9  4  1  3
2  8  1  6  9
In [14]:
# And here's df2
df2
Out[14]:
    A   B   C   D
0  13  14  17  18
1   7   8   4  13
2   6   3  11   9
In [15]:
# This time let's use the true division binary operation.
df1 / df2
Out[15]:
          A         B         C         D
0  0.153846  0.214286  0.117647  0.055556
1  1.285714  0.500000  0.250000  0.230769
2  1.333333  0.333333  0.545455  1.000000

So, again the indices and column labels are aligned. What about DataFrames where only some of the indices and column names overlap?

In [16]:
# Let's define two such DataFrames.
df3 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])
df4 = pd.DataFrame(np.random.randint(2, 20, (4, 3)),
                  columns = ['A', 'B', 'E'])

# Here's df3.
df3
Out[16]:
   A  B  C  D
0  7  4  6  6
1  1  5  6  4
2  9  3  4  7
In [17]:
# And here's df4.
df4
Out[17]:
    A   B   E
0   5   4   5
1   6  17  10
2  16   6  13
3  18   9   6
In [18]:
# And now let's perform the subtraction binary operation on them.
df3 - df4
Out[18]:
     A     B   C   D   E
0  2.0   0.0 NaN NaN NaN
1 -5.0 -12.0 NaN NaN NaN
2 -7.0  -3.0 NaN NaN NaN
3  NaN   NaN NaN NaN NaN

Again, we have the union of indices and column labels and NaN values for any missing data.
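
Here's a small sketch that checks the union claim and tries a fill value on DataFrames. The values are typed in by hand from the outputs above, since the originals were generated randomly. Using 0 as the fill value is just an assumption for the example.

```python
import pandas as pd

# The values shown for df3 and df4 above, typed in by hand.
df3 = pd.DataFrame([[7, 4, 6, 6], [1, 5, 6, 4], [9, 3, 4, 7]],
                   columns=['A', 'B', 'C', 'D'])
df4 = pd.DataFrame([[5, 4, 5], [6, 17, 10], [16, 6, 13], [18, 9, 6]],
                   columns=['A', 'B', 'E'])

diff = df3 - df4

# Compare the labels of the result with the union of the input labels.
print(diff.index.equals(df3.index.union(df4.index)))
print(diff.columns.equals(df3.columns.union(df4.columns)))

# sub() with fill_value=0 fills a value that is missing on one side only.
# Cells missing from both DataFrames (for example row 3, column C) stay NaN.
filled = df3.sub(df4, fill_value=0)
print(filled)
```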

OPERATIONS BETWEEN SERIES AND DATAFRAME OBJECTS

Finally, let’s see how operations between Series objects and DataFrame objects work.

In [19]:
# Let's create a Series object and a DataFrame object to work on.
ser5 = pd.Series([2, 4, 7, 8],
                index = ['A', 'B', 'C', 'D'])
df5 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])

# Here's the Series.
ser5
Out[19]:
A    2
B    4
C    7
D    8
dtype: int64
In [20]:
# And here's the DataFrame.
df5
Out[20]:
   A  B  C  D
0  9  3  6  1
1  9  3  9  1
2  8  1  5  9
In [21]:
# Let's see how the subtraction operation works.
ser5 - df5
Out[21]:
   A  B  C  D
0 -7  1  1  7
1 -7  1 -2  7
2 -6  3  2 -1
In [22]:
# Naturally, with subtraction the order matters.
df5 - ser5
Out[22]:
   A  B  C  D
0  7 -1 -1 -7
1  7 -1  2 -7
2  6 -3 -2  1

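As you can see, the result is a DataFrame: the index of the Series is aligned with the column labels of the DataFrame, and the operation is then performed row by row. And now let’s use objects where the Series index and the DataFrame columns only partially overlap.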

In [23]:
# Let's define another Series and another DataFrame.
ser6 = pd.Series([2, 4, 7, 8],
                index = ['A', 'B', 'C', 'D'])
df6 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'C', 'D', 'E'])

# Here's ser6.
ser6
Out[23]:
A    2
B    4
C    7
D    8
dtype: int64
In [24]:
# And here's df6.
df6
Out[24]:
   A  C  D  E
0  5  3  1  6
1  4  4  7  5
2  8  3  9  5
In [25]:
# Let's subtract the DataFrame from the Series now:
ser6 - df6
Out[25]:
     A   B    C    D   E
0 -3.0 NaN  4.0  7.0 NaN
1 -2.0 NaN  3.0  1.0 NaN
2 -6.0 NaN  4.0 -1.0 NaN
In [26]:
# And the other way around:
df6 - ser6
Out[26]:
     A   B    C    D   E
0  3.0 NaN -4.0 -7.0 NaN
1  2.0 NaN -3.0 -1.0 NaN
2  6.0 NaN -4.0  1.0 NaN

So, as expected, the Series index and the DataFrame column names are aligned, and any label found in only one of the two objects (B and E here) yields NaN values.
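
In all the examples in this section, pandas matches the index of the Series against the column labels of the DataFrame. If you'd rather match the Series against the row index, the method form with axis=0 should do it. A minimal sketch, where row_offsets is a made-up Series and the df5 values are typed in by hand from the output above:

```python
import pandas as pd

# The values shown for df5 above, typed in by hand.
df5 = pd.DataFrame([[9, 3, 6, 1], [9, 3, 9, 1], [8, 1, 5, 9]],
                   columns=['A', 'B', 'C', 'D'])

# A hypothetical Series labelled like the rows of df5 (0, 1, 2).
row_offsets = pd.Series([1, 2, 3], index=[0, 1, 2])

# axis=0 matches the Series labels against the row index, so each
# value is subtracted from every cell of the row with the same label.
by_rows = df5.sub(row_offsets, axis=0)
print(by_rows)
```

With the plain minus operator, the labels 0, 1 and 2 would instead be compared with the column names A to D, and nothing would match.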
