Pandas makes it easy to operate on data. Unary operations preserve the index and column labels of the object they are applied to. Binary operations align the two operands on their index and column labels before computing the result.

UNARY OPERATIONS

Let’s have a look at a unary operation first:

In [2]:

import numpy as np
import pandas as pd

# Let's create a Series object and a DataFrame object to work on.
numSeries = pd.Series([3, 6, 5, 10],
                     index = ['A', 'B', 'C', 'D'])

numDataFrame = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                           columns = ['A', 'B', 'C', 'D'])

# Let's print out the Series...
numSeries

Out[2]:

A     3
B     6
C     5
D    10
dtype: int64

In [3]:

# and the DataFrame.
numDataFrame

Out[3]:

	A	B	C	D
0	2	4	9	7
1	2	4	9	3
2	3	7	5	9

In [6]:

# Now let's apply the unary sin function to the Series.
np.sin(numSeries * np.pi)

Out[6]:

A    3.673940e-16
B   -7.347881e-16
C    6.123234e-16
D   -1.224647e-15
dtype: float64

In [7]:

# And let's do the same to the DataFrame.
np.sin(numDataFrame * np.pi)

Out[7]:

	A	B	C	D
0	-2.449294e-16	-4.898587e-16	1.102182e-15	8.572528e-16
1	-2.449294e-16	-4.898587e-16	1.102182e-15	3.673940e-16
2	3.673940e-16	8.572528e-16	6.123234e-16	1.102182e-15

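As you can see, in both the Series and the DataFrame the indices have been preserved. So have the column labels in the DataFrame. The values themselves are all tiny numbers close to zero: the sine of any integer multiple of π is exactly 0, and what we see here is floating-point rounding error.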
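
If you'd like to check this in code rather than by eye, here is a minimal sketch. It rebuilds the Series and the DataFrame from the values printed above (the names `num_series` and `num_frame` are just illustrative), applies the same ufunc and compares the labels before and after.

```python
import numpy as np
import pandas as pd

# Same values as in the outputs above, typed in by hand so the
# example does not depend on random numbers.
num_series = pd.Series([3, 6, 5, 10], index=['A', 'B', 'C', 'D'])
num_frame = pd.DataFrame([[2, 4, 9, 7],
                          [2, 4, 9, 3],
                          [3, 7, 5, 9]],
                         columns=['A', 'B', 'C', 'D'])

# Apply the unary NumPy function to both objects.
sin_series = np.sin(num_series * np.pi)
sin_frame = np.sin(num_frame * np.pi)

# The labels of the results should be identical to the labels of the inputs.
labels_kept = (sin_series.index.equals(num_series.index)
               and sin_frame.index.equals(num_frame.index)
               and sin_frame.columns.equals(num_frame.columns))

# The values are only rounding noise around zero.
near_zero = np.allclose(sin_series, 0) and np.allclose(sin_frame, 0)

print(labels_kept, near_zero)
```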

BINARY OPERATIONS

And now let’s have a look at binary operations between two Series objects and two DataFrame objects.

In [9]:

# First let's create two Series objects.
ser1 = pd.Series([5, 3, 1, 6],
                index = ['A', 'B', 'C', 'D'])
ser2 = pd.Series([6, 8, 2, 6],
                index = ['A', 'B', 'C', 'D'])

# Here's ser1.
ser1

Out[9]:

A    5
B    3
C    1
D    6
dtype: int64

In [10]:

# And here's ser2.
ser2

Out[10]:

A    6
B    8
C    2
D    6
dtype: int64

In [11]:

# An example of a binary operation is multiplication. Let's check it out.
ser1 * ser2

Out[11]:

A    30
B    24
C     2
D    36
dtype: int64
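
It's worth stressing that the values are matched by label, not by position. The sketch below uses two hypothetical Series, `left` and `right`, that hold the same numbers as ser1 and ser2, except that `right` lists its labels in reverse order. The products per label should still be the same as above.

```python
import pandas as pd

left = pd.Series([5, 3, 1, 6], index=['A', 'B', 'C', 'D'])

# Same label/value pairs as ser2 above, but written in reverse order.
right = pd.Series([6, 2, 8, 6], index=['D', 'C', 'B', 'A'])

# Each value in left is multiplied by the value in right
# that carries the same label, wherever it sits.
product = left * right

print(product['A'], product['B'], product['C'], product['D'])
```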

This works as expected: the indices are aligned. But what if the indices in the two Series objects overlap only partially?

In [12]:

# Let's create two other Series objects so that the indices are slightly different.
ser3 = pd.Series([5, 3, 1, 6],
                index = ['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6],
                index = ['B', 'D', 'F', 'G'])

# Now let's perform another common binary operation, addition.
ser3 + ser4

Out[12]:

A     NaN
B     9.0
C     NaN
D    14.0
F     NaN
G     NaN
dtype: float64

As you can see, this time we have the union of the indices. The indices B and D, which you can find in both objects, have been aligned. For the other indices we get the NaN value, which stands for ‘Not a Number’ and is how pandas marks missing data. Because NaN is a floating-point value, the dtype of the result changes from int64 to float64.
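
If the NaN values are not what you want, one option is the method form of the operator. The sketch below assumes you'd like a label that is missing from one Series to count as 0, and uses `Series.add` with its `fill_value` parameter for that. Whether 0 is a sensible stand-in depends on your data, so treat it as an illustration only.

```python
import pandas as pd

ser3 = pd.Series([5, 3, 1, 6], index=['A', 'B', 'C', 'D'])
ser4 = pd.Series([6, 8, 2, 6], index=['B', 'D', 'F', 'G'])

# The plain operator: labels found in only one Series give NaN.
plain = ser3 + ser4
missing_count = plain.isna().sum()   # A, C, F and G

# The method form: a value missing on one side is treated as 0
# before the addition (assumption: 0 is a reasonable default here).
filled = ser3.add(ser4, fill_value=0)

print(missing_count)
print(filled)
```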

And now let’s see how binary operations work with DataFrames.

In [13]:

# Again, let's start by creating two DataFrame objects with identical indices and column names.
df1 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randint(2, 20, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])

# Here's df1
df1

Out[13]:

	A	B	C	D
0	2	3	2	1
1	9	4	1	3
2	8	1	6	9

In [14]:

# And here's df2
df2

Out[14]:

	A	B	C	D
0	13	14	17	18
1	7	8	4	13
2	6	3	11	9

In [15]:

# This time let's use the true division binary operation.
df1 / df2

Out[15]:

	A	B	C	D
0	0.153846	0.214286	0.117647	0.055556
1	1.285714	0.500000	0.250000	0.230769
2	1.333333	0.333333	0.545455	1.000000

So, again the indices and column labels are aligned. What about DataFrames where only some of the indices and column names overlap?

In [16]:

# Let's define two such DataFrames.
df3 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])
df4 = pd.DataFrame(np.random.randint(2, 20, (4, 3)),
                  columns = ['A', 'B', 'E'])

# Here's df3.
df3

Out[16]:

	A	B	C	D
0	7	4	6	6
1	1	5	6	4
2	9	3	4	7

In [17]:

# And here's df4.
df4

Out[17]:

	A	B	E
0	5	4	5
1	6	17	10
2	16	6	13
3	18	9	6

In [18]:

# And now let's perform the subtraction binary operation on them.
df3 - df4

Out[18]:

	A	B	C	D	E
0	2.0	0.0	NaN	NaN	NaN
1	-5.0	-12.0	NaN	NaN	NaN
2	-7.0	-3.0	NaN	NaN	NaN
3	NaN	NaN	NaN	NaN	NaN

Again, we have the union of indices and column labels and NaN values for any missing data.
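
Sometimes you only care about the part that both DataFrames share. As a rough sketch of one way to get there, `DataFrame.align` with `join='inner'` trims both objects to their common row and column labels before the subtraction. The two frames below are rebuilt by hand from the outputs above, so the numbers match the ones you've just seen.

```python
import pandas as pd

# Values copied from the df3 and df4 outputs above.
df3 = pd.DataFrame([[7, 4, 6, 6],
                    [1, 5, 6, 4],
                    [9, 3, 4, 7]],
                   columns=['A', 'B', 'C', 'D'])
df4 = pd.DataFrame([[5, 4, 5],
                    [6, 17, 10],
                    [16, 6, 13],
                    [18, 9, 6]],
                   columns=['A', 'B', 'E'])

# The plain operator gives the union of the labels, padded with NaN.
union_result = df3 - df4

# Keep only the rows (0, 1, 2) and columns (A, B) found in both frames.
left, right = df3.align(df4, join='inner')
shared_result = left - right

print(union_result.shape, shared_result.shape)
print(shared_result)
```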

OPERATIONS BETWEEN SERIES AND DATAFRAME OBJECTS

Finally, let’s see how operations between Series objects and DataFrame objects work.

In [19]:

# Let's create a Series object and a DataFrame object to work on.
ser5 = pd.Series([2, 4, 7, 8],
                index = ['A', 'B', 'C', 'D'])
df5 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'B', 'C', 'D'])

# Here's the Series.
ser5

Out[19]:

A    2
B    4
C    7
D    8
dtype: int64

In [20]:

# And here's the DataFrame.
df5

Out[20]:

	A	B	C	D
0	9	3	6	1
1	9	3	9	1
2	8	1	5	9

In [21]:

# Let's see how the subtraction operation works.
ser5 - df5

Out[21]:

	A	B	C	D
0	-7	1	1	7
1	-7	1	-2	7
2	-6	3	2	-1

In [22]:

# Naturally, with subtraction the order matters.
df5 - ser5

Out[22]:

	A	B	C	D
0	7	-1	-1	-7
1	7	-1	2	-7
2	6	-3	-2	1
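
With the plain operators, the index of the Series is matched against the column labels of the DataFrame. If you'd rather match it against the row labels, the method forms such as `DataFrame.sub` take an `axis` argument. Here's a small sketch of both directions. The `row_offsets` Series is a made-up example that is not part of the data above.

```python
import pandas as pd

# Values copied from the df5 and ser5 outputs above.
df5 = pd.DataFrame([[9, 3, 6, 1],
                    [9, 3, 9, 1],
                    [8, 1, 5, 9]],
                   columns=['A', 'B', 'C', 'D'])
ser5 = pd.Series([2, 4, 7, 8], index=['A', 'B', 'C', 'D'])

# Match the Series index against the column labels.
# This gives the same result as df5 - ser5.
by_columns = df5.sub(ser5, axis='columns')

# Hypothetical Series indexed like the rows of df5.
row_offsets = pd.Series([1, 2, 3], index=[0, 1, 2])

# Match the Series index against the row labels instead:
# 1 is subtracted from row 0, 2 from row 1 and 3 from row 2.
by_rows = df5.sub(row_offsets, axis='index')

print(by_columns)
print(by_rows)
```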

As you can see, the result is a DataFrame. The index of the Series is matched against the column labels of the DataFrame, and the operation is then applied to each row in turn. And now let’s use objects where the labels only partially overlap.

In [23]:

# Let's define another Series and another DataFrame.
ser6 = pd.Series([2, 4, 7, 8],
                index = ['A', 'B', 'C', 'D'])
df6 = pd.DataFrame(np.random.randint(1, 10, (3, 4)),
                  columns = ['A', 'C', 'D', 'E'])

# Here's ser6.
ser6

Out[23]:

A    2
B    4
C    7
D    8
dtype: int64

In [24]:

# And here's df6.
df6

Out[24]:

	A	C	D	E
0	5	3	1	6
1	4	4	7	5
2	8	3	9	5

In [25]:

# Let's subtract the DataFrame from the Series now:
ser6 - df6

Out[25]:

	A	B	C	D	E
0	-3.0	NaN	4.0	7.0	NaN
1	-2.0	NaN	3.0	1.0	NaN
2	-6.0	NaN	4.0	-1.0	NaN

In [26]:

# And the other way around:
df6 - ser6

Out[26]:

	A	B	C	D	E
0	3.0	NaN	-4.0	-7.0	NaN
1	2.0	NaN	-3.0	-1.0	NaN
2	6.0	NaN	-4.0	1.0	NaN

So, as expected, the index of the Series is aligned with the column names of the DataFrame, and any label found in only one of the two objects (B and E here) gives a column of NaN values.
