Skip to content
Home » pandas Part 23 – Data Aggragation on MultiIndices

pandas Part 23 – Data Aggragation on MultiIndices

Spread the love

DATA AGGREGATION ON MULTIINDICES

We can aggregate hierarchically indexed data using common aggragation methods like mean, sum, min or max. Let’s demonstrate it on the following multi-indexed DataFrame that we used in one of the preceding articles in the series:

In [1]:
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
                                  names=['year', 'period'])

columns = pd.MultiIndex.from_product([['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
                                  names=['company', 'total value'])

data = np.random.randint(20000, 100000, (6, 6))
a = pd.DataFrame(data, index=rows, columns=columns)
a
Out[1]:
company Company A Company B Company C
total value sales purchases sales purchases sales purchases
year period
2017 Jan-Jun 93991 83742 53339 64714 87043 56829
Jul-Dec 89819 88190 52651 62935 81435 84043
2018 Jan-Jun 40709 46692 37939 33882 93266 57254
Jul-Dec 41856 52878 99839 64401 65748 40280
2019 Jan-Jun 86673 43148 70669 81171 88066 97462
Jul-Dec 21522 24968 49673 49746 84734 45877

Here are some examples. First let’s find the minimum sales and purchases for each year:

In [2]:
a_min = a.min(level='year')
a_min
Out[2]:
company Company A Company B Company C
total value sales purchases sales purchases sales purchases
year
2017 89819 83742 52651 62935 81435 56829
2018 40709 46692 37939 33882 65748 40280
2019 21522 24968 49673 49746 84734 45877

And now the maximum purchases and sales:

In [3]:
a_max = a.max(level='year')
a_max
Out[3]:
company Company A Company B Company C
total value sales purchases sales purchases sales purchases
year
2017 93991 88190 53339 64714 87043 84043
2018 41856 52878 99839 64401 93266 57254
2019 86673 43148 70669 81171 88066 97462

Next, let’s find the mean purchases and sales for each year:

In [4]:
a_mean = a.mean(level='year')
a_mean
Out[4]:
company Company A Company B Company C
total value sales purchases sales purchases sales purchases
year
2017 91905.0 85966.0 52995.0 63824.5 84239.0 70436.0
2018 41282.5 49785.0 68889.0 49141.5 79507.0 48767.0
2019 54097.5 34058.0 60171.0 65458.5 86400.0 71669.5

And finally, let’s sum the purchases and sales for each year:

In [5]:
a_sum = a.sum(level='year')
a_sum
Out[5]:
company Company A Company B Company C
total value sales purchases sales purchases sales purchases
year
2017 183810 171932 105990 127649 168478 140872
2018 82565 99570 137778 98283 159014 97534
2019 108195 68116 120342 130917 172800 143339

You can also use the axis keyword if you want to use the methods on the columns too. For example, let’s find the minimum sales and purchases among all the three companies:

In [8]:
a_all_min = a_min.min(axis=1, level='total value')
a_all_min
Out[8]:
total value sales purchases
year
2017 52651 56829
2018 37939 33882
2019 21522 24968

Your Panda3D Magazine

Make Awesome Games and Other 3D Apps

with Panda3D and Blender using Python.

Cool stuff, easy to follow articles.

Get the magazine here (PDF).

Python Jumpstart Course

Learn the basics of Python, including OOP.

with lots of exercises, easy to follow

The course is available on Udemy.

Blender Jumpstart Course

Learn the basics of 3D modeling in Blender.

step-by-step, easy to follow, visually rich

The course is available on Udemy and on Skillshare.


Spread the love
Tags:

Leave a Reply