DATA AGGREGATION ON MULTIINDICES
We can aggregate hierarchically indexed data using common aggragation methods like mean, sum, min or max. Let’s demonstrate it on the following multi-indexed DataFrame that we used in one of the preceding articles in the series:
In [1]:
import numpy as np
import pandas as pd
rows = pd.MultiIndex.from_product([[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
names=['year', 'period'])
columns = pd.MultiIndex.from_product([['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
names=['company', 'total value'])
data = np.random.randint(20000, 100000, (6, 6))
a = pd.DataFrame(data, index=rows, columns=columns)
a
Out[1]:
Here are some examples. First let’s find the minimum sales and purchases for each year:
In [2]:
a_min = a.min(level='year')
a_min
Out[2]:
And now the maximum purchases and sales:
In [3]:
a_max = a.max(level='year')
a_max
Out[3]:
Next, let’s find the mean purchases and sales for each year:
In [4]:
a_mean = a.mean(level='year')
a_mean
Out[4]:
And finally, let’s sum the purchases and sales for each year:
In [5]:
a_sum = a.sum(level='year')
a_sum
Out[5]:
You can also use the axis keyword if you want to use the methods on the columns too. For example, let’s find the minimum sales and purchases among all the three companies:
In [8]:
a_all_min = a_min.min(axis=1, level='total value')
a_all_min
Out[8]: