MULTI-INDEXED DATAFRAME COLUMNS
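So far we’ve used multi-level indexing only for rows, but it works for columns too. In this part we’ll create a DataFrame that represents four-dimensional data: two levels of indexing for the rows and two levels of indexing for the columns. The DataFrame itself is still a two-dimensional table; the extra dimensions live in the hierarchical row and column labels.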
The DataFrame represents the total value of sales and purchases for three companies in 2017, 2018 and 2019, split into two six-month periods per year. We’re going to use some mock integer data in the example. Have a look:
In [2]:
import numpy as np
import pandas as pd
# Here's the MultiIndex for the rows.
rows = pd.MultiIndex.from_product(
    [[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
    names=['year', 'period'])
# Here's the MultiIndex for the columns.
columns = pd.MultiIndex.from_product(
    [['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
    names=['company', 'total value'])
# some mock data
data = np.random.randint(20000, 100000, (6, 6))
# and the DataFrame itself
a = pd.DataFrame(data, index=rows, columns=columns)
a
Out[2]:
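Before indexing, it can help to confirm that the frame has the intended structure. The following is a minimal sketch that rebuilds the same DataFrame and inspects its shape and index levels. The seeded generator `rng` is an assumption added here so that reruns produce the same mock data; the original example uses unseeded `np.random.randint`.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
    names=['year', 'period'])
columns = pd.MultiIndex.from_product(
    [['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
    names=['company', 'total value'])

# Seeded generator (an assumption, not part of the original example)
# so the mock data is the same on every run.
rng = np.random.default_rng(0)
data = rng.integers(20000, 100000, size=(6, 6))
a = pd.DataFrame(data, index=rows, columns=columns)

# 3 years x 2 periods = 6 rows; 3 companies x 2 totals = 6 columns.
print(a.shape)            # (6, 6)
# Each axis carries two index levels.
print(a.index.nlevels)    # 2
print(a.columns.nlevels)  # 2
# The level names are what you refer to when selecting by level.
print(list(a.index.names))
print(list(a.columns.names))
```

The product of the level sizes on each axis gives the table's shape, which is a quick sanity check when the mock data array and the indexes are built separately.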
You can now index the DataFrame to access the data that you need. Here are some examples:
In [3]:
# sales and purchases for Company B
a['Company B']
Out[3]:
In [5]:
# just the sales for Company C
a['Company C']['sales']
Out[5]:
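The chained form `a['Company C']['sales']` works for reading, but it performs two separate lookups. As a sketch of some alternatives, the example below selects a column with a single tuple key, takes a cross-section on the inner column level with `xs`, and uses `.loc` for row-based selection. The frame is rebuilt here with a seeded generator, which is an assumption made for reproducibility; the variable names `c_sales`, `all_sales`, `year_2018` and `value` are illustrative only.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
    names=['year', 'period'])
columns = pd.MultiIndex.from_product(
    [['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
    names=['company', 'total value'])

# Seeded mock data (an assumption; the original example is unseeded).
rng = np.random.default_rng(0)
a = pd.DataFrame(rng.integers(20000, 100000, size=(6, 6)),
                 index=rows, columns=columns)

# One column, selected in a single step with a (company, total value) tuple.
c_sales = a[('Company C', 'sales')]

# The sales column of every company: a cross-section on the inner column level.
all_sales = a.xs('sales', axis=1, level='total value')

# Both periods of one year: select on the outer row level.
year_2018 = a.loc[2018]

# A single cell: row tuple and column tuple together.
value = a.loc[(2019, 'Jul-Dec'), ('Company B', 'purchases')]

print(c_sales)
print(all_sales)
print(year_2018)
print(value)
```

The tuple form is also the safer choice when assigning values, because a chained selection may operate on a temporary copy rather than on the original DataFrame.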