PANDAS MULTI-INDEX
The two data structures you already know, Series and DataFrame, are designed for one- and two-dimensional data respectively. You can also use them with higher-dimensional data, though, by creating several index levels within a single index. This is called multi-indexing.
Let’s jump right into an example. Suppose we have sales data for several companies in a city, with figures for each company over two yearly periods. We’ll start by creating a list of tuples where each tuple contains the name of a company and one yearly period. Then we’ll create a list with the sales. To keep things simple, the sales are in thousands of dollars, so a value of 450 means $450,000. Finally, we’ll make a Series object from the sales list, using the tuples as indices. Have a look:
import numpy as np
import pandas as pd
# the tuples used as indices
index = [('Company A', 2018), ('Company A', 2019),
         ('Company B', 2018), ('Company B', 2019),
         ('Company C', 2018), ('Company C', 2019)]
# the list of sales
sales = [125, 211,
         390, 455,
         475, 655]
# Let's create a Series object.
s = pd.Series(sales, index=index)
# Let's print out the Series object.
s
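To get a feel for what working with plain tuple keys involves, here is a minimal sketch that picks out the 2019 figures from such a Series. The names s_tuples, mask and sales_2019 are illustrative and not part of the original example. Each key has to be inspected by hand in a Python loop:

```python
import pandas as pd

# Same data as in the example above.
index = [('Company A', 2018), ('Company A', 2019),
         ('Company B', 2018), ('Company B', 2019),
         ('Company C', 2018), ('Company C', 2019)]
sales = [125, 211, 390, 455, 475, 655]

s_tuples = pd.Series(sales, index=index)

# Build a boolean mask by checking the year stored in each tuple key.
mask = [key[1] == 2019 for key in s_tuples.index]

# Keep only the rows whose key refers to 2019.
sales_2019 = s_tuples[mask]
print(sales_2019)
```

This works, but the filtering happens key by key in plain Python rather than inside pandas.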
We’re only halfway there. Although you could use tuple indices like the ones above, it’s not very convenient or efficient. So, let’s move on and create a multi-index. We already have the tuples, so we can use the MultiIndex.from_tuples method:
# Here's the multi-index.
m_index = pd.MultiIndex.from_tuples(index)
# It looks like so:
m_index
A multi-index contains several levels of indexing. In our case there are two: the company names and the time periods. It also contains codes, which are integer positions pointing to the elements stored in the levels. Let’s check them out:
# Here are the levels.
m_index.levels
# Here are the codes.
m_index.codes
The codes come as a FrozenList holding one array per level. If you take the elements at the same position in each array as a pair, you’ll see how the values in the levels are combined. For example, the first pair is (0, 0), which means the first element of each level: ('Company A', 2018). The next pair is (0, 1), which corresponds to ('Company A', 2019). The third pair is (1, 0), which corresponds to ('Company B', 2018), and so on.
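The pairing described above can be checked with a short sketch that rebuilds the original tuples from the levels and codes. The variable names position and rebuilt are illustrative only:

```python
import pandas as pd

index = [('Company A', 2018), ('Company A', 2019),
         ('Company B', 2018), ('Company B', 2019),
         ('Company C', 2018), ('Company C', 2019)]
m_index = pd.MultiIndex.from_tuples(index)

# zip(*m_index.codes) yields one tuple of codes per row,
# e.g. (0, 0) for the first row and (0, 1) for the second.
# Each code is then looked up in the level it belongs to.
rebuilt = [
    tuple(level[int(code)] for level, code in zip(m_index.levels, position))
    for position in zip(*m_index.codes)
]

print(rebuilt)
```

The rebuilt list contains the same company and year pairs we started with, in the same order.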
If you want to see the data in a hierarchical way, you have to reindex the Series with the multi-index:
# Let's reindex the Series object.
s = s.reindex(m_index)
s
In the output, a blank entry in the first index column means that the value is the same as in the line above. The first two columns belong to the index and the last column holds the data.
You can easily access the data using the multi-index. Here’s an example:
# the 2019 period for all companies
s[:, 2019]
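A few more selections follow the same idea. This is a minimal sketch: to keep it self-contained, the Series is built directly on a MultiIndex instead of being reindexed, and the names company_b, single and sales_2019 are illustrative.

```python
import pandas as pd

index = [('Company A', 2018), ('Company A', 2019),
         ('Company B', 2018), ('Company B', 2019),
         ('Company C', 2018), ('Company C', 2019)]
sales = [125, 211, 390, 455, 475, 655]

# Build the Series directly on a MultiIndex.
s = pd.Series(sales, index=pd.MultiIndex.from_tuples(index))

# All periods for one company: the result is indexed by year only.
company_b = s.loc['Company B']

# One company in one period: the result is a single value.
single = s.loc[('Company B', 2019)]

# One period for all companies, selected by level position
# (level 1 holds the years).
sales_2019 = s.xs(2019, level=1)

print(company_b)
print(single)
print(sales_2019)
```

Selecting on the outer level alone drops that level from the result, which is why company_b is indexed only by year.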