USING PANDAS MULTIINDEX FOR MULTIDIMENSIONAL DATA
We can use a MultiIndex to represent multidimensional data. Here’s our example from the previous part of the series:
In [2]:
import numpy as np
import pandas as pd
# the tuples used as indices
index = [('Company A', 2018), ('Company A', 2019),
         ('Company B', 2018), ('Company B', 2019),
         ('Company C', 2018), ('Company C', 2019)]
# the list of sales
sales = [125, 211,
         390, 455,
         475, 655]
# Let's create a Series object.
s = pd.Series(sales, index=index)
# Here's the multi-index.
m_index = pd.MultiIndex.from_tuples(index)
# Let's reindex the Series object.
s = s.reindex(m_index)
s
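Every company appears with both years, so the same index can also be built from the Cartesian product of the two lists. The sketch below uses pd.MultiIndex.from_product. The names companies and years are illustrative and do not come from the original example:

```python
import pandas as pd

companies = ['Company A', 'Company B', 'Company C']
years = [2018, 2019]

# from_product pairs every company with every year, giving the same
# order as the hand-written list of tuples
m_index = pd.MultiIndex.from_product([companies, years])
s = pd.Series([125, 211, 390, 455, 475, 655], index=m_index)
print(s)
```

from_product is only a shortcut when the levels form a full grid. For irregular combinations, from_tuples remains the right choice.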
Let’s convert the Series to a DataFrame and give its single column a descriptive name, 'sales':
In [3]:
s_df = pd.DataFrame({'sales': s})
s_df
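We can make the index just as self-describing as the column by naming its levels. Below is a minimal sketch using DataFrame.rename_axis. The level names company and year are our own choice and are not part of the original example:

```python
import pandas as pd

m_index = pd.MultiIndex.from_tuples([('Company A', 2018), ('Company A', 2019),
                                     ('Company B', 2018), ('Company B', 2019),
                                     ('Company C', 2018), ('Company C', 2019)])
s = pd.Series([125, 211, 390, 455, 475, 655], index=m_index)

# The dict key names the column; rename_axis names the two index levels
s_df = pd.DataFrame({'sales': s}).rename_axis(['company', 'year'])
print(s_df)

# Named levels can be used in selections, e.g. all rows for 2019
print(s_df.xs(2019, level='year'))
```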
Now, what if we wanted to store one more variable in the DataFrame, for example annual purchases? We can add it as a second column:
In [4]:
# Let's define the purchases as p:
p = [44, 72,
     81, 88,
     93, 87]
# Let's add the purchases column.
s_df2 = pd.DataFrame({'sales': s,
                      'purchases': p})
s_df2
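If the sales DataFrame already exists, the purchases can be attached to it instead of building a new frame from scratch. The sketch below uses DataFrame.assign, which returns a new DataFrame and leaves the original unchanged:

```python
import pandas as pd

m_index = pd.MultiIndex.from_tuples([('Company A', 2018), ('Company A', 2019),
                                     ('Company B', 2018), ('Company B', 2019),
                                     ('Company C', 2018), ('Company C', 2019)])
s_df = pd.DataFrame({'sales': [125, 211, 390, 455, 475, 655]}, index=m_index)

# A plain list is matched to the rows by position, so it needs one value per row
s_df2 = s_df.assign(purchases=[44, 72, 81, 88, 93, 87])
print(s_df2)
```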
NumPy ufuncs and pandas arithmetic operators also work on data with a MultiIndex, and the result keeps the MultiIndex. For example, let’s calculate the difference between sales and purchases:
In [5]:
b = s_df2['sales'] - s_df2['purchases']
b
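The - operator here works element by element. The sketch below checks two related points. First, calling the NumPy ufunc np.subtract directly gives the same MultiIndexed result. Second, pandas arithmetic pairs rows by index label, not by position:

```python
import numpy as np
import pandas as pd

m_index = pd.MultiIndex.from_tuples([('Company A', 2018), ('Company A', 2019),
                                     ('Company B', 2018), ('Company B', 2019),
                                     ('Company C', 2018), ('Company C', 2019)])
s_df2 = pd.DataFrame({'sales': [125, 211, 390, 455, 475, 655],
                      'purchases': [44, 72, 81, 88, 93, 87]}, index=m_index)

b = s_df2['sales'] - s_df2['purchases']

# The equivalent NumPy ufunc call also returns a Series with the MultiIndex
b_ufunc = np.subtract(s_df2['sales'], s_df2['purchases'])

# Reversing the row order of one operand does not change which values are
# paired, because the operator aligns both Series on (company, year) labels
b_aligned = s_df2['sales'] - s_df2['purchases'].iloc[::-1]
print(b)
```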
It looks even better if we unstack it, which moves the inner index level (the years) into the columns:
In [6]:
b.unstack()
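By default, unstack() pivots the innermost index level. The short sketch below shows that passing level=0 pivots the companies into the columns instead, so the years become the rows:

```python
import pandas as pd

m_index = pd.MultiIndex.from_tuples([('Company A', 2018), ('Company A', 2019),
                                     ('Company B', 2018), ('Company B', 2019),
                                     ('Company C', 2018), ('Company C', 2019)])
b = pd.Series([81, 139, 309, 367, 382, 568], index=m_index)

by_company = b.unstack()       # rows: companies, columns: 2018 and 2019
by_year = b.unstack(level=0)   # rows: 2018 and 2019, columns: companies
print(by_company)
print(by_year)
```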
The same approach works for higher-dimensional data: each additional dimension becomes another level of the MultiIndex.
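As a sketch of a third dimension, suppose each company’s sales were also split by a hypothetical region level. The region names and the values generated with np.arange are placeholders, not data from this article:

```python
import numpy as np
import pandas as pd

# Three index levels: company, year and a hypothetical region
m_index = pd.MultiIndex.from_product(
    [['Company A', 'Company B', 'Company C'], [2018, 2019], ['North', 'South']],
    names=['company', 'year', 'region'])

# Placeholder values, one per (company, year, region) combination
df3 = pd.DataFrame({'sales': np.arange(12)}, index=m_index)

# Selecting one company leaves the year and region levels
print(df3.loc['Company B'])

# Unstacking the region level gives one column per region
wide = df3['sales'].unstack('region')
print(wide)
```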