NAMED MULTIINDEX LEVELS
Sometimes it’s useful to name the levels of a MultiIndex object. Let’s have a look at the example from the previous part of the series:
import numpy as np
import pandas as pd
# the MultiIndex
a = pd.MultiIndex.from_tuples([('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)])
# the data
data = [100, 200,
        150, 250]
# the Series object
s = pd.Series(data, index=a)
s
Although in this example it’s pretty obvious what the two levels are, that’s not always the case. As this Series object already exists, we can use the names attribute of its index to set the names of the levels:
s.index.names = ['continent', 'year']
s
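Once the levels are named, the names can be used wherever pandas expects a level. The sketch below rebuilds the same Series so that it runs on its own, names the levels with rename_axis (an alternative to assigning to s.index.names that returns a new object and leaves the original untouched), and then refers to the levels by name. The variable names named, continents, values_2020 and totals are only illustrative.

```python
import pandas as pd

# Rebuild the Series from the example above so this snippet is self-contained.
index = pd.MultiIndex.from_tuples(
    [('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)]
)
s = pd.Series([100, 200, 150, 250], index=index)

# rename_axis returns a new Series with named levels; s itself stays unnamed.
named = s.rename_axis(['continent', 'year'])

# Get the values of one level by name instead of by position.
continents = named.index.get_level_values('continent')

# Cross-section: all entries for the year 2020.
values_2020 = named.xs(2020, level='year')

# Aggregate over a named level.
totals = named.groupby(level='continent').sum()
```

Referring to levels by name rather than by position keeps this kind of code readable and means it keeps working if the order of the levels changes later.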
Now it’s even more obvious. You can also name the levels when you create a MultiIndex object. You just have to pass the names argument to the constructor or to whichever from_* method you use. Let’s create a new MultiIndex to demonstrate this:
# We can use any method. In this example we'll use the from_product method.
p = pd.MultiIndex.from_product([['wild horses', 'boars', 'wolves'],
                                ['Europe', 'Asia', 'North America']],
                               names=['species', 'region'])
p
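The names argument works the same way with the other creation methods. As a small sketch, here is the continent/year index from the first example built twice, once with from_tuples and once with from_arrays, with the levels named at creation time. The variable names are only illustrative.

```python
import pandas as pd

# Name the levels while building the index from a list of tuples.
from_tuples = pd.MultiIndex.from_tuples(
    [('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)],
    names=['continent', 'year'],
)

# The same index built from parallel arrays, one array per level.
from_arrays = pd.MultiIndex.from_arrays(
    [['Asia', 'Asia', 'Australia', 'Australia'], [2019, 2020, 2019, 2020]],
    names=['continent', 'year'],
)

# Both approaches describe the same index entries.
same = from_tuples.equals(from_arrays)
```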
# Now let's use the index to create a multi-indexed Series object.
populations = np.array([1200, 2500, 850,
                        4900, 5400, 3600,
                        2100, 1900, 940])
# The populations above are given in thousands of individuals, so let's multiply
# each value by 1000 so that the actual data is displayed.
populations *= 1000
# Here's the Series object.
populations_by_region = pd.Series(populations, index=p)
populations_by_region
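To see what the names buy us in practice, here is a short sketch that rebuilds the Series above so it can run on its own and then reshapes and aggregates it by level name. The variable names table, per_species and by_region_first are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the named index and the Series from the example above.
p = pd.MultiIndex.from_product(
    [['wild horses', 'boars', 'wolves'], ['Europe', 'Asia', 'North America']],
    names=['species', 'region'],
)
populations = np.array([1200, 2500, 850,
                        4900, 5400, 3600,
                        2100, 1900, 940]) * 1000
populations_by_region = pd.Series(populations, index=p)

# Move the 'region' level into columns: one row per species.
table = populations_by_region.unstack('region')

# Total population per species, summed over all regions.
per_species = populations_by_region.groupby(level='species').sum()

# Swap the two levels by name and sort, so that region comes first.
by_region_first = populations_by_region.swaplevel('species', 'region').sort_index()
```

Each of these calls would also accept level positions (0 and 1), but the names make it clear at a glance which level is being used.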
That’s it. As you can see, named levels make it easier to see what each level of the index represents. In the next part of the series we’ll be talking about multi-indexed DataFrame columns.