NAMED MULTIINDEX LEVELS

Sometimes it’s useful to name the levels of a MultiIndex object. Let’s have a look at the example from the previous part of the series:

In [1]:

import numpy as np
import pandas as pd

# the MultiIndex
a = pd.MultiIndex.from_tuples([('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)])

# the data
data = [100, 200,
        150, 250]

# the Series object 
s = pd.Series(data, index=a)
s

Out[1]:

Asia       2019    100
           2020    200
Australia  2019    150
           2020    250
dtype: int64

Although in this example it’s pretty obvious what the two levels represent, that’s not always the case. Since this Series object already exists, we can set the level names through the names attribute of its index:

In [2]:

s.index.names = ['continent', 'year']
s

Out[2]:

continent  year
Asia       2019    100
           2020    200
Australia  2019    150
           2020    250
dtype: int64
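If you’d rather not modify the existing Series in place, pandas also provides the rename_axis method, which returns a new object with named levels. The sketch below rebuilds the Series from In [1] so it runs on its own (the variable name `named` is just for illustration). It also shows that once the levels have names, you can look them up by name instead of by position:

```python
import pandas as pd

# Rebuild the unnamed Series from In [1] so this block is self-contained.
idx = pd.MultiIndex.from_tuples(
    [('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)]
)
s = pd.Series([100, 200, 150, 250], index=idx)

# rename_axis returns a NEW Series whose index levels carry the given names;
# s itself is left unchanged.
named = s.rename_axis(['continent', 'year'])

print(list(s.index.names))      # [None, None]
print(list(named.index.names))  # ['continent', 'year']

# With named levels, a level can be referred to by its name.
print(named.index.get_level_values('year').tolist())  # [2019, 2020, 2019, 2020]
```

Assigning to s.index.names, as in In [2], changes the existing index. rename_axis leaves the original untouched, which can be handy when that object is still used elsewhere.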

Now it’s even more obvious. You can also name the levels when you create a MultiIndex object: just pass the names argument to the MultiIndex constructor or to whichever from_* factory method you use. Let’s create a new MultiIndex to demonstrate this:

In [3]:

# Any of the from_* methods accepts names. In this example we'll use from_product.
p = pd.MultiIndex.from_product([['wild horses', 'boars', 'wolves'], ['Europe', 'Asia', 'North America']], 
                               names=['species', 'region'])
p

Out[3]:

MultiIndex([('wild horses',        'Europe'),
            ('wild horses',          'Asia'),
            ('wild horses', 'North America'),
            (      'boars',        'Europe'),
            (      'boars',          'Asia'),
            (      'boars', 'North America'),
            (     'wolves',        'Europe'),
            (     'wolves',          'Asia'),
            (     'wolves', 'North America')],
           names=['species', 'region'])
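The names argument isn’t specific to from_product. As a quick check, here’s a sketch that builds the same index with from_tuples and from_arrays (the helper variables species, regions and tuples are just for illustration). The MultiIndex constructor accepts names as well, but it also requires the levels and integer codes to be spelled out, so it’s left out here:

```python
import pandas as pd

species = ['wild horses', 'boars', 'wolves']
regions = ['Europe', 'Asia', 'North America']

# from_product: every (species, region) combination.
p = pd.MultiIndex.from_product([species, regions], names=['species', 'region'])

# The same index written out as a list of tuples...
tuples = [(sp, rg) for sp in species for rg in regions]
p_tuples = pd.MultiIndex.from_tuples(tuples, names=['species', 'region'])

# ...and as two parallel arrays, one per level.
p_arrays = pd.MultiIndex.from_arrays(
    [[sp for sp, _ in tuples], [rg for _, rg in tuples]],
    names=['species', 'region'],
)

# equals() compares only the values; identical() also compares the level names.
print(p.equals(p_tuples), p.equals(p_arrays))        # True True
print(p.identical(p_tuples), p.identical(p_arrays))  # True True
```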

In [9]:

# Now let's use the index to create a multi-indexed Series object.
populations = np.array([1200, 2500, 850,
                        4900, 5400, 3600,
                        2100, 1900, 940])

# The populations above are given in thousands of individuals, so let's multiply
# each value by 1000 so that the actual data is displayed.
populations *= 1000

# Here's the Series object.
populations_by_region = pd.Series(populations, index=p)
populations_by_region

Out[9]:

species      region       
wild horses  Europe           1200000
             Asia             2500000
             North America     850000
boars        Europe           4900000
             Asia             5400000
             North America    3600000
wolves       Europe           2100000
             Asia             1900000
             North America     940000
dtype: int32
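Level names are useful beyond the printed output: many pandas methods accept a level name wherever a level number is expected. Here’s a minimal sketch using xs, groupby and unstack on the population data. The block rebuilds populations_by_region so it runs on its own, and the explicit int64 dtype is our own choice, made so the result doesn’t depend on the platform (the int32 shown above is platform-dependent):

```python
import numpy as np
import pandas as pd

p = pd.MultiIndex.from_product(
    [['wild horses', 'boars', 'wolves'], ['Europe', 'Asia', 'North America']],
    names=['species', 'region'],
)
populations = np.array([1200, 2500, 850,
                        4900, 5400, 3600,
                        2100, 1900, 940], dtype=np.int64) * 1000
populations_by_region = pd.Series(populations, index=p)

# Cross-section: every species in Asia, picking the level by name.
asia = populations_by_region.xs('Asia', level='region')
print(asia)

# Total population per species, grouping on the 'species' level by name.
totals = populations_by_region.groupby(level='species').sum()
print(totals)

# Move the 'region' level into the columns to get a species x region table.
table = populations_by_region.unstack('region')
print(table)
```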

That’s it. As you can see, named levels make it clear what data each level represents. In the next part of the series, we’ll be talking about multi-indexed DataFrame columns.