
pandas Part 17 – Named MultiIndex Levels


NAMED MULTIINDEX LEVELS

Sometimes it’s useful to name the levels of a MultiIndex object. Let’s have a look at the example from the previous part of the series:

In [1]:
import numpy as np
import pandas as pd

# the MultiIndex
a = pd.MultiIndex.from_tuples([('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)])

# the data
data = [100, 200,
        150, 250]

# the Series object 
s = pd.Series(data, index=a)
s
Out[1]:
Asia       2019    100
           2020    200
Australia  2019    150
           2020    250
dtype: int64

Although in this example it’s pretty obvious what the two levels are, that’s not always the case. Since this Series object already exists, we can name its levels by assigning a list of names to the names attribute of its index:

In [2]:
s.index.names = ['continent', 'year']
s
Out[2]:
continent  year
Asia       2019    100
           2020    200
Australia  2019    150
           2020    250
dtype: int64
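
Assigning to names modifies the index in place. If you'd rather get a renamed copy and leave the original untouched, pandas also offers MultiIndex.set_names and Series.rename_axis. Here's a minimal sketch, assuming a reasonably recent pandas version; the variable names named_index and s_named are just illustrative:

```python
import pandas as pd

# Rebuild the unnamed Series from In [1].
idx = pd.MultiIndex.from_tuples(
    [('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)]
)
s = pd.Series([100, 200, 150, 250], index=idx)

# Before naming, both level names are None.
print(list(s.index.names))        # [None, None]

# set_names returns a new MultiIndex; s.index itself is not modified.
named_index = s.index.set_names(['continent', 'year'])

# rename_axis returns a new Series whose index levels carry the names.
s_named = s.rename_axis(['continent', 'year'])

print(list(named_index.names))    # ['continent', 'year']
print(list(s_named.index.names))  # ['continent', 'year']
print(list(s.index.names))        # [None, None]
```

The copy-returning methods are handy in method chains, where assigning to an attribute isn't possible.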

Now it’s even more obvious. You can also name the levels when you create a MultiIndex object: just pass the names argument to the MultiIndex constructor or to one of its factory methods, such as from_tuples, from_arrays or from_product. Let’s create a new MultiIndex to demonstrate this:

In [3]:
# Any of the MultiIndex factory methods accepts names. In this example we'll use from_product.
p = pd.MultiIndex.from_product([['wild horses', 'boars', 'wolves'], ['Europe', 'Asia', 'North America']], 
                               names=['species', 'region'])
p
Out[3]:
MultiIndex([('wild horses',        'Europe'),
            ('wild horses',          'Asia'),
            ('wild horses', 'North America'),
            (      'boars',        'Europe'),
            (      'boars',          'Asia'),
            (      'boars', 'North America'),
            (     'wolves',        'Europe'),
            (     'wolves',          'Asia'),
            (     'wolves', 'North America')],
           names=['species', 'region'])
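
from_product is only one option. As a rough sketch (assuming pandas 0.24 or later, where the constructor takes codes), here is the continent/year index from In [1] built with names up front in three different ways. The variable names from_t, from_a and from_c are hypothetical:

```python
import pandas as pd

tuples = [('Asia', 2019), ('Asia', 2020), ('Australia', 2019), ('Australia', 2020)]

# from_tuples: one tuple per row, names given at creation time.
from_t = pd.MultiIndex.from_tuples(tuples, names=['continent', 'year'])

# from_arrays: one array per level.
from_a = pd.MultiIndex.from_arrays(
    [['Asia', 'Asia', 'Australia', 'Australia'], [2019, 2020, 2019, 2020]],
    names=['continent', 'year'],
)

# The constructor itself: unique level values plus integer codes into them.
from_c = pd.MultiIndex(
    levels=[['Asia', 'Australia'], [2019, 2020]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=['continent', 'year'],
)

# All three hold the same values and carry the same level names.
print(from_t.equals(from_a), from_t.equals(from_c))  # True True
print(list(from_t.names))                            # ['continent', 'year']
```

Note that equals compares only the index values, not the names, which is why the names are checked separately.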
In [9]:
# Now let's use the index to create a multi-indexed Series object.
populations = np.array([1200, 2500, 850,
                        4900, 5400, 3600,
                        2100, 1900, 940])

# The populations above are given in thousands of individuals, so let's multiply
# each value by 1000 so that the actual data is displayed.
populations *= 1000

# Here's the Series object.
populations_by_region = pd.Series(populations, index=p)
populations_by_region
Out[9]:
species      region       
wild horses  Europe           1200000
             Asia             2500000
             North America     850000
boars        Europe           4900000
             Asia             5400000
             North America    3600000
wolves       Europe           2100000
             Asia             1900000
             North America     940000
dtype: int32
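
Level names aren't just labels for display: many pandas methods accept a level name wherever they accept a level position. A short sketch of three such uses on the populations_by_region Series, assuming the same data as above (the variable names asia, per_species and regions are illustrative):

```python
import numpy as np
import pandas as pd

p = pd.MultiIndex.from_product(
    [['wild horses', 'boars', 'wolves'], ['Europe', 'Asia', 'North America']],
    names=['species', 'region'],
)
populations = np.array([1200, 2500, 850,
                        4900, 5400, 3600,
                        2100, 1900, 940]) * 1000
populations_by_region = pd.Series(populations, index=p)

# Cross-section: all species in Asia, selecting the level by name.
asia = populations_by_region.xs('Asia', level='region')

# Aggregate over a level by name: total population per species
# (groupby sorts the keys, so boars come first).
per_species = populations_by_region.groupby(level='species').sum()

# Read the values of a single level by name.
regions = populations_by_region.index.get_level_values('region')

print(asia)
print(per_species)
print(regions.unique().tolist())  # ['Europe', 'Asia', 'North America']
```

Using names instead of positions like level=1 keeps the code readable and keeps working if the levels are later reordered.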

That’s it. As you can see, named levels make it easier to tell what data each level represents. In the next part of the series we’ll be talking about multi-indexed DataFrame columns.
