Skip to content
Home » pandas Part 20 – Indexing and Slicing Multiply Indexed DataFrames

pandas Part 20 – Indexing and Slicing Multiply Indexed DataFrames

Spread the love

INDEXING AND SLICING MULTIPLY INDEXED DATAFRAMES

In the previous part of the Pandas series we were talking about indexing and slicing multiply indexed Series objects. Today we’ll be talking about multiply indexed DataFrames. They behave very much like Series objects as far as indexing and slicing is concerned. Here’s the example from one of the previous parts that we’ll be working on:

In [1]:
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
                                  names=['year', 'period'])

columns = pd.MultiIndex.from_product([['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
                                  names=['company', 'total value'])

data = np.random.randint(20000, 100000, (6, 6))
a = pd.DataFrame(data, index=rows, columns=columns)
a
Out[1]:
company Company A Company B Company C
total value sales purchases sales purchases sales purchases
year period
2017 Jan-Jun 52016 75802 46574 35849 37107 89150
Jul-Dec 46388 79543 47696 86961 32138 64179
2018 Jan-Jun 94779 90781 64409 62316 49368 51174
Jul-Dec 21693 68970 79971 56702 52753 97088
2019 Jan-Jun 49221 60597 36683 47390 83167 30133
Jul-Dec 29882 80608 76448 27891 94124 43042

Here are some examples of indexing and slicing:

In [2]:
# Let's access all data relevant to Company C.
a['Company C']
Out[2]:
total value sales purchases
year period
2017 Jan-Jun 37107 89150
Jul-Dec 32138 64179
2018 Jan-Jun 49368 51174
Jul-Dec 52753 97088
2019 Jan-Jun 83167 30133
Jul-Dec 94124 43042
In [3]:
# And now we need just the sales data of Company A.
a['Company A', 'sales']
Out[3]:
year  period 
2017  Jan-Jun    52016
      Jul-Dec    46388
2018  Jan-Jun    94779
      Jul-Dec    21693
2019  Jan-Jun    49221
      Jul-Dec    29882
Name: (Company A, sales), dtype: int32
In [4]:
# We can also use the loc and iloc indexers, the former for explicit label indices and the latter for 
# implicit positional integer indices. 
a.iloc[:2, :2]
Out[4]:
company Company A
total value sales purchases
year period
2017 Jan-Jun 52016 75802
Jul-Dec 46388 79543
In [5]:
# You can also pass a tuple of multiple indices.
a.loc[:, ('Company B', 'purchases')]
Out[5]:
year  period 
2017  Jan-Jun    35849
      Jul-Dec    86961
2018  Jan-Jun    62316
      Jul-Dec    56702
2019  Jan-Jun    47390
      Jul-Dec    27891
Name: (Company B, purchases), dtype: int32

As far as slicing is concerned, the optimal way of doing this is by means of the Pandas IndexSlice object. Have a look:

In [8]:
idx = pd.IndexSlice
a.loc[idx[:, 'Jan-Jun'], idx[:, 'sales']]
Out[8]:
company Company A Company B Company C
total value sales sales sales
year period
2017 Jan-Jun 52016 46574 37107
2018 Jan-Jun 94779 64409 49368
2019 Jan-Jun 49221 36683 83167

Your Panda3D Magazine

Make Awesome Games and Other 3D Apps

with Panda3D and Blender using Python.

Cool stuff, easy to follow articles.

Get the magazine here (PDF).

Python Jumpstart Course

Learn the basics of Python, including OOP.

with lots of exercises, easy to follow

The course is available on Udemy.

Blender Jumpstart Course

Learn the basics of 3D modeling in Blender.

step-by-step, easy to follow, visually rich

The course is available on Udemy and on Skillshare.


Spread the love
Tags:

Leave a Reply