
pandas Part 21 – Sorting MultiIndices

SORTING MULTIINDICES

Sorting a MultiIndex matters because label-based slicing only works when the index is lexicographically sorted. Let’s create a Series with an unsorted MultiIndex and then try to slice it. Here’s the Series, filled with random data:

In [10]:
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['Columbus', 'Seattle', 'Denver', 'Dallas'], [2019, 2020]])
data = pd.Series(np.random.rand(8), index=index)
data.index.names = ['city', 'year']
data
Out[10]:
city      year
Columbus  2019    0.374872
          2020    0.996970
Seattle   2019    0.614238
          2020    0.597602
Denver    2019    0.997905
          2020    0.006256
Dallas    2019    0.660692
          2020    0.493068
dtype: float64

As you can see, the cities are not in alphabetical order. Let’s see what happens if we try to slice the object:

In [11]:
data['Columbus':'Denver']
---------------------------------------------------------------------------
UnsortedIndexError                        Traceback (most recent call last)
<ipython-input-11-e98b33f60046> in <module>
----> 1 data['Columbus':'Denver']

~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
   1108             key = check_bool_indexer(self.index, key)
   1109 
-> 1110         return self._get_with(key)
   1111 
   1112     def _get_with(self, key):

~\Anaconda3\lib\site-packages\pandas\core\series.py in _get_with(self, key)
   1113         # other: fancy integer or otherwise
   1114         if isinstance(key, slice):
-> 1115             indexer = self.index._convert_slice_indexer(key, kind="getitem")
   1116             return self._get_values(indexer)
   1117         elif isinstance(key, ABCDataFrame):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in _convert_slice_indexer(self, key, kind)
   3214         else:
   3215             try:
-> 3216                 indexer = self.slice_indexer(start, stop, step, kind=kind)
   3217             except Exception:
   3218                 if is_index_slice:

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in slice_indexer(self, start, end, step, kind)
   5032         slice(1, 3)
   5033         """
-> 5034         start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
   5035 
   5036         # return a slice

~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in slice_locs(self, start, end, step, kind)
   2579         # This function adds nothing to its parent implementation (the magic
   2580         # happens in get_slice_bound method), but it adds meaningful doc.
-> 2581         return super().slice_locs(start, end, step, kind=kind)
   2582 
   2583     def _partial_tup_index(self, tup, side="left"):

~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in slice_locs(self, start, end, step, kind)
   5246         start_slice = None
   5247         if start is not None:
-> 5248             start_slice = self.get_slice_bound(start, "left", kind)
   5249         if start_slice is None:
   5250             start_slice = 0

~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in get_slice_bound(self, label, side, kind)
   2523         if not isinstance(label, tuple):
   2524             label = (label,)
-> 2525         return self._partial_tup_index(label, side=side)
   2526 
   2527     def slice_locs(self, start=None, end=None, step=None, kind=None):

~\Anaconda3\lib\site-packages\pandas\core\indexes\multi.py in _partial_tup_index(self, tup, side)
   2585             raise UnsortedIndexError(
   2586                 "Key length (%d) was greater than MultiIndex"
-> 2587                 " lexsort depth (%d)" % (len(tup), self.lexsort_depth)
   2588             )
   2589 

UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'

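Yes, that’s right. You get an UnsortedIndexError. The message reports a lexsort depth of 0, which means that not even the first level (city) is sorted, so pandas cannot locate the bounds of the slice. So, let’s sort the index. To do that we’ll use the sort_index method, which returns a new, sorted Series: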

In [12]:
data = data.sort_index()
data
Out[12]:
city      year
Columbus  2019    0.374872
          2020    0.996970
Dallas    2019    0.660692
          2020    0.493068
Denver    2019    0.997905
          2020    0.006256
Seattle   2019    0.614238
          2020    0.597602
dtype: float64
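
If you’d rather not eyeball the output, one way to check whether an index is sorted is its is_monotonic_increasing property. The following is a minimal sketch, not part of the original notebook: it uses np.arange instead of random data so the values are reproducible, and the names unsorted and sorted_data are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same cities and years as above, in the same unsorted order.
index = pd.MultiIndex.from_product(
    [['Columbus', 'Seattle', 'Denver', 'Dallas'], [2019, 2020]],
    names=['city', 'year'],
)
unsorted = pd.Series(np.arange(8.0), index=index)

# True only when the index tuples are in lexicographic order.
print(unsorted.index.is_monotonic_increasing)

# sort_index returns a new Series; the original keeps its order.
sorted_data = unsorted.sort_index()
print(sorted_data.index.is_monotonic_increasing)
print(unsorted.index.is_monotonic_increasing)
```

The first and last checks report an unsorted index, and the middle one reports a sorted index. This is why the result of sort_index has to be assigned to a name before you slice it.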

Now the index is sorted and slicing works. Note that a label slice includes both endpoints, so Denver is part of the result:

In [13]:
data['Columbus':'Denver']
Out[13]:
city      year
Columbus  2019    0.374872
          2020    0.996970
Dallas    2019    0.660692
          2020    0.493068
Denver    2019    0.997905
          2020    0.006256
dtype: float64
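
If you slice multi-indexed data in several places, it may be convenient to wrap the check and the sort in a small function. The sketch below is only one possible approach; slice_by_label is a hypothetical helper name, not a pandas function, and the data again uses np.arange so that the result is reproducible.

```python
import numpy as np
import pandas as pd


def slice_by_label(series, start, stop):
    """Hypothetical helper: sort the index only if needed, then slice by label."""
    if not series.index.is_monotonic_increasing:
        series = series.sort_index()
    return series.loc[start:stop]


index = pd.MultiIndex.from_product(
    [['Columbus', 'Seattle', 'Denver', 'Dallas'], [2019, 2020]],
    names=['city', 'year'],
)
data = pd.Series(np.arange(8.0), index=index)

# Works even though data itself was never sorted.
result = slice_by_label(data, 'Columbus', 'Denver')
print(result)
```

The helper leaves the Series you pass in untouched and sorts a copy only when it has to, so calling it on data that is already sorted costs just the monotonicity check.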
