INDEXING AND SLICING MULTIPLY INDEXED DATAFRAMES
In the previous part of this Pandas series we covered indexing and slicing multiply indexed Series objects. Today we turn to multiply indexed DataFrames, which behave much like Series objects as far as indexing and slicing are concerned. Here’s the example from one of the previous parts that we’ll be working with:
In [1]:
import numpy as np
import pandas as pd
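# Row index with two levels: year and half-year period.
rows = pd.MultiIndex.from_product([[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
                                  names=['year', 'period'])
# Column index with two levels: company and type of total value.
columns = pd.MultiIndex.from_product([['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
                                     names=['company', 'total value'])
# Random integers in [20000, 100000); the values differ on every run.
data = np.random.randint(20000, 100000, (6, 6))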
a = pd.DataFrame(data, index=rows, columns=columns)
a
Out[1]:
Here are some examples of indexing and slicing:
In [2]:
# Let's access all data relevant to Company C.
a['Company C']
Out[2]:
In [3]:
# And now we need just the sales data of Company A.
a['Company A', 'sales']
Out[3]:
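To make the difference between the two column selections above concrete, here is a minimal sketch. It assumes a frame called `demo` (a name introduced only for this illustration) with the same row and column indices as `a`, but filled with the numbers 0 to 35 instead of random values, so the results are reproducible.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
                                  names=['year', 'period'])
columns = pd.MultiIndex.from_product([['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
                                     names=['company', 'total value'])
# Illustrative data only: 0..35 laid out row by row.
demo = pd.DataFrame(np.arange(36).reshape(6, 6), index=rows, columns=columns)

# Selecting on the outer column level returns a DataFrame whose columns
# are the remaining inner level ('sales', 'purchases').
company_c = demo['Company C']
print(type(company_c).__name__, list(company_c.columns))

# Selecting on both column levels returns a single column as a Series.
a_sales = demo['Company A', 'sales']
print(type(a_sales).__name__, a_sales.shape)
```

So a partial key drops the level it matched and leaves a smaller DataFrame, while a full key narrows the selection down to one Series that still carries the two-level row index.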
In [4]:
# We can also use the loc and iloc indexers, the former for explicit label indices and the latter for
# implicit positional integer indices.
a.iloc[:2, :2]
Out[4]:
In [5]:
# With loc you can also pass a tuple holding one label per index level;
# here we take all rows of the ('Company B', 'purchases') column.
a.loc[:, ('Company B', 'purchases')]
Out[5]:
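The sketch below puts the `iloc` and `loc` indexers side by side. As an assumption for illustration, it uses a frame named `demo` with the same indices as `a` but with deterministic values (0 to 35), so that each selection can be checked by eye.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
                                  names=['year', 'period'])
columns = pd.MultiIndex.from_product([['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
                                     names=['company', 'total value'])
# Illustrative data only: 0..35 laid out row by row.
demo = pd.DataFrame(np.arange(36).reshape(6, 6), index=rows, columns=columns)

# iloc ignores the labels entirely and works on positions:
# first two rows, first two columns.
top_left = demo.iloc[:2, :2]

# loc with a label from the outer row level selects the whole year.
year_2018 = demo.loc[2018]

# loc with a full row tuple and a full column tuple selects a single cell.
cell = demo.loc[(2018, 'Jul-Dec'), ('Company B', 'purchases')]

# loc with a full row tuple and all columns selects one row as a Series.
one_row = demo.loc[(2018, 'Jul-Dec'), :]

print(top_left.shape, year_2018.shape, cell, one_row.shape)
```

Note that `demo.loc[2018]` drops the matched `year` level, so the result is indexed by `period` only.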
As far as slicing is concerned, the most convenient way to do it is with the Pandas IndexSlice object, which lets you use the familiar colon syntax inside a tuple of per-level selections. Have a look:
In [8]:
idx = pd.IndexSlice
a.loc[idx[:, 'Jan-Jun'], idx[:, 'sales']]
Out[8]:
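To show what `IndexSlice` is doing, the following sketch compares it with two alternatives: spelling out `slice(None)` by hand, and chaining the `xs` cross-section method. The frame name `demo` and its deterministic values (0 to 35) are assumptions made for this illustration; the indices match those of `a`.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product([[2017, 2018, 2019], ['Jan-Jun', 'Jul-Dec']],
                                  names=['year', 'period'])
columns = pd.MultiIndex.from_product([['Company A', 'Company B', 'Company C'], ['sales', 'purchases']],
                                     names=['company', 'total value'])
# Illustrative data only: 0..35 laid out row by row.
demo = pd.DataFrame(np.arange(36).reshape(6, 6), index=rows, columns=columns)

idx = pd.IndexSlice

# First-half sales of every company in every year.
via_idx = demo.loc[idx[:, 'Jan-Jun'], idx[:, 'sales']]

# The same selection without IndexSlice: ':' has to be written as slice(None).
via_slice = demo.loc[(slice(None), 'Jan-Jun'), (slice(None), 'sales')]

# xs selects on a named level and, by default, drops that level from the result.
via_xs = demo.xs('Jan-Jun', level='period').xs('sales', axis=1, level='total value')

# A label range on the outer row level: first halves of 2017 through 2018.
# Label slices in loc include both endpoints.
first_halves = demo.loc[idx[2017:2018, 'Jan-Jun'], :]

print(via_idx.shape, via_xs.shape, first_halves.shape)
```

The first two selections keep both index levels, while the `xs` version returns the same values with the matched levels removed, which is often handier for further processing.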