pandas Part 9 – Indexing and Slicing DataFrame Objects

We know how to index and slice Series objects. What about DataFrames? Let’s have a look at the example we used before:

In [8]:
import numpy as np
import pandas as pd

# two Series that share the same index (the element names)
atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11, 'carbon': 6, 'sulfur': 16})
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990, 'carbon': 12.011, 'sulfur': 32.066})

# combine them into a DataFrame; each Series becomes a column
elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})
elements
Out[8]:
        atomic number  atomic mass
helium              2        4.003
oxygen              8       15.999
sodium             11       22.990
carbon              6       12.011
sulfur             16       32.066

To select a single column, which is itself a Series object, pass the name of that column in square brackets:

In [9]:
# just atomic numbers
elements['atomic number']
Out[9]:
helium     2
oxygen     8
sodium    11
carbon     6
sulfur    16
Name: atomic number, dtype: int64
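
A related detail worth knowing: if you pass a list of column names instead of a single name, you get a DataFrame back rather than a Series. Here is a minimal sketch, assuming the same atomic number and atomic mass data as above (the variable names one_column, one_column_df and both are just for illustration):

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066]},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# a single name returns a Series
one_column = elements['atomic number']

# a list with one name returns a DataFrame with one column
one_column_df = elements[['atomic number']]

# a list with several names returns the columns in the requested order
both = elements[['atomic mass', 'atomic number']]

print(type(one_column).__name__)
print(type(one_column_df).__name__)
print(list(both.columns))
```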

You can use the same square bracket syntax to add new columns, just like adding a key to a dictionary. Let’s add a symbol column:

In [10]:
elements['symbol'] = pd.Series({'helium': 'He', 'oxygen': 'O', 'sodium': 'Na', 'carbon': 'C', 'sulfur': 'S'})
elements
Out[10]:
        atomic number  atomic mass  symbol
helium              2        4.003      He
oxygen              8       15.999       O
sodium             11       22.990      Na
carbon              6       12.011       C
sulfur             16       32.066       S
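
When the new column is a Series, pandas matches its values to the rows by index label, not by position. The sketch below uses a shortened, three-row version of the example (an assumption made just to keep it small) and a Series that lists the labels in a different order and leaves one out:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11]},
    index=['helium', 'oxygen', 'sodium'])

# the labels are in a different order and 'sodium' is missing
elements['symbol'] = pd.Series({'oxygen': 'O', 'helium': 'He'})

# helium and oxygen get the right symbols, sodium gets a missing value
print(elements)
```

So the order of the items in the Series doesn’t matter, but any row without a matching label ends up with a missing value.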

If the column name is a string that doesn’t conflict with any method name in the DataFrame class, you can use the column name like an attribute:

In [11]:
elements.symbol
Out[11]:
helium    He
oxygen     O
sodium    Na
carbon     C
sulfur     S
Name: symbol, dtype: object

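However, keep in mind that this doesn’t work if the column name isn’t a string, if it isn’t a valid Python identifier (for example because it contains a space, like 'atomic number'), or if it conflicts with a method or attribute name in the DataFrame class, so I will stick to the square bracket notation.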
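
To make the conflict concrete, here is a small sketch with a made-up DataFrame. The column name 'size' is a hypothetical choice, picked only because DataFrame already has a size attribute:

```python
import pandas as pd

# 'size' is a hypothetical column name that clashes with DataFrame.size
df = pd.DataFrame({'size': [10, 20], 'weight': [1.5, 2.5]}, index=['a', 'b'])

# square brackets always return the column
print(df['size'])

# attribute access returns the built-in attribute instead:
# the number of cells, 2 rows x 2 columns = 4
print(df.size)

# a column name without a conflict works as an attribute
print(df.weight)
```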

You can also view the DataFrame as a two-dimensional numpy array. You will lose the indices and column names, but you will see just the raw values. If this is what you need, just use the values attribute:

In [12]:
elements.values
Out[12]:
array([[2, 4.003, 'He'],
       [8, 15.999, 'O'],
       [11, 22.99, 'Na'],
       [6, 12.011, 'C'],
       [16, 32.066, 'S']], dtype=object)
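
Notice the dtype=object in the output: the columns hold integers, floats and strings, so numpy has to fall back to a generic object array. The pandas documentation recommends the to_numpy() method over the values attribute, and if you select only the numeric columns first, you get a proper numeric array. A minimal sketch, assuming the same elements DataFrame as above (mixed and numeric are illustrative names):

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# all columns: mixed types, so the array has dtype object
mixed = elements.to_numpy()
print(mixed.dtype)

# numeric columns only: integers and floats are combined into float64
numeric = elements[['atomic number', 'atomic mass']].to_numpy()
print(numeric.dtype)
print(numeric.shape)
```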

As this is a numpy array, you can index it like any other numpy array, which means you can access a single row by passing a single positional index:

In [14]:
# first row
elements.values[0]
Out[14]:
array([2, 4.003, 'He'], dtype=object)

This is the behavior you would expect from a numpy array, but here we’re dealing with a DataFrame object, so let’s stick to the DataFrame’s own indexing tools.
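
For comparison, here is a short sketch of the same row taken directly from the DataFrame with iloc, assuming the elements DataFrame from above. The result is a Series that keeps the column names as its index and the row label as its name (raw_row and row are illustrative names):

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# numpy way: a plain array, the labels are gone
raw_row = elements.values[0]

# DataFrame way: a Series that keeps the labels
row = elements.iloc[0]

print(row.name)         # the row label
print(list(row.index))  # the column names
print(row['symbol'])    # access by column name still works
```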

You can slice DataFrames much like numpy arrays, but, just like with Series objects, you can use either label indices or positional indices. And just like with Series objects, you use the loc indexer for label indexing and the iloc indexer for positional indexing:

In [15]:
# label indexing
elements.loc['oxygen':'carbon']
Out[15]:
        atomic number  atomic mass  symbol
oxygen              8       15.999       O
sodium             11       22.990      Na
carbon              6       12.011       C

In [16]:
# positional indexing
elements.iloc[1:3]
Out[16]:
        atomic number  atomic mass  symbol
oxygen              8       15.999       O
sodium             11       22.990      Na
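
The two outputs differ in length for a reason: a label slice with loc includes its end label, while a positional slice with iloc excludes its end position, just like regular Python slicing. A quick sketch that checks this, assuming the elements DataFrame from above (by_label and by_position are illustrative names):

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# label slice: 'carbon' is included
by_label = elements.loc['oxygen':'carbon']
print(list(by_label.index))

# positional slice: position 3 (carbon) is excluded
by_position = elements.iloc[1:3]
print(list(by_position.index))

# to get the same rows by position, the slice has to end at 4
print(by_label.equals(elements.iloc[1:4]))
```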

As DataFrames are two-dimensional structures, you can naturally use a separate slice for each dimension:

In [19]:
# label indexing
elements.loc['oxygen':'carbon', 'atomic mass':]
Out[19]:
        atomic mass  symbol
oxygen       15.999       O
sodium       22.990      Na
carbon       12.011       C

In [20]:
# positional indexing
elements.iloc[3:1:-1, :2]
Out[20]:
        atomic number  atomic mass
carbon              6       12.011
sodium             11       22.990

As you can see, I also used a negative step in the row slice of the positional example, which is why the rows come back in reverse order.
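
You don’t have to use a slice on both axes. Here is a small sketch, assuming the elements DataFrame from above, that mixes single indices with slices (cell, part_of_row and every_other are illustrative names):

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# a single label on both axes returns one value
cell = elements.loc['sodium', 'atomic mass']
print(cell)

# a single row label with a column slice returns a Series
part_of_row = elements.loc['sodium', 'atomic mass':]
print(list(part_of_row.index))

# every second row, first two columns
every_other = elements.iloc[::2, :2]
print(list(every_other.index))
```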

As far as slicing is concerned, you can use all the techniques that you can use with Series objects, like masking or fancy indexing:

In [24]:
# masking for rows, fancy indexing for columns
elements.loc[elements['atomic number'] < 8, ['symbol', 'atomic number']]
Out[24]:
        symbol  atomic number
helium      He              2
carbon       C              6
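
Masks can also be combined. The sketch below, assuming the elements DataFrame from above, joins two conditions with the & operator. Each condition needs its own parentheses, because & binds more tightly than the comparison operators (mask and selected are illustrative names):

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# elements with an atomic number above 2 AND an atomic mass below 20
mask = (elements['atomic number'] > 2) & (elements['atomic mass'] < 20)
selected = elements.loc[mask, ['symbol', 'atomic mass']]
print(selected)

# ~ inverts a mask: all the remaining elements
print(list(elements.loc[~mask].index))
```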

You can also modify data using any of the above ways of indexing. We shouldn’t really modify the data in our example because these are all constant values, but just for demonstration purposes we’ll change a value to its rounded counterpart and then change it back using a different indexing approach.

In [29]:
# change the atomic mass of sulfur from 32.066 to 32 using positional indices
elements.iloc[4, 1] = 32
elements
Out[29]:
        atomic number  atomic mass  symbol
helium            2.0        4.003      He
oxygen            8.0       15.999       O
sodium           11.0       22.990      Na
carbon            6.0       12.011       C
sulfur           16.0       32.000       S

As you can see, the change has taken effect, although the value is displayed as the float 32.000, because the atomic mass column keeps its float type. Now let’s change the value back using label indexing:

In [30]:
elements.loc['sulfur', 'atomic mass'] = 32.066
elements
Out[30]:
        atomic number  atomic mass  symbol
helium            2.0        4.003      He
oxygen            8.0       15.999       O
sodium           11.0       22.990      Na
carbon            6.0       12.011       C
sulfur           16.0       32.066       S
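
Assignment works with masks too, so you can change several values at once. Here is a minimal sketch, assuming the elements DataFrame from above. It works on a copy (the name rounded is just for illustration) so that the original values stay untouched:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# work on a copy so the original DataFrame is not modified
rounded = elements.copy()

# round the atomic mass only for the elements heavier than 20
heavy = rounded['atomic mass'] > 20
rounded.loc[heavy, 'atomic mass'] = rounded['atomic mass'].round()
print(rounded)

# the column is still a float column after the assignment
print(rounded['atomic mass'].dtype)
```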
