We know how to index and slice Series objects. What about DataFrames? Let’s have a look at the example we used before:

In [8]:

import numpy as np
import pandas as pd

atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11, 'carbon': 6, 'sulfur': 16})
atomic_numbers

atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990, 'carbon': 12.011, 'sulfur': 32.066})
atomic_masses

elements = pd.DataFrame({'atomic number': atomic_numbers,
                        'atomic mass': atomic_masses})
elements

Out[8]:

	atomic number	atomic mass
helium	2	4.003
oxygen	8	15.999
sodium	11	22.990
carbon	6	12.011
sulfur	16	32.066

If you want to select just one column, which is itself a Series object, pass the name of that column in square brackets:

In [9]:

# just atomic numbers
elements['atomic number']

Out[9]:

helium     2
oxygen     8
sodium    11
carbon     6
sulfur    16
Name: atomic number, dtype: int64
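
To make the point about the Series object concrete, here is a minimal sketch that rebuilds a smaller version of the DataFrame and compares a column selected by name with a column selected by a one-element list. The variable names as_series and as_frame are mine, chosen just for this illustration:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066]},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# a single column name returns a Series
as_series = elements['atomic number']

# a list with one column name returns a one-column DataFrame
as_frame = elements[['atomic number']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (5, 1)
```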

You can use this syntax to add new columns, just like with dictionaries. Let’s add a symbol column:

In [10]:

elements['symbol'] = pd.Series({'helium': 'He', 'oxygen': 'O', 'sodium': 'Na', 'carbon': 'C', 'sulfur': 'S'})
elements

Out[10]:

	atomic number	atomic mass	symbol
helium	2	4.003	He
oxygen	8	15.999	O
sodium	11	22.990	Na
carbon	6	12.011	C
sulfur	16	32.066	S
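
One thing worth knowing when you add a column from a Series: the values are matched by index label, not by position. The sketch below uses a shortened DataFrame and a symbols Series of my own, with the labels in a different order and one label left out on purpose:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11]},
    index=['helium', 'oxygen', 'sodium'])

# the labels are in a different order and 'sodium' is missing
symbols = pd.Series({'oxygen': 'O', 'helium': 'He'})
elements['symbol'] = symbols

# each symbol lands in the row with the matching label;
# the row without a match gets a missing value
print(elements)
```

So helium still gets He and oxygen still gets O, while sodium ends up with a missing value.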

If the column name is a string that is a valid Python identifier and doesn’t conflict with any method or attribute name in the DataFrame class, you can use the column name like an attribute:

In [11]:

elements.symbol

Out[11]:

helium    He
oxygen     O
sodium    Na
carbon     C
sulfur     S
Name: symbol, dtype: object

However, keep in mind that this doesn’t work if the column name isn’t a valid identifier (for example 'atomic number', which contains a space) or if it conflicts with a method name in the DataFrame class, so I will stick to the square bracket notation.
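
Here is a small sketch of both pitfalls. The DataFrame df and its count column are hypothetical, added only because DataFrame already has a method called count:

```python
import pandas as pd

# 'count' is a hypothetical column name that clashes with DataFrame.count
df = pd.DataFrame({'symbol': ['He', 'O'],
                   'atomic number': [2, 8],
                   'count': [1, 2]})

# attribute access works for a plain identifier
print(df.symbol.tolist())    # ['He', 'O']

# df.count is the DataFrame method, not the column
print(callable(df.count))    # True

# square brackets always give you the column
print(df['count'].tolist())  # [1, 2]

# df.atomic number would be a syntax error, so only brackets work here
print(df['atomic number'].tolist())  # [2, 8]
```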

You can also view the DataFrame as a two-dimensional numpy array. You will lose the index and the column names and see just the raw values. If this is what you need, use the values attribute:

In [12]:

elements.values

Out[12]:

array([[2, 4.003, 'He'],
       [8, 15.999, 'O'],
       [11, 22.99, 'Na'],
       [6, 12.011, 'C'],
       [16, 32.066, 'S']], dtype=object)
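
Newer pandas versions also offer the to_numpy() method, which the pandas documentation recommends over the values attribute. The sketch below, on a two-row version of the DataFrame, shows why the array above has dtype=object: the columns hold different types, so numpy falls back to a generic object array. With only the numeric columns you get a float array instead. The names mixed and numeric are mine:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8],
     'atomic mass': [4.003, 15.999],
     'symbol': ['He', 'O']},
    index=['helium', 'oxygen'])

# all columns: integers, floats and strings share one array
mixed = elements.to_numpy()

# numeric columns only: integers are upcast to floats
numeric = elements[['atomic number', 'atomic mass']].to_numpy()

print(mixed.dtype)    # object
print(numeric.dtype)  # float64
```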

As this is a numpy array, you can index it like one, which means you can access a single row if you pass a single positional index:

In [14]:

# first row
elements.values[0]

Out[14]:

array([2, 4.003, 'He'], dtype=object)

This is what you would expect from a numpy array, but here we’re dealing with a DataFrame object, so let’s stick to the DataFrame’s own ways of indexing.
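
For comparison, here is a minimal sketch of how you could get the same first row from the DataFrame itself, using the loc and iloc indexers you know from Series objects. The row comes back as a Series that keeps the column names as its index and the row label as its name. The names first_row and same_row are mine:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8],
     'atomic mass': [4.003, 15.999],
     'symbol': ['He', 'O']},
    index=['helium', 'oxygen'])

first_row = elements.iloc[0]       # by position
same_row = elements.loc['helium']  # by label

print(first_row.name)              # helium
print(first_row['symbol'])         # He
print(first_row.equals(same_row))  # True
```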

You can slice DataFrames in a similar way to numpy arrays but, just like with Series objects, you can use both label indices and positional indices. Also just like with Series objects, you can use the loc and iloc indexers for label and positional indexing respectively:

In [15]:

# label indexing
elements.loc['oxygen':'carbon']

Out[15]:

	atomic number	atomic mass	symbol
oxygen	8	15.999	O
sodium	11	22.990	Na
carbon	6	12.011	C

In [16]:

# positional indexing
elements.iloc[1:3]

Out[16]:

	atomic number	atomic mass	symbol
oxygen	8	15.999	O
sodium	11	22.990	Na
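
Notice that the two slices above return a different number of rows. A label slice with loc includes the end label, while a positional slice with iloc excludes the end position, just like regular Python slicing. Here is a short sketch of the difference, with by_label and by_position being my own names:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16]},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

by_label = elements.loc['oxygen':'carbon']  # end label 'carbon' is included
by_position = elements.iloc[1:3]            # end position 3 is excluded

print(list(by_label.index))     # ['oxygen', 'sodium', 'carbon']
print(list(by_position.index))  # ['oxygen', 'sodium']
```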

As DataFrames are two-dimensional structures, you can naturally use a separate slice for each dimension:

In [19]:

# label indexing
elements.loc['oxygen':'carbon', 'atomic mass':]

Out[19]:

	atomic mass	symbol
oxygen	15.999	O
sodium	22.990	Na
carbon	12.011	C

In [20]:

# positional indexing
elements.iloc[3:1:-1, :2]

Out[20]:

	atomic number	atomic mass
carbon	6	12.011
sodium	11	22.990

As you can see, I also used a step in the first slice above.
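
If the slice with a negative step looks confusing, it may help to compare it with the same slice on a plain Python list of positions. This is just a sketch; positions and subset are names I made up for it:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# the same slice applied to the list of row positions
positions = list(range(len(elements)))[3:1:-1]

# rows 3 and 2 in that order, and the first two columns
subset = elements.iloc[3:1:-1, :2]

print(positions)             # [3, 2]
print(list(subset.index))    # ['carbon', 'sodium']
print(list(subset.columns))  # ['atomic number', 'atomic mass']
```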

As far as slicing is concerned, you can use all the techniques that you can use with Series objects, like masking or fancy indexing:

In [24]:

# masking for rows, fancy indexing for columns
elements.loc[elements['atomic number'] < 8, ['symbol', 'atomic number']]

Out[24]:

	symbol	atomic number
helium	He	2
carbon	C	6
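
Masks can also be combined. Below is a minimal sketch that joins two conditions with the & operator; the particular thresholds and the names mask and selected are my own choices for the illustration:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066]},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

# each condition needs its own parentheses
# because & binds tighter than the comparison operators
mask = (elements['atomic number'] > 2) & (elements['atomic mass'] < 20)

# masking for rows, fancy indexing for columns
selected = elements.loc[mask, ['atomic mass']]

print(list(selected.index))  # ['oxygen', 'carbon']
```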

You can also modify data using any of the above ways of indexing. We shouldn’t modify the data in our example because these are all constant values, but just for demonstration purposes we’ll change a value to its rounded counterpart and then change it back using a different indexing approach.

In [29]:

# let's change the atomic mass of sulfur, which is 32.066 to 32 using positional indices
elements.iloc[4, 1] = 32
elements

Out[29]:

	atomic number	atomic mass	symbol
helium	2.0	4.003	He
oxygen	8.0	15.999	O
sodium	11.0	22.990	Na
carbon	6.0	12.011	C
sulfur	16.0	32.000	S

As you can see, the change has taken effect. The value is displayed as 32.000 rather than 32 because the atomic mass column keeps its float dtype. Now let’s change the value back using label indexing:

In [30]:

elements.loc['sulfur', 'atomic mass'] = 32.066
elements

Out[30]:

	atomic number	atomic mass	symbol
helium	2.0	4.003	He
oxygen	8.0	15.999	O
sodium	11.0	22.990	Na
carbon	6.0	12.011	C
sulfur	16.0	32.066	S
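
Masking works for modifying data too. Here is a minimal sketch that rounds only the masses above 20. It works on a copy so that the original values stay untouched; the names rounded and heavy and the threshold are mine:

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066]},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'])

rounded = elements.copy()  # work on a copy so the original stays intact

# round only the masses above 20, selected with a boolean mask
heavy = rounded['atomic mass'] > 20
rounded.loc[heavy, 'atomic mass'] = rounded.loc[heavy, 'atomic mass'].round()

print(rounded['atomic mass'].tolist())        # [4.003, 15.999, 23.0, 12.011, 32.0]
print(elements.loc['sulfur', 'atomic mass'])  # 32.066
```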

Here’s the video version of the article:

pandas Part 9 – Indexing and Slicing DataFrame Objects
