We know how to index and slice Series objects. What about DataFrames? Let’s have a look at the example we used before:
import numpy as np
import pandas as pd
atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11, 'carbon': 6, 'sulfur': 16})
atomic_numbers
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990, 'carbon': 12.011, 'sulfur': 32.066})
atomic_masses
elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})
elements
If you want to select just one column, which is itself a Series object, you can use the name of that column in square brackets:
# just atomic numbers
elements['atomic number']
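To make the point about the column being a Series concrete, here is a minimal sketch on a cut-down, two-row version of the same data. It also shows that passing a list of names, even a list with a single name, gives you a DataFrame instead. The variable names one_column and one_column_frame are just illustrative.

```python
import pandas as pd

# a smaller version of the elements DataFrame, built directly for brevity
elements = pd.DataFrame(
    {'atomic number': [2, 8], 'atomic mass': [4.003, 15.999]},
    index=['helium', 'oxygen'],
)

# a single column name returns a Series
one_column = elements['atomic number']
print(type(one_column).__name__)        # Series

# a list of column names returns a DataFrame, even with one name in it
one_column_frame = elements[['atomic number']]
print(type(one_column_frame).__name__)  # DataFrame
```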
You can use this syntax to add new columns, just like with dictionaries. Let’s add a symbol column:
elements['symbol'] = pd.Series({'helium': 'He', 'oxygen': 'O', 'sodium': 'Na', 'carbon': 'C', 'sulfur': 'S'})
elements
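One detail worth knowing when you add a column from a Series: pandas matches the values by index label, not by position. The sketch below uses a shortened three-row frame and a deliberately incomplete Series (the name partial_symbols is made up for this illustration) to show that the order of the labels doesn’t matter and that missing labels end up as NaN.

```python
import pandas as pd

# a shortened version of the elements DataFrame
elements = pd.DataFrame(
    {'atomic number': [2, 8, 11]},
    index=['helium', 'oxygen', 'sodium'],
)

# labels are in a different order, and 'sodium' is missing on purpose
partial_symbols = pd.Series({'oxygen': 'O', 'helium': 'He'})

# values are aligned on the index labels; 'sodium' gets NaN
elements['symbol'] = partial_symbols
print(elements)
```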
If the column name is a string that is a valid Python identifier and doesn’t conflict with any method name in the DataFrame class, you can use the column name like an attribute:
elements.symbol
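However, keep in mind that this doesn’t work if the column name isn’t a string, if it contains characters that aren’t allowed in an attribute name (like the space in 'atomic number'), or if it conflicts with a method name in the DataFrame class, so I will stick to the square bracket notation.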
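Here is a small sketch of where attribute access breaks down. The frame isotopes and its count column (the number of stable isotopes) are hypothetical and exist only for this illustration; the column name was chosen because it clashes with the DataFrame.count method.

```python
import pandas as pd

# hypothetical frame: the 'count' column clashes with the DataFrame.count method
isotopes = pd.DataFrame(
    {'count': [2, 3], 'atomic number': [2, 8]},
    index=['helium', 'oxygen'],
)

# attribute access finds the method, not the column
print(callable(isotopes.count))   # True

# square brackets always find the column
print(isotopes['count'])

# isotopes.atomic number  <- not even valid Python because of the space,
# so square brackets are the only option here
print(isotopes['atomic number'])
```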
You can also view the DataFrame as a two-dimensional numpy array. You will lose the indices and column names and see just the raw values. If this is what you need, use the values attribute:
elements.values
As this is a numpy array, you can index it like one, which means you get a single row if you pass a single positional index:
# first row
elements.values[0]
This is the behavior you would expect from a numpy array, but here we’re dealing with a DataFrame object, so let’s stick to it.
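One thing to be aware of when you drop down to the raw array: a numpy array has a single dtype, so a DataFrame that mixes numbers and strings comes out as an object array. The following sketch, on a two-row version of the data, compares that with selecting only the numeric columns first. The names mixed and numeric are just illustrative.

```python
import pandas as pd

# two-row version of the elements DataFrame, including the symbol column
elements = pd.DataFrame(
    {'atomic number': [2, 8],
     'atomic mass': [4.003, 15.999],
     'symbol': ['He', 'O']},
    index=['helium', 'oxygen'],
)

# numbers and strings together: numpy falls back to the generic object dtype
mixed = elements.values
print(mixed.dtype)    # object

# integers and floats only: everything is upcast to float64
numeric = elements[['atomic number', 'atomic mass']].values
print(numeric.dtype)  # float64
```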
You can slice DataFrames much like numpy arrays, but just like with the Series object, you can use label indices and positional indices. Also just like with the Series object, you can use the loc and iloc indexers for label and positional indexing respectively. Note that a label slice includes its end label, whereas a positional slice excludes its end position:
# label indexing
elements.loc['oxygen':'carbon']
# positional indexing
elements.iloc[1:3]
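The two slices above don’t return the same rows, which is easy to miss. Here is a minimal sketch, using only the atomic number column, that makes the difference visible; by_label and by_position are illustrative names.

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16]},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'],
)

# label slicing: the end label 'carbon' is included
by_label = elements.loc['oxygen':'carbon']
print(list(by_label.index))      # ['oxygen', 'sodium', 'carbon']

# positional slicing: the end position 3 is excluded
by_position = elements.iloc[1:3]
print(list(by_position.index))   # ['oxygen', 'sodium']
```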
As DataFrames are two-dimensional structures, you can naturally use a separate slice for each dimension:
# label indexing
elements.loc['oxygen':'carbon', 'atomic mass':]
# positional indexing
elements.iloc[3:1:-1, :2]
As you can see, I also used a step in the first slice above. The step is negative, so the rows come back in reverse order.
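If you want to check what a stepped slice actually selects, it helps to look at the resulting index and columns. This is a small sketch on the same data; reversed_rows and every_other are illustrative names, and the second slice is an extra example that isn’t used elsewhere in the article.

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'],
)

# rows at positions 3 and 2 (in that order), first two columns
reversed_rows = elements.iloc[3:1:-1, :2]
print(list(reversed_rows.index))     # ['carbon', 'sodium']
print(list(reversed_rows.columns))   # ['atomic number', 'atomic mass']

# every second row, all columns
every_other = elements.iloc[::2]
print(list(every_other.index))       # ['helium', 'sodium', 'sulfur']
```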
As far as slicing is concerned, you can use all the techniques that you can use with Series objects, like masking or fancy indexing:
# masking for rows, fancy indexing for columns
elements.loc[elements['atomic number'] < 8, ['symbol', 'atomic number']]
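The mask in the example above is nothing more than a boolean Series with the same index as the DataFrame. The sketch below builds it separately so you can inspect it, and then combines two conditions with the & operator. The second condition and the names mask, light and combined are only for illustration.

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [2, 8, 11, 6, 16],
     'atomic mass': [4.003, 15.999, 22.990, 12.011, 32.066],
     'symbol': ['He', 'O', 'Na', 'C', 'S']},
    index=['helium', 'oxygen', 'sodium', 'carbon', 'sulfur'],
)

# the mask is a boolean Series indexed by element name
mask = elements['atomic number'] < 8
print(mask)

# rows where the mask is True, and two columns picked by name
light = elements.loc[mask, ['symbol', 'atomic number']]
print(list(light.index))      # ['helium', 'carbon']

# conditions are combined with & (and) or | (or); each needs its own parentheses
combined = elements.loc[(elements['atomic number'] > 2) & (elements['atomic mass'] < 20)]
print(list(combined.index))   # ['oxygen', 'carbon']
```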
You can also modify data using any of the above ways of indexing. We shouldn’t modify the data that we have in our example because these are all constant values, but just for demonstration purposes we’ll change a value to its rounded counterpart and then change it back using a different indexing approach.
# let's change the atomic mass of sulfur from 32.066 to 32 using positional indices
elements.iloc[4, 1] = 32
elements
As you can see, the change has taken effect. The value is still displayed as a float (32.0) because the whole column keeps its float dtype. Now let’s change the value back using label indexing:
elements.loc['sulfur', 'atomic mass'] = 32.066
elements
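To confirm that assigning an integer doesn’t change the type of the column, you can check the dtype after each step. This is a minimal sketch on a two-row frame, so the positional indices differ from the ones used above; dtype_after_rounding is an illustrative name.

```python
import pandas as pd

elements = pd.DataFrame(
    {'atomic number': [6, 16], 'atomic mass': [12.011, 32.066]},
    index=['carbon', 'sulfur'],
)

# assign an integer by position: row 1 (sulfur), column 1 (atomic mass)
elements.iloc[1, 1] = 32
dtype_after_rounding = elements['atomic mass'].dtype
print(dtype_after_rounding)                   # float64
print(elements.loc['sulfur', 'atomic mass'])  # 32.0

# restore the original value by label
elements.loc['sulfur', 'atomic mass'] = 32.066
print(elements.loc['sulfur', 'atomic mass'])  # 32.066
```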