In the previous two parts of this series we covered the Series class in general and how to create Series objects in pandas. In this part we’ll look at another fundamental pandas data type, the DataFrame.
You can think of a DataFrame as a collection of Series objects, one per column, all sharing the same index. Seeing it in action is better than imagining it, though, so let’s create a DataFrame.
First we’ll create two Series objects with the same indices, then we’ll make a DataFrame from them. Here are our Series objects:
import numpy as np
import pandas as pd
atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11, 'carbon': 6, 'sulfur': 16})
atomic_numbers
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990, 'carbon': 12.011, 'sulfur': 32.066})
atomic_masses
These two data structures can now be combined into one, the DataFrame. To do that we’ll pass a dictionary to the constructor of the DataFrame class. The keys will be the names of the two columns that will be created and the corresponding values will be the two Series objects we just created:
elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})
elements
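As a quick sanity check, here is a minimal sketch that rebuilds the same DataFrame and confirms that the two Series were lined up by their shared index labels. It only uses the data from this article; the choice of 'oxygen' as the row to inspect is arbitrary.

```python
import pandas as pd

atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11,
                            'carbon': 6, 'sulfur': 16})
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990,
                           'carbon': 12.011, 'sulfur': 32.066})

elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})

# Five rows (one per element) and two columns (one per Series).
print(elements.shape)

# Each column keeps the data type of the Series it came from.
print(elements.dtypes)

# A single row holds the values both Series had for that label.
print(elements.loc['oxygen'])
```

The loc accessor used in the last line selects a row by its index label, the same way label-based indexing works on a Series.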
The elements DataFrame is a two-dimensional data structure that combines both sets of values, with the element names as row labels. Just as the Series class has the values and index attributes, the DataFrame class has the index and columns attributes, both of which give us Index objects:
elements.index
elements.columns
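Because both attributes are Index objects, they can be checked for membership and converted to plain Python lists. The following is a small sketch under the same data as above; the variable names row_labels and column_labels are just illustrative.

```python
import pandas as pd

atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11,
                            'carbon': 6, 'sulfur': 16})
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990,
                           'carbon': 12.011, 'sulfur': 32.066})
elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})

# Both attributes are instances of pd.Index.
print(isinstance(elements.index, pd.Index))
print(isinstance(elements.columns, pd.Index))

# Membership tests work on the labels.
print('helium' in elements.index)
print('atomic mass' in elements.columns)

# An Index can be turned into an ordinary list when needed.
row_labels = list(elements.index)
column_labels = list(elements.columns)
print(row_labels)
print(column_labels)
```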
You can use the column name to obtain a Series of all the values in that column:
elements['atomic mass']
type(elements['atomic mass'])
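The Series you get back this way carries the same index as the DataFrame, so individual values can still be looked up by label. Here is a minimal sketch, again rebuilt from the article's data; the variable name masses is only illustrative.

```python
import pandas as pd

atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11,
                            'carbon': 6, 'sulfur': 16})
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990,
                           'carbon': 12.011, 'sulfur': 32.066})
elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})

masses = elements['atomic mass']

# The column is a Series named after the column label.
print(isinstance(masses, pd.Series))
print(masses.name)

# It shares its index with the DataFrame it came from.
print(masses.index.equals(elements.index))

# Label-based lookup works just like on any other Series.
print(masses['carbon'])
```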
In the next part we’ll see how to create DataFrame objects. There are quite a few ways of doing this.