In the previous part of this series we created a DataFrame from a dictionary of Series objects. Let’s quickly recreate the DataFrame here:
import numpy as np
import pandas as pd
atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11, 'carbon': 6, 'sulfur': 16})
atomic_numbers
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990, 'carbon': 12.011, 'sulfur': 32.066})
atomic_masses
elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})
elements
This is one way to create a DataFrame object, but it’s not the only one. Let’s have a look at some other options.
DataFrame from a Single Series Object
Let’s start with something simple. If our DataFrame needs only one column, we can create it from a single Series object:
pd.DataFrame(atomic_masses, columns = ['atomic mass'])
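A Series also has a to_frame method that does the same job. The sketch below rebuilds the atomic_masses Series so that it runs on its own; the variable names from_constructor and from_method are made up for this illustration and are not part of the series.

```python
import pandas as pd

atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990,
                           'carbon': 12.011, 'sulfur': 32.066})

# the constructor call used in the article
from_constructor = pd.DataFrame(atomic_masses, columns=['atomic mass'])

# Series.to_frame wraps the Series in a DataFrame and names its only column
from_method = atomic_masses.to_frame(name='atomic mass')

print(from_method)
# compare the two results
print(from_constructor.equals(from_method))
```

Either form should be fine here; to_frame is simply a bit shorter when you already have the Series at hand.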
DataFrame from a List of Dictionaries
You can also create a DataFrame from a list of dictionaries. Each dictionary becomes one row, and the keys become the column names.
# dictionary 1:
age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37}
# dictionary 2:
salary_dict = {'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}
# list of dictionaries
staff_lst = [age_dict, salary_dict]
# DataFrame
staff = pd.DataFrame(staff_lst)
staff
If the default 0-based integer index doesn’t tell you much, you can explicitly pass your own labels:
staff = pd.DataFrame(staff_lst, index = ['age', 'salary'])
staff
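To check how the dictionaries map onto the result, here is a small self-contained sketch that repeats the example and then looks at the column names and a single value. The transposed variable staff_by_person is an illustrative extra, not something the series relies on.

```python
import pandas as pd

age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37}
salary_dict = {'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}

staff = pd.DataFrame([age_dict, salary_dict], index=['age', 'salary'])

# the dictionary keys ended up as the column names
print(list(staff.columns))

# each dictionary is one row, addressable by its index label
print(staff.loc['salary', 'Jake'])

# optional: transposing swaps rows and columns, giving one row per person
staff_by_person = staff.T
print(staff_by_person)
```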
Now it looks better. But what if some of the keys are in one dictionary but not in the others? In such a case the missing data will be filled in with NaN values. NaN stands for ‘Not a Number’. We’re going to discuss NaN values in one of the future articles in this series.
Anyway, to demonstrate how it works, let’s make some changes to our dictionaries and add one more:
# dictionary 1:
age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37, 'Ben': 41}
# dictionary 2:
salary_dict = {'Bryce': 49, 'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}
# dictionary 3:
sales_dict = {'Bryce': 331458, 'Jake': 741877, 'Ben': 425654, 'Sarah': 99874, 'Joe': 174541}
# list of dictionaries
staff_lst = [age_dict, salary_dict, sales_dict]
# DataFrame
staff = pd.DataFrame(staff_lst, index = ['age', 'salary', 'sales'])
staff
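As a side effect, the numbers in columns with NaN values were converted to floats. This happens because NaN is itself a floating-point value, so a plain integer column can’t hold it.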
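One way to see the conversion is to inspect the dtype of each column. The sketch below repeats the three dictionaries so that it runs on its own. Jake and Sarah appear in all three dictionaries, so their columns should stay integer, while every column with a gap should come out as float.

```python
import pandas as pd

age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37, 'Ben': 41}
salary_dict = {'Bryce': 49, 'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}
sales_dict = {'Bryce': 331458, 'Jake': 741877, 'Ben': 425654, 'Sarah': 99874, 'Joe': 174541}

staff = pd.DataFrame([age_dict, salary_dict, sales_dict],
                     index=['age', 'salary', 'sales'])

# one dtype per column: columns with a gap are float, complete ones stay integer
print(staff.dtypes)

# number of missing values in each column
print(staff.isna().sum())
```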
DataFrame from a Two-Dimensional NumPy Array
In the last example in this article we’ll create a DataFrame from a two-dimensional NumPy array:
# let's define a numpy array
random_arr = np.random.rand(3, 4)
random_arr
# and now let's use it to create a DataFrame
pd.DataFrame(random_arr)
# naturally, we could have specified the column and index names as well
pd.DataFrame(random_arr,
             columns = ['A', 'B', 'C', 'D'],
             index = ['version 1', 'version 2', 'version 3'])
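One detail worth keeping in mind: the number of column names and index labels has to match the shape of the array. The sketch below illustrates this with the same 3 x 4 array; the names labeled and mismatch_raised are hypothetical and only serve the demonstration.

```python
import numpy as np
import pandas as pd

random_arr = np.random.rand(3, 4)

# 3 index labels and 4 column names match the (3, 4) shape of the array
labeled = pd.DataFrame(random_arr,
                       columns=['A', 'B', 'C', 'D'],
                       index=['version 1', 'version 2', 'version 3'])
print(labeled.shape)

# passing only 3 column names for 4 columns of data is rejected
mismatch_raised = False
try:
    pd.DataFrame(random_arr, columns=['A', 'B', 'C'])
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```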
There are other ways of creating DataFrame objects as well, for example from regular Python lists, but the ones discussed above are the ones we’ll be using most.
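For completeness, here is a minimal sketch of the list-based approach, assuming a list of lists where each inner list is one row. It reuses three of the elements from the beginning of the article; the names rows and elements_from_lists are made up for this illustration.

```python
import pandas as pd

# each inner list is one row: [atomic number, atomic mass]
rows = [[2, 4.003],
        [8, 15.999],
        [11, 22.990]]

elements_from_lists = pd.DataFrame(rows,
                                   columns=['atomic number', 'atomic mass'],
                                   index=['helium', 'oxygen', 'sodium'])
print(elements_from_lists)
```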