In the previous part of this series we created a DataFrame from a dictionary of Series objects. Let’s quickly recreate the DataFrame here:

In [1]:

import numpy as np
import pandas as pd

# two Series that share the same index labels
atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11, 'carbon': 6, 'sulfur': 16})
atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990, 'carbon': 12.011, 'sulfur': 32.066})

# the dictionary keys become the column names
elements = pd.DataFrame({'atomic number': atomic_numbers,
                         'atomic mass': atomic_masses})
elements

Out[1]:

	atomic number	atomic mass
helium	2	4.003
oxygen	8	15.999
sodium	11	22.990
carbon	6	12.011
sulfur	16	32.066

This is how we can create DataFrame objects, but it’s not the only way to do it. Let’s have a look at some other options.

DataFrame from a Single Series Object

Let’s start with something simple. If our DataFrame needs only one column, we can create it from a single Series object:

In [4]:

pd.DataFrame(atomic_masses, columns=['atomic mass'])

Out[4]:

	atomic mass
helium	4.003
oxygen	15.999
sodium	22.990
carbon	12.011
sulfur	32.066
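As a side note, a Series can also convert itself into a one-column DataFrame with its to_frame method. The snippet below is a minimal sketch, not the only way to do it: it recreates the Series so that it runs on its own, and the variable names from_constructor, from_to_frame and same are just illustrative.

```python
import pandas as pd

atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990,
                           'carbon': 12.011, 'sulfur': 32.066})

# the approach used above: pass the unnamed Series and label the column
from_constructor = pd.DataFrame(atomic_masses, columns=['atomic mass'])

# alternative: let the Series turn itself into a one-column DataFrame
from_to_frame = atomic_masses.to_frame(name='atomic mass')

# both routes should give the same index, column label and values
same = from_constructor.equals(from_to_frame)
print(same)
```

Which of the two you use is mostly a matter of taste; to_frame reads a bit more naturally when you already have the Series at hand.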

DataFrame from a List of Dictionaries

You can also create a DataFrame from a list of dictionaries. Each dictionary becomes one row, and the keys become the column names.

In [9]:

# dictionary 1:
age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37}

# dictionary 2:
salary_dict = {'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}

# list of dictionaries
staff_lst = [age_dict, salary_dict]

# DataFrame
staff = pd.DataFrame(staff_lst)

staff

Out[9]:

	Monica	Jake	Sarah
0	44	39	37
1	41200	32400	61400

If the default 0-based integer index doesn’t tell you much, you can explicitly pass your own labels:

In [10]:

staff = pd.DataFrame(staff_lst, index=['age', 'salary'])
staff

Out[10]:

	Monica	Jake	Sarah
age	44	39	37
salary	41200	32400	61400

Now it looks better. But what if some of the keys are in one dictionary but not in the other? In such a case the missing data will be filled in with NaN values. NaN stands for ‘Not a Number’. We’re going to discuss NaN values in one of my future articles in this series.

Anyway, to demonstrate how it works, let’s make some changes to our dictionaries and add one more:

In [12]:

# dictionary 1:
age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37, 'Ben': 41}

# dictionary 2:
salary_dict = {'Bryce': 49, 'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}

# dictionary 3:
sales_dict = {'Bryce': 331458, 'Jake': 741877, 'Ben': 425654, 'Sarah': 99874, 'Joe': 174541}

# list of dictionaries
staff_lst = [age_dict, salary_dict, sales_dict]

# DataFrame
staff = pd.DataFrame(staff_lst, index=['age', 'salary', 'sales'])

staff

Out[12]:

	Monica	Jake	Sarah	Ben	Bryce	Joe
age	44.0	39	37	41.0	NaN	NaN
salary	41200.0	32400	61400	NaN	49.0	NaN
sales	NaN	741877	99874	425654.0	331458.0	174541.0

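As a side effect, the numbers in the columns with NaN values were converted to floats. NaN is itself a floating-point value, so a regular integer column can’t hold it. The Jake and Sarah columns have a value in every dictionary, which is why they stayed integers.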
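If you want to check this yourself, you can inspect the column types. The snippet below is a small sketch with a cut-down version of the staff data (two dictionaries only, chosen for brevity); dtypes and isna are regular pandas features, while the variable names are just illustrative.

```python
import pandas as pd

staff_lst = [
    {'Monica': 44, 'Jake': 39},       # no entry for 'Ben'
    {'Jake': 32400, 'Ben': 41200},    # no entry for 'Monica'
]
staff = pd.DataFrame(staff_lst, index=['age', 'salary'])

# 'Jake' appears in every dictionary, so this column stays integer;
# 'Monica' and 'Ben' each have a gap, so these columns become float
dtypes = staff.dtypes
print(dtypes)

# isna() marks the cells that were filled in with NaN
missing = staff.isna()
print(missing)
```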

DataFrame from a Two-Dimensional NumPy Array

In the last example in this article we’ll create a DataFrame from a two-dimensional NumPy array. The rows of the array become the rows of the DataFrame:

In [13]:

# let's define a numpy array
random_arr = np.random.rand(3, 4)

random_arr

Out[13]:

array([[0.78923546, 0.22045163, 0.20547258, 0.27474091],
       [0.98599478, 0.79942169, 0.83339034, 0.42043241],
       [0.63522506, 0.74496412, 0.11338136, 0.69158224]])

In [14]:

# and now let's use it to create a DataFrame
pd.DataFrame(random_arr)

Out[14]:

	0	1	2	3
0	0.789235	0.220452	0.205473	0.274741
1	0.985995	0.799422	0.833390	0.420432
2	0.635225	0.744964	0.113381	0.691582

In [15]:

# naturally, we could have specified the column and index names as well
pd.DataFrame(random_arr,
             columns=['A', 'B', 'C', 'D'],
             index=['version 1', 'version 2', 'version 3'])

Out[15]:

	A	B	C	D
version 1	0.789235	0.220452	0.205473	0.274741
version 2	0.985995	0.799422	0.833390	0.420432
version 3	0.635225	0.744964	0.113381	0.691582
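The numbers above will be different each time the code is run, because the array is random. If you’d like repeatable values, one option is a seeded NumPy random generator. This is just a sketch: the seed 42 is an arbitrary choice, and the numbers it produces won’t match the ones shown above.

```python
import numpy as np
import pandas as pd

# a seeded generator returns the same numbers on every run
rng = np.random.default_rng(seed=42)
random_arr = rng.random((3, 4))   # 3 rows, 4 columns, values in [0, 1)

df = pd.DataFrame(random_arr,
                  columns=['A', 'B', 'C', 'D'],
                  index=['version 1', 'version 2', 'version 3'])

# the shape of the DataFrame follows the shape of the array
print(df.shape)
```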

There are other ways of creating DataFrame objects as well, for example from regular Python lists, but the ones discussed above are the ones we’ll be using most often.
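Just to give you an idea of the list-based approach, here is a minimal sketch that uses a list of lists, where each inner list is treated as one row. The data is a shortened version of the elements table from the beginning of this article.

```python
import pandas as pd

# each inner list is one row: [atomic number, atomic mass]
rows = [[2, 4.003],
        [8, 15.999],
        [11, 22.990]]

elements = pd.DataFrame(rows,
                        columns=['atomic number', 'atomic mass'],
                        index=['helium', 'oxygen', 'sodium'])
print(elements)
```

Plain lists carry no labels of their own, so without the columns and index arguments we would get the default integer labels here, just like with the NumPy array.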