pandas Part 5 – Creating DataFrame Objects

In the previous part of this series we created a DataFrame from a dictionary of Series objects. Let’s quickly recreate the DataFrame here:

In [1]:
import numpy as np
import pandas as pd

atomic_numbers = pd.Series({'helium': 2, 'oxygen': 8, 'sodium': 11, 'carbon': 6, 'sulfur': 16})
atomic_numbers

atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990, 'carbon': 12.011, 'sulfur': 32.066})
atomic_masses

elements = pd.DataFrame({'atomic number': atomic_numbers,
                        'atomic mass': atomic_masses})
elements
Out[1]:
        atomic number  atomic mass
helium              2        4.003
oxygen              8       15.999
sodium             11       22.990
carbon              6       12.011
sulfur             16       32.066

That’s one way to create a DataFrame, but it’s not the only one. Let’s have a look at some other options.

DataFrame from a Single Series Object

Let’s start with something simple. If our DataFrame needs only one column, we can create it from a single Series object:

In [4]:
pd.DataFrame(atomic_masses, columns = ['atomic mass'])
Out[4]:
        atomic mass
helium        4.003
oxygen       15.999
sodium       22.990
carbon       12.011
sulfur       32.066
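
A Series can also convert itself into a one-column DataFrame. Here’s a minimal sketch using the `to_frame` method, which this article doesn’t otherwise use; the variable name `masses_df` is just made up for the illustration:

```python
import pandas as pd

atomic_masses = pd.Series({'helium': 4.003, 'oxygen': 15.999, 'sodium': 22.990,
                           'carbon': 12.011, 'sulfur': 32.066})

# to_frame() wraps the Series in a one-column DataFrame;
# the name argument sets the column label
masses_df = atomic_masses.to_frame(name='atomic mass')

print(masses_df)
```

Both approaches should give the same result here, so which one you pick is mostly a matter of taste.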

DataFrame from a List of Dictionaries

You can also create a DataFrame from a list of dictionaries. Each dictionary becomes one row, and the keys become the column names.

In [9]:
# dictionary 1:
age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37}

# dictionary 2:
salary_dict = {'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}

# list of dictionaries
staff_lst = [age_dict, salary_dict]

# DataFrame
staff = pd.DataFrame(staff_lst)

staff
Out[9]:
   Monica   Jake  Sarah
0      44     39     37
1   41200  32400  61400

If the default 0-based integer indices don’t tell you much, you can pass your own labels explicitly:

In [10]:
staff = pd.DataFrame(staff_lst, index = ['age', 'salary'])
staff
Out[10]:
        Monica   Jake  Sarah
age         44     39     37
salary   41200  32400  61400
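
Since each dictionary ends up as a row, the people become columns. If you’d rather have one row per person, one option is to transpose the result. Here’s a small sketch, assuming the same two dictionaries as above; `staff_by_person` is just an illustrative name:

```python
import pandas as pd

age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37}
salary_dict = {'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}

staff = pd.DataFrame([age_dict, salary_dict], index=['age', 'salary'])

# .T swaps rows and columns: the people become rows,
# 'age' and 'salary' become the column names
staff_by_person = staff.T

print(staff_by_person)
```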

With the custom labels the rows are easier to read. But what if some of the keys are in one dictionary but not in another? In that case the missing data will be filled in with NaN values. NaN stands for ‘Not a Number’. We’re going to discuss NaN values in a future article in this series.

Anyway, to demonstrate how it works, let’s make some changes to our dictionaries and add one more:

In [12]:
# dictionary 1:
age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37, 'Ben': 41}

# dictionary 2:
salary_dict = {'Bryce': 49, 'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}

# dictionary 3:
sales_dict = {'Bryce': 331458, 'Jake': 741877, 'Ben': 425654, 'Sarah': 99874, 'Joe': 174541}

# list of dictionaries
staff_lst = [age_dict, salary_dict, sales_dict]

# DataFrame
staff = pd.DataFrame(staff_lst, index = ['age', 'salary', 'sales'])

staff
Out[12]:
         Monica    Jake  Sarah       Ben     Bryce       Joe
age        44.0      39     37      41.0       NaN       NaN
salary  41200.0   32400  61400       NaN      49.0       NaN
sales       NaN  741877  99874  425654.0  331458.0  174541.0

As a side effect, the numbers in columns that contain NaN values were converted to floats. NaN is itself a floating-point value, so a default integer column can’t hold it.
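
You can see the conversion by checking the `dtypes` attribute. The sketch below also shows one possible way to get whole numbers back: pandas’ nullable integer dtype `'Int64'` (with a capital I), which the article doesn’t cover. Take it as an illustration, not as the recommended approach; `staff_int` is a made-up name:

```python
import pandas as pd

age_dict = {'Monica': 44, 'Jake': 39, 'Sarah': 37, 'Ben': 41}
salary_dict = {'Bryce': 49, 'Monica': 41200, 'Jake': 32400, 'Sarah': 61400}
sales_dict = {'Bryce': 331458, 'Jake': 741877, 'Ben': 425654, 'Sarah': 99874, 'Joe': 174541}

staff = pd.DataFrame([age_dict, salary_dict, sales_dict],
                     index=['age', 'salary', 'sales'])

# columns with at least one missing value are float64,
# complete columns (Jake, Sarah) stay int64
print(staff.dtypes)

# the nullable 'Int64' dtype can hold integers and missing values together;
# the missing entries are then displayed as <NA>
staff_int = staff.astype('Int64')
print(staff_int)
```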

DataFrame from a Two-Dimensional numpy Array

In our last example in this article we’ll create a DataFrame from a two-dimensional numpy array:

In [13]:
# let's define a numpy array
random_arr = np.random.rand(3, 4)

random_arr
Out[13]:
array([[0.78923546, 0.22045163, 0.20547258, 0.27474091],
       [0.98599478, 0.79942169, 0.83339034, 0.42043241],
       [0.63522506, 0.74496412, 0.11338136, 0.69158224]])
In [14]:
# and now let's use it to create a DataFrame
pd.DataFrame(random_arr)
Out[14]:
          0         1         2         3
0  0.789235  0.220452  0.205473  0.274741
1  0.985995  0.799422  0.833390  0.420432
2  0.635225  0.744964  0.113381  0.691582
In [15]:
# naturally, we could have specified the column and index names as well
pd.DataFrame(random_arr, 
            columns = ['A', 'B', 'C', 'D'],
            index = ['version 1', 'version 2', 'version 3'])
Out[15]:
                  A         B         C         D
version 1  0.789235  0.220452  0.205473  0.274741
version 2  0.985995  0.799422  0.833390  0.420432
version 3  0.635225  0.744964  0.113381  0.691582
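
Because the array is random, you’ll get different numbers each time you run the code. If you want the same values on every run, one option is a seeded generator. This is only a sketch: the seed value 42 and the names `rng` and `df` are arbitrary choices, and the numbers it produces will differ from the ones shown above:

```python
import numpy as np
import pandas as pd

# a seeded Generator produces the same sequence of numbers on every run
rng = np.random.default_rng(seed=42)
random_arr = rng.random((3, 4))

df = pd.DataFrame(random_arr,
                  columns=['A', 'B', 'C', 'D'],
                  index=['version 1', 'version 2', 'version 3'])

print(df)
```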

There are other ways of creating DataFrame objects as well, for example from regular Python lists, but the ones discussed above are the ones we’ll use most.
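
Just to give you an idea of the list-based approach, here’s a minimal sketch. It reuses a few values from the elements table at the top of this article; the variable name `rows` is made up for the example:

```python
import pandas as pd

# each inner list is one row of the DataFrame
rows = [[2, 4.003],
        [8, 15.999],
        [11, 22.990]]

elements = pd.DataFrame(rows,
                        columns=['atomic number', 'atomic mass'],
                        index=['helium', 'oxygen', 'sodium'])

print(elements)
```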
