DATASET CONCATENATION
We can concatenate Series and DataFrame objects using the pd.concat function, which takes a list of objects to join. Let’s start with a simple example using two Series objects:
In [3]:
import numpy as np
import pandas as pd
s1 = pd.Series(['A', 'B', 'C'])
s2 = pd.Series(['D', 'E', 'F'])
pd.concat([s1, s2])
Out[3]:
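As you can see, the indices overlap: each Series got the default labels 0, 1, 2, and pd.concat keeps the original labels, so the result contains every label twice. You may want to specify the indices explicitly to avoid this: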
In [4]:
s1 = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
s2 = pd.Series(['D', 'E', 'F'], index=[4, 5, 6])
pd.concat([s1, s2])
Out[4]:
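Setting the labels by hand is one way to avoid the overlap. As a minimal sketch of two alternatives, pd.concat also accepts ignore_index=True, which discards the original labels and renumbers the result, and verify_integrity=True, which raises an error instead of silently producing duplicate labels. The variable names renumbered and overlap_detected are only illustrative.

```python
import pandas as pd

s1 = pd.Series(['A', 'B', 'C'])
s2 = pd.Series(['D', 'E', 'F'])

# ignore_index=True drops the original labels and renumbers the result from 0.
renumbered = pd.concat([s1, s2], ignore_index=True)
print(renumbered.index.tolist())

# verify_integrity=True raises ValueError when the result would contain
# duplicate index labels.
try:
    pd.concat([s1, s2], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)
```

Use ignore_index when the original labels carry no meaning, and verify_integrity when duplicate labels would indicate a problem in the data.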
And now an example with two DataFrames. First, let’s write a helper function that quickly creates a DataFrame from column names and index labels. A dictionary comprehension works well for that:
In [13]:
def create_df(cols, index):
    # Each cell holds the column name followed by the row label, e.g. 'A1'.
    x = {c: [str(c) + str(i) for i in index] for c in cols}
    return pd.DataFrame(x, index=index)
In [14]:
# Let's create the first DataFrame using this function.
df1 = create_df('ABC', [1, 2, 3])
df1
Out[14]:
In [15]:
# Let's create the second DataFrame.
df2 = create_df('ABC', [4, 5, 6])
df2
Out[15]:
In [16]:
# And now let's concatenate the two.
pd.concat([df1, df2])
Out[16]:
As you can see, by default DataFrames are concatenated row-wise (axis=0). If you want to change this behavior, you can pass the axis parameter to choose the axis along which the concatenation takes place:
In [18]:
# Let's concatenate column-wise.
df1 = create_df('ABC', [1, 2, 3])
df2 = create_df('DEF', [1, 2, 3])
pd.concat([df1, df2], axis=1)
Out[18]:
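In the column-wise example above, both DataFrames share the same index, so the rows line up exactly. The sketch below shows what happens when the indices only partly overlap; the frames left and right are made up for illustration. By default (join='outer') pd.concat keeps the union of the row labels and fills the missing cells with NaN, while join='inner' keeps only the labels present in both objects.

```python
import pandas as pd

# Two small frames whose indices only partly overlap (labels 2 and 3 are shared).
left = pd.DataFrame({'A': ['A1', 'A2', 'A3']}, index=[1, 2, 3])
right = pd.DataFrame({'D': ['D2', 'D3', 'D4']}, index=[2, 3, 4])

# Default join='outer': union of the row labels, gaps filled with NaN.
outer = pd.concat([left, right], axis=1)
print(outer)

# join='inner': only the row labels present in both frames are kept.
inner = pd.concat([left, right], axis=1, join='inner')
print(inner)
```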