pandas is a Python library that builds on top of numpy and makes working with data convenient. Many of its features and operations should quickly feel familiar if you have used databases or spreadsheets. With pandas you can manipulate structured data whose rows and columns carry labels.
The first thing you need to do is install pandas. If you use the Anaconda distribution of Python, pandas is already included. Otherwise, you have to install it yourself.
pandas depends on numpy, so numpy must also be installed on your computer; package managers such as pip and conda pull it in automatically when you install pandas. You can find detailed instructions on this website: https://pandas.pydata.org/getting_started.html. I’m not going to repeat them here, because you can always find the up-to-date information on the official pandas website.
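If you are not sure whether both packages are already available, here is a minimal sketch of one way to check from Python without actually importing them. The `status` dictionary is just a name I picked for this illustration.

```python
from importlib.util import find_spec

# find_spec returns None when a package cannot be found in the current environment
status = {}
for package in ("pandas", "numpy"):
    status[package] = find_spec(package) is not None
    print(package, "is installed" if status[package] else "is NOT installed")
```

If either line reports a missing package, follow the installation instructions on the official website.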
With pandas installed, you can check the version of the package. Make sure to import it first. Just like with numpy, we usually use an alias for pandas:
import pandas as pd
pd.__version__
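Outside of a notebook, a bare expression like the one above displays nothing, so you would print the value instead. A small sketch that also shows the numpy version, assuming both packages are installed:

```python
import numpy as np
import pandas as pd

# __version__ is a plain string on both packages
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```

The exact version numbers you see depend on your installation.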
Before we dive into the pandas module, just a quick reminder. If you are using Jupyter Notebook to type your code, you can leverage the built-in documentation.
You can use the tab-completion feature to scan the whole namespace. To do that just type:
pd.
and then hit TAB. This will display everything that is available in the namespace.
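If you are not working in Jupyter, you can get a rough equivalent of the tab-completion list with the built-in `dir` function. This is only a sketch; the variable names and the `read_` filter are my own choices for the example.

```python
import pandas as pd

# Collect every public name in the pandas namespace (skip names starting with "_")
public_names = [name for name in dir(pd) if not name.startswith("_")]
print(len(public_names), "public names, for example:", public_names[:10])

# Narrow the list down, here to the functions that read data from files
read_functions = [name for name in public_names if name.startswith("read_")]
print(read_functions)
```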
You can also get detailed help on any class or method included in the module by appending a question mark to its name and running the code. For example, if you want to read the documentation on the pandas Series class, all you have to do is type:
pd.Series?
and run the code. By the way, to run the code in the current cell, simply hit Ctrl+Enter.
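The question mark syntax belongs to Jupyter and IPython. In a plain Python script or console you can reach the same documentation through the docstring or the built-in `help` function. A short sketch, where the variable names are mine:

```python
import pandas as pd

# The docstring is the text that pd.Series? displays in a notebook
doc = pd.Series.__doc__
first_lines = doc.strip().splitlines()[:3]
print("\n".join(first_lines))

# For the full documentation page, call help(pd.Series) instead
```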
In the next part of this pandas series we’ll have a look at the Series class I just mentioned. It’s one of the three fundamental pandas data structures. The other two are DataFrame and Index. They will be discussed in the following parts.
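As a quick preview, here is a minimal sketch that creates one object of each kind. The values and labels are made up purely for illustration, and the details are left for the following parts.

```python
import pandas as pd

# A Series: one-dimensional data with a label for each value
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# A DataFrame: a table whose columns share the same row labels
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

# An Index: the object that holds the labels themselves
idx = s.index

print(s["b"])            # look up a value by its label
print(df.loc["c", "y"])  # look up a cell by row and column label
print(list(idx))
```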