Speaker "Theodore Petrou" Details



1. Hands-on Workshop: Minimally Sufficient Pandas
2. Build your own Pandas Cub


1. Hands-on Workshop: Minimally Sufficient Pandas
The Python Pandas library is powerful yet confusing, as there are often multiple ways of completing the same task. Most Pandas users want to do actual data analysis, not memorize esoteric syntax. A minimal subset of the library is sufficient to handle the vast majority of data analysis workloads. This tutorial provides guidelines and hands-on exercises on how to maximize your productivity with Pandas without drowning in syntax.

2. Build your own Pandas Cub

A typical data scientist’s workflow in Python consists of firing up a Jupyter Notebook, importing NumPy, Pandas, Matplotlib, and Scikit-Learn into the workspace, and then completing a data analysis. The APIs of these libraries are well known, mostly stable, and provide a powerful and flexible way of analyzing data. These libraries have contributed enormously to the success of Python as a language of choice for data science and have increased the productivity of the data scientists who use them.

For data scientists who are interested in learning how to develop their own data science tools, relying on these popular, easy-to-use libraries hides the complexities and the underlying Python code. In fact, it is so easy to produce data science results in Python that one only needs to know the very basics of the language along with the library’s API.

In this hands-on tutorial, we will build our own data analysis package from scratch. Specifically, our package will contain a DataFrame class with a Pandas-like API. We will make heavy use of the Python data model, which provides special methods that let our DataFrame work with Python operators. By the end of the tutorial, we will have built a Python package, importable into your workspace, that is capable of performing the most important operations available in Pandas.
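
To give a flavor of what "special methods" means here, the following is a minimal sketch and not the tutorial's actual code. It assumes a toy DataFrame that stores each column as a plain Python list; the class layout, the `_data` attribute, and the `columns` property are hypothetical choices made only for this illustration. The methods `__len__`, `__getitem__`, `__add__`, and `__repr__` are standard hooks from the Python data model.

```python
class DataFrame:
    """Toy column-oriented table (illustrative sketch only)."""

    def __init__(self, data):
        # data: dict mapping column names to equal-length lists (an assumption
        # of this sketch; a real implementation would likely use NumPy arrays)
        if not isinstance(data, dict):
            raise TypeError("data must be a dict of column name -> list")
        lengths = {len(values) for values in data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self._data = {name: list(values) for name, values in data.items()}

    @property
    def columns(self):
        # Column names in insertion order
        return list(self._data)

    def __len__(self):
        # Called by len(df); returns the number of rows
        return len(next(iter(self._data.values()), []))

    def __getitem__(self, key):
        # Called by df["a"] or df[["a", "b"]]; returns a new DataFrame
        if isinstance(key, str):
            return DataFrame({key: self._data[key]})
        if isinstance(key, list):
            return DataFrame({name: self._data[name] for name in key})
        raise TypeError("key must be a column name or a list of column names")

    def __add__(self, other):
        # Called by df + scalar; adds the scalar to every value
        return DataFrame(
            {name: [v + other for v in values] for name, values in self._data.items()}
        )

    def __repr__(self):
        # Called by repr(df) and by the interactive prompt
        return f"DataFrame({self._data!r})"


df = DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(len(df))          # 3
print((df + 1)["a"])    # DataFrame({'a': [2, 3, 4]})
```

Because each operator maps to one method, support for further operators (for example `-` via `__sub__`) can be added one method at a time without changing how the class is used.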


Ted Petrou is the author of Pandas Cookbook and founder of both Dunder Data and the Houston Data Science Meetup group. Ted received his Master's degree in statistics from Rice University and used his analytical skills to play poker professionally and teach math before becoming a data scientist. Ted is the author of the Dexplo and Dexplot Python data analysis and visualization libraries.