Speaker: Alex Perrier



Large data in Python with Scikit-learn and Dask


Although scikit-learn is optimized for small data, its out-of-core features enable the data scientist to work with large data, i.e. data that does not fit in the computer's memory. I'll present the scikit-learn algorithms compatible with this batch training approach and their respective performance on large datasets. However, data munging remains a time-consuming problem when dealing with large data. This is where Dask, a Python library, comes in. By breaking operations into sequences that can be parallelized, Dask addresses the pre-processing part of the large data problem.
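
To give a flavor of the batch training approach, here is a minimal sketch using the partial_fit method of scikit-learn's SGDClassifier, one of the estimators that supports incremental learning. The batch_stream generator and train_out_of_core helper are hypothetical names invented for this illustration, and the synthetic data merely stands in for chunks read from a file too large to load at once; this is not necessarily the setup used in the talk.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier


def batch_stream(n_batches, batch_size, n_features, seed=0):
    """Yield (X, y) mini-batches.

    Hypothetical stand-in for reading successive chunks from disk:
    only one batch is held in memory at a time.
    """
    # Fixed hidden linear rule so every stream shares the same labels.
    w = np.random.default_rng(42).normal(size=n_features)
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X @ w > 0).astype(int)
        yield X, y


def train_out_of_core(batches, classes):
    """Fit a linear classifier incrementally, one batch at a time."""
    model = SGDClassifier(random_state=0)
    for X, y in batches:
        # partial_fit updates the existing weights instead of refitting;
        # the full list of classes must be known on the first call.
        model.partial_fit(X, y, classes=classes)
    return model


classes = np.array([0, 1])
model = train_out_of_core(batch_stream(50, 200, 10, seed=0), classes)

# Evaluate on a held-out batch drawn with a different seed.
X_test, y_test = next(batch_stream(1, 1000, 10, seed=1))
accuracy = model.score(X_test, y_test)
```

In practice the generator would wrap something like a chunked file reader, and the same loop applies to any estimator that exposes partial_fit.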


Data Scientist at Berklee Online, contributor @ODSC, PhD in signal processing