The 20 Python Packages You Need For Machine Learning and Data Science

Posted on: Jan 29, 2022

We are going to look at the 20 Python packages you should know for your data science, data engineering, and machine learning projects. These are the packages I have found most useful during my career as a machine learning engineer and Python programmer. While such a list can never be complete, it should give you a few tools for every use case.

1. OpenCV

The open-source computer vision library OpenCV is your best friend when it comes to images and videos. It offers efficient solutions to common image problems such as face detection and object detection, as well as edge detection, the process of detecting the lines and contours inside an image. If you are planning to work with images in data science, this library is a must. OpenCV has gathered a massive 56,000 stars on GitHub and made working with image data several times faster and easier for me.

2. Matplotlib

Data visualization is your main way to communicate with non-data wizards. If you think about it, even apps are merely a way to visualize various data interactions behind the scenes. Matplotlib is the basis of visualization in Python. From visualizing your edge detection algorithm to looking at distributions in your data, Matplotlib is your partner in crime. It has 14,000 stars on GitHub and is surely a great library to start contributing to. For example, I made an animated line plot for a recent video using seaborn and Matplotlib.
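To make "looking at distributions in your data" concrete, here is a small sketch that draws a histogram and saves it to a file. The function name `plot_distribution`, the random sample data, and the output file name are assumptions made for illustration.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np


def plot_distribution(values, output_path, bins=20):
    """Draw a histogram of `values` and save it to `output_path`."""
    fig, ax = plt.subplots()
    ax.hist(values, bins=bins)
    ax.set_xlabel("value")
    ax.set_ylabel("count")
    ax.set_title("Distribution of sample data")
    fig.savefig(output_path)
    plt.close(fig)  # release the figure's memory
    return output_path


if __name__ == "__main__":
    # Hypothetical data: 1,000 draws from a standard normal distribution.
    samples = np.random.default_rng(seed=0).normal(size=1000)
    path = os.path.join(tempfile.gettempdir(), "distribution.png")
    print("saved plot to", plot_distribution(samples, path))
```

In a notebook or interactive session you would normally skip the `Agg` backend line and call `plt.show()` instead of saving to disk.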

3. pip

Given that we are talking about Python packages, we have to take a moment to talk about their master, pip. Without it, you can’t install any of the others. Its only purpose is to install packages from the Python Package Index or from places such as GitHub. But you can also use it to install your own custom-built packages. Its 7,400 stars just don’t reflect how important it is for the Python community.
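The three install sources mentioned above (the Python Package Index, a GitHub repository, and your own package) can be sketched as follows. The helper `pip_install_command` is hypothetical and only builds the command line without running it; the repository URL and the local path are placeholders, not real projects.

```python
import sys


def pip_install_command(target, editable=False):
    """Build (but do not run) a pip install command for the current interpreter.

    `target` can be a package name on PyPI, a `git+https://...` URL,
    or a path to a local project directory.
    """
    command = [sys.executable, "-m", "pip", "install"]
    if editable:
        command.append("-e")  # editable install, handy while developing a package
    command.append(target)
    return command


if __name__ == "__main__":
    examples = [
        pip_install_command("numpy"),  # from the Python Package Index
        # Placeholder repository URL, for illustration only.
        pip_install_command("git+https://github.com/example-user/example-repo.git"),
        # Placeholder path to your own package, installed in editable mode.
        pip_install_command("./my_package", editable=True),
    ]
    for command in examples:
        print(" ".join(command))
```

To actually perform an installation, each list could be passed to `subprocess.run(command, check=True)`. Calling pip through `sys.executable -m pip` is a common way to make sure the package lands in the interpreter you are currently using.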

4. NumPy

Python wouldn't be the most popular programming language without NumPy. It is the foundation of all data science and machine learning packages, an essential package for all math-intensive computations with Python. All that nasty linear algebra and fancy math you learned in university is basically handled by NumPy in a very efficient way. Its syntax style can be seen in many of the important data libraries. 18,100 stars on GitHub give you a glimpse into how crucial a foundation it is for the Python ecosystem.
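As one small example of the linear algebra NumPy handles, the sketch below solves a system of two linear equations with `np.linalg.solve`. The helper name `solve_linear_system` and the specific coefficients are assumptions made for illustration.

```python
import numpy as np


def solve_linear_system(coefficients, constants):
    """Solve A @ x = b for x, where A is a square, non-singular matrix."""
    a = np.asarray(coefficients, dtype=float)
    b = np.asarray(constants, dtype=float)
    return np.linalg.solve(a, b)


if __name__ == "__main__":
    # Hypothetical system:  3x + y = 9  and  x + 2y = 8
    A = [[3, 1], [1, 2]]
    b = [9, 8]
    x = solve_linear_system(A, b)
    print("solution:", x)
    # Verify the result by plugging it back into the equations.
    print("checks out:", np.allclose(np.asarray(A) @ x, b))
```

The same array-based style (`@` for matrix products, element-wise operations on whole arrays) is what you will see again in many of the data libraries that build on NumPy.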