5 Incredibly Useful Python Libraries For Data Science

Mars
Dec 10, 2019

Source: realpython.com

Data science involves time-consuming, complex tasks. Luckily, Python has many open-source libraries that make a developer's work easier. Here are five lesser-known ones: none is essential, but each is a good choice for polishing a data science project.

Cookiecutter: Better Project Template

Cookiecutter generates a new project from a template, so a data science project starts with a sensible structure. This keeps source code, files and data organised instead of scattered.

pip install --user cookiecutter
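As a rough illustration of how a template becomes a project, the sketch below builds a tiny throwaway template on disk and renders it through Cookiecutter's Python entry point. The folder layout, the `project_slug` variable and the `my_analysis` default are all hypothetical, chosen only for this example; they are not taken from any published template.

```python
import json
import tempfile
from pathlib import Path

from cookiecutter.main import cookiecutter

# Build a tiny throwaway template. The layout is hypothetical, for illustration only.
workdir = Path(tempfile.mkdtemp())
template = workdir / "template"
project_root = template / "{{cookiecutter.project_slug}}"

for sub in ("data/raw", "notebooks", "src"):
    folder = project_root / sub
    folder.mkdir(parents=True)
    (folder / ".gitkeep").write_text("")  # placeholder so each folder holds a file

# cookiecutter.json declares the template variables and their default values.
(template / "cookiecutter.json").write_text(json.dumps({"project_slug": "my_analysis"}))
(project_root / "README.md").write_text("# {{cookiecutter.project_slug}}\n")

# Render the template. no_input=True accepts the defaults instead of prompting.
output_dir = workdir / "out"
output_dir.mkdir()
generated = Path(cookiecutter(str(template), no_input=True, output_dir=str(output_dir)))

for path in sorted(generated.rglob("*")):
    print(path.relative_to(generated))
```

In everyday use you would more likely pass the location of a published template, such as a Git repository URL, instead of building one by hand.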

More info: Cookiecutter, GitHub

pandas-profiling: Speed Up Exploratory Data Analysis (EDA)

It speeds up Exploratory Data Analysis (EDA) by generating a full report from a single line of code. The report covers missing-value detection, distributions, correlations and more.

## Using pip
pip install pandas-profiling
## Using Conda
conda install -c conda-forge pandas-profiling

More info: pandas-profiling, GitHub

imbalanced-learn: Handle Imbalanced Data

A specialised library for datasets with imbalanced class labels. It offers resampling techniques such as over-sampling (SMOTE), under-sampling (TomekLinks) and combined sampling (SMOTEENN). It also provides ensemble models with built-in balancing for machine learning.

## Using pip
pip install imbalanced-learn
## Using Conda
conda install -c conda-forge imbalanced-learn

More info: imbalanced-learn, GitHub

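featuretools: Automated Feature Engineering

An open-source library for automated feature engineering. It generates new features from related tables automatically, instead of requiring each one to be coded by hand.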

## Using pip
pip install featuretools
## Using Conda
conda install -c conda-forge featuretools

More info: featuretools, GitHub

Cufflinks

An excellent data visualisation tool that connects Pandas with Plotly, making it easy to create interactive charts directly from a DataFrame.

## Using pip
pip install cufflinks
## Using Conda
conda install -c conda-forge cufflinks-py

More info: GitHub

Conclusions

Data science projects are indeed time-consuming and complex, so these libraries are well worth using to speed up the work and cut down on tedious steps. If you know other open-source libraries worth sharing, feel free to list them in the comments.
