A working knowledge of Python is essential for data science development. In our previous guides we covered installing Jupyter and working with it. This article is not directly related to Jupyter Notebook, but it is important for developers working in big data and data science. Here is a list of Python libraries for data science and machine learning.
List of Python Libraries For Data Science & Machine Learning
The libraries are given below as a bulleted list. Searching GitHub for each name will lead you to the official repository and further details.
- NumPy : The de facto base library for scientific computing in Python, built around a fast N-dimensional array type.
- SciPy : Builds on NumPy with a collection of algorithms for optimization, integration, interpolation, linear algebra, statistics, and signal processing.
- Pandas : Designed for practical data analysis in finance, statistics, social sciences, and engineering.
- matplotlib : Standard, low-level Python library for creating 2D plots and graphs from data.
- Theano : Library for defining, optimizing, and evaluating mathematical expressions on multi-dimensional arrays, with a NumPy-like interface.
- NLTK : Platform for building Python programs to work with human language data.
- Scrapy : Useful to create bots to crawl the web and extract structured data like prices and URLs.
- Pattern : Combines the functionality of Scrapy and NLTK: web mining plus natural language processing.
- Seaborn : Statistical data visualization library built on top of matplotlib.
- Basemap : Adds support for simple maps to matplotlib.
- Bokeh : Visualization library aimed at interactive visualizations, independent of Matplotlib.
- Statsmodels : Python module to explore data, estimate statistical models, and perform statistical tests.
- Plotly : A web-based toolbox for building visualizations.
- Scikit-learn : Machine learning library built on NumPy and SciPy. It belongs to the SciKits family of SciPy add-on packages, which also includes scikit-image for image processing.
- d3py : Plotting library for creating interactive data visualizations based on D3.js.
- ggplot : Port of R’s popular ggplot2 library, bringing its layered plotting syntax and visual style to Python.
- prettyplotlib : Enhancement library that turns matplotlib’s default styles into clean, presentation-ready plots.
- csvkit : Suite of command-line tools for converting to and working with CSV files, offering features beyond Python’s built-in csv module.
- PyTables : Combines HDF5 and NumPy for working with very large datasets.
- TensorFlow : Library for numerical computation, used mainly for deep learning.
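
Before trying any of these, it can help to see which ones are already available in your Python environment. The snippet below is a minimal sketch that uses only the standard library. The `LIBRARIES` mapping and the `check_installed` helper are names made up for this illustration, and the mapping covers only part of the list. Note that the import name sometimes differs from the project name (scikit-learn is imported as `sklearn`, PyTables as `tables`).

```python
from importlib.util import find_spec

# Project name -> import name (illustrative subset of the list above).
# A few import names differ from the project names.
LIBRARIES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "matplotlib": "matplotlib",
    "NLTK": "nltk",
    "Scrapy": "scrapy",
    "Seaborn": "seaborn",
    "Bokeh": "bokeh",
    "Statsmodels": "statsmodels",
    "Plotly": "plotly",
    "Scikit-learn": "sklearn",
    "PyTables": "tables",
    "TensorFlow": "tensorflow",
}


def check_installed(libraries):
    """Return {project name: bool} by looking each module up without importing it."""
    return {
        name: find_spec(module) is not None
        for name, module in libraries.items()
    }


if __name__ == "__main__":
    for name, found in check_installed(LIBRARIES).items():
        status = "installed" if found else "missing"
        print(f"{name:<14}{status}")
```

Anything reported as missing can usually be installed with pip using the project name, for example `pip install scikit-learn`.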