# pandas-profiling
**Repository Path**: email4reg/pandas-profiling
## Basic Information
- **Project Name**: pandas-profiling
- **Description**: Create HTML profiling reports from pandas DataFrame objects
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-02-09
- **Last Updated**: 2020-12-18
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Pandas Profiling

Generates profile reports from a pandas `DataFrame`.
The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis.
`pandas_profiling` extends the pandas DataFrame with `df.profile_report()` for quick data analysis.
For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:
* **Type inference**: detect the [types](#types) of columns in a dataframe.
* **Essentials**: type, unique values, missing values
* **Quantile statistics** like minimum value, Q1, median, Q3, maximum, range, interquartile range
* **Descriptive statistics** like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
* **Most frequent values**
* **Histogram**
* **Correlations**: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
* **Missing values** matrix, count, heatmap and dendrogram of missing values
* **Text analysis** learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
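To make the quantile and descriptive statistics above concrete, the sketch below computes a few of them for a small hand-made column using only Python's standard library. It illustrates what the numbers mean and is not how `pandas-profiling` computes them; the `summarize` helper and the sample values are hypothetical.

```python
import statistics


def summarize(values):
    """Return a few of the statistics listed above for a numeric column."""
    # method="inclusive" assumes the data already contains its extreme values,
    # so the quartiles are interpolated between observed points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return {
        "minimum": min(values),
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "maximum": max(values),
        "range": max(values) - min(values),
        "interquartile range": q3 - q1,
        "mean": mean,
        "mode": statistics.mode(values),
        "standard deviation": std,
        "coefficient of variation": std / mean,
    }


column = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample data
for name, value in summarize(column).items():
    print(f"{name}: {value}")
```

For this sample the quartiles are 4, 5 and 8, so the interquartile range is 4 and the range is 9.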
## Announcements
With your help, we got approved for [GitHub Sponsors](https://github.com/sponsors/sbrugman)!
It's extra exciting that GitHub **matches your contribution** for the first year.
Therefore, we welcome you to support the project through GitHub!
The v2.4 release includes many new features (performance, exporting, GUI and datasets) and stability improvements.
- [Sponsor the project on GitHub](https://github.com/sponsors/sbrugman)
- [Read the release notes v2.4](https://github.com/pandas-profiling/pandas-profiling/releases/tag/v2.4.0)
*January 7, 2020*
---
_Contents:_ **[Examples](#examples)** |
**[Installation](#installation)** | **[Documentation](#documentation)** |
**[Large datasets](#large-datasets)** | **[Command line usage](#command-line-usage)** |
**[Advanced usage](#advanced-usage)** |
**[Types](#types)** | **[How to contribute](#how-to-contribute)** |
**[Editor Integration](#editor-integration)** | **[Dependencies](#dependencies)**
---
## Examples
The following examples can give you an impression of what the package can do:
* [Census Income](http://pandas-profiling.github.io/pandas-profiling/examples/census/census_report.html) (US Adult Census data relating income)
* [NASA Meteorites](http://pandas-profiling.github.io/pandas-profiling/examples/meteorites/meteorites_report.html) (comprehensive set of meteorite landings)
* [Titanic](http://pandas-profiling.github.io/pandas-profiling/examples/titanic/titanic_report.html) (the "Wonderwall" of datasets)
* [NZA](http://pandas-profiling.github.io/pandas-profiling/examples/nza/nza_report.html) (open data from the Dutch Healthcare Authority)
* [Stata Auto](http://pandas-profiling.github.io/pandas-profiling/examples/stata_auto/stata_auto_report.html) (1978 Automobile data)
* [Vektis](http://pandas-profiling.github.io/pandas-profiling/examples/vektis/vektis_report.html) (Vektis Dutch Healthcare data)
* [Website Inaccessibility](http://pandas-profiling.github.io/pandas-profiling/examples/website_inaccessibility/website_inaccessibility_report.html) (demonstrates the URL type)
* [Colors](http://pandas-profiling.github.io/pandas-profiling/examples/colors/colors_report.html) (a simple colors dataset)
* [Russian Vocabulary](http://pandas-profiling.github.io/pandas-profiling/examples/russian_vocabulary/russian_vocabulary.html) (demonstrates text analysis)
## Installation
### Using pip
You can install using the pip package manager by running:

    pip install pandas-profiling[notebook,html]

Alternatively, you can install directly from GitHub:

    pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

### Using conda
You can install using the conda package manager by running:

    conda install -c conda-forge pandas-profiling

### From source
Download the source code by cloning the repository or by pressing ['Download ZIP'](https://github.com/pandas-profiling/pandas-profiling/archive/master.zip) on this page.
Install by navigating to the proper directory and running:

    python setup.py install

## Documentation
The documentation for `pandas_profiling` can be found [here](https://pandas-profiling.github.io/pandas-profiling/docs/).
### Getting started
Start by loading your pandas DataFrame, for example:
```python
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport
df = pd.DataFrame(
    np.random.rand(100, 5),
    columns=['a', 'b', 'c', 'd', 'e']
)
```
To generate the report, run:
```python
profile = ProfileReport(df, title='Pandas Profiling Report', html={'style':{'full_width':True}})
```
#### Jupyter Notebook
We recommend generating reports interactively by using the Jupyter notebook.
There are two interfaces: through widgets and through an HTML report.
This is achieved by simply displaying the report. In the Jupyter Notebook, run:
```python
profile
```
The HTML report can be included in a Jupyter notebook by running the following code:
```python
profile.to_notebook_iframe()
```
#### Saving the report
To generate an HTML report file, assign the `ProfileReport` to a variable and call its `to_file()` method:
```python
profile.to_file(output_file="your_report.html")
```
Alternatively, you can obtain the data as JSON:
```python
# As a string
json_data = profile.to_json()
# As a file
profile.to_file(output_file="your_report.json")
```
### Large datasets
Version 2.4 introduces minimal mode.
This is a preset configuration that disables expensive computations (such as correlations and dynamic binning).
Use the following syntax:
```python
profile = ProfileReport(large_dataset, minimal=True)
profile.to_file(output_file="output.html")
```
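One way to apply minimal mode only when it is needed is to decide from the size of the DataFrame. The sketch below is an illustration under stated assumptions: the `choose_profile_options` helper and the `max_cells` threshold are hypothetical and not part of `pandas-profiling`; only the `minimal` keyword comes from the example above.

```python
def choose_profile_options(n_rows, n_cols, max_cells=1_000_000):
    """Return keyword arguments for ProfileReport based on the data size.

    `max_cells` is an arbitrary, hypothetical cut-off; tune it for your machine.
    """
    # Switch to minimal mode once the table exceeds the cell budget.
    return {"minimal": n_rows * n_cols > max_cells}


# A 5,000,000 x 20 table exceeds the budget, a 100 x 5 table does not.
print(choose_profile_options(5_000_000, 20))  # {'minimal': True}
print(choose_profile_options(100, 5))         # {'minimal': False}
```

With a DataFrame `df`, the result can be passed on as `ProfileReport(df, **choose_profile_options(*df.shape))`.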
### Command line usage
For CSV files in a standard format that pandas can read directly, you can use the `pandas_profiling` executable. Run

    pandas_profiling -h

for information about options and arguments.
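The executable can also be driven from a script. The sketch below assumes the command takes an input file followed by an output file, which is the argument order used in the PyCharm integration section of this README. The helper names and the `_report.html` naming are illustrative choices, not part of the package.

```python
import subprocess
from pathlib import Path


def build_command(csv_path):
    """Build the pandas_profiling command line for one CSV file."""
    csv_path = Path(csv_path)
    # Strip every extension, mirroring PyCharm's $FileNameWithoutAllExtensions$.
    stem = csv_path.name.split(".", 1)[0]
    report_path = csv_path.with_name(f"{stem}_report.html")
    return ["pandas_profiling", str(csv_path), str(report_path)]


def profile_csv(csv_path):
    """Run the executable; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(csv_path), check=True)


# Only prints the command; call profile_csv(...) to actually run it.
print(build_command("data/titanic.csv"))
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.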
### Advanced usage
A set of options is available to adapt the generated report.
* `title` (`str`): Title for the report ('Pandas Profiling Report' by default).
* `pool_size` (`int`): Number of workers in thread pool. When set to zero, it is set to the number of CPUs available (0 by default).
* `progress_bar` (`bool`): If True, `pandas-profiling` will display a progress bar.
More settings can be found in the [default configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_default.yaml), [minimal configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_minimal.yaml) and [dark themed configuration file](https://github.com/pandas-profiling/pandas-profiling/blob/master/src/pandas_profiling/config_dark.yaml).
__Example__
```python
profile = df.profile_report(title='Pandas Profiling Report', plot={'histogram': {'bins': 8}})
profile.to_file(output_file="output.html")
```
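Several options can be kept together in one dictionary and reused across reports. The sketch below only uses option names that appear in this section; the `REPORT_OPTIONS` and `make_report` names are hypothetical, and the values are examples rather than recommendations.

```python
# Example values only; every key is one of the options described above.
REPORT_OPTIONS = {
    "title": "Pandas Profiling Report",
    "pool_size": 2,          # two workers instead of one per CPU
    "progress_bar": False,   # keep logs quiet in batch jobs
    "plot": {"histogram": {"bins": 8}},
}


def make_report(df, **overrides):
    """Create a ProfileReport from the shared options plus per-call overrides."""
    # Imported lazily so this module can be loaded without the package installed.
    from pandas_profiling import ProfileReport

    # Shallow merge: an override for "plot" replaces the whole nested dict.
    options = {**REPORT_OPTIONS, **overrides}
    return ProfileReport(df, **options)
```

A call such as `make_report(df, title="Sales data")` would then reuse the shared settings and change only the title.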
## Types
Types are a powerful abstraction for effective data analysis that goes beyond the logical data types (integer, float etc.).
`pandas-profiling` currently recognizes the following types:
- Boolean
- Numerical
- Date
- Categorical
- URL
- Path
We have developed a type system for Python, tailored for data analysis: [visions](https://github.com/dylan-profiler/visions).
Selecting the right typeset drastically reduces the complexity of the code of your analysis.
Future versions of `pandas-profiling` will have extended type support through `visions`!
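To illustrate the difference between a logical data type and the types listed above, the sketch below guesses a semantic type for a column of plain Python values. It is a deliberately simplified toy, not the inference used by `pandas-profiling` or `visions`. The `infer_type` and `looks_like_url` helpers are hypothetical, and the Path type is left out for brevity.

```python
from datetime import date
from urllib.parse import urlparse


def looks_like_url(value):
    """True for strings that have both a scheme and a network location."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)


def infer_type(values):
    """Guess a semantic type for a column, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    if not present:
        return "Categorical"  # nothing to inspect; arbitrary fallback
    # bool is a subclass of int, so Boolean must be checked before Numerical.
    if all(isinstance(v, bool) for v in present):
        return "Boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
        return "Numerical"
    if all(isinstance(v, date) for v in present):  # datetime is a date subclass
        return "Date"
    if all(looks_like_url(v) for v in present):
        return "URL"
    return "Categorical"


columns = {
    "survived": [True, False, None],
    "fare": [7.25, 71.28, 8],
    "homepage": ["https://example.org/a", "http://example.com"],
    "colour": ["red", "green", "red"],
}
for name, values in columns.items():
    print(name, "->", infer_type(values))
```

Both `homepage` and `colour` are stored as strings, yet only the first is recognized as URL; that distinction is what a semantic type adds on top of the logical one.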
## How to contribute
The package is actively maintained and developed as open-source software.
If `pandas-profiling` was helpful or interesting to you, you might want to get involved.
There are several ways of contributing and helping our thousands of users.
If you would like to be an industry partner or sponsor, please [drop us a line](mailto:pandasprofiling@gmail.com).
The documentation is generated using [`pdoc3`](https://github.com/pdoc3/pdoc).
If you are contributing to this project, you can rebuild the documentation using:
```
make docs
```
or on Windows:
```
make.bat docs
```
Read more on getting involved in the [Contribution Guide](https://github.com/pandas-profiling/pandas-profiling/blob/master/CONTRIBUTING.md).
## Editor integration
### PyCharm integration
1. Install `pandas-profiling` via the instructions above
2. Locate your `pandas_profiling` executable.
On macOS / Linux / BSD:
```console
$ which pandas_profiling
(example) /usr/local/bin/pandas_profiling
```
On Windows:
```console
$ where pandas_profiling
(example) C:\ProgramData\Anaconda3\Scripts\pandas_profiling.exe
```
3. In PyCharm, go to _Settings_ (or _Preferences_ on macOS) > _Tools_ > _External tools_
4. Click the _+_ icon to add a new external tool
5. Insert the following values
- Name: Pandas Profiling
- Program: *__The location obtained in step 2__*
- Arguments: "$FilePath$" "$FileDir$/$FileNameWithoutAllExtensions$_report.html"
- Working Directory: $ProjectFileDir$
To use the PyCharm integration, right-click on any dataset file:
_External Tools_ > _Pandas Profiling_.
### Other integrations
Other editor integrations may be contributed via pull requests.
## Dependencies
The profile report is written in HTML and CSS, which means pandas-profiling requires a modern browser.
You need [Python 3](https://python3statement.org/) to run this package. Other dependencies can be found in the requirements files:
| Filename | Requirements|
|----------|-------------|
| [requirements.txt](https://github.com/pandas-profiling/pandas-profiling/blob/master/requirements.txt) | Package requirements|
| [requirements-dev.txt](https://github.com/pandas-profiling/pandas-profiling/blob/master/requirements-dev.txt) | Requirements for development|
| [requirements-test.txt](https://github.com/pandas-profiling/pandas-profiling/blob/master/requirements-test.txt) | Requirements for testing|
| [setup.py](https://github.com/pandas-profiling/pandas-profiling/blob/master/setup.py) | Requirements for Widgets etc. |