# great_expectations
**Repository Path**: dawsongzhao/great_expectations
## Basic Information
- **Project Name**: great_expectations
- **Description**: Always know what to expect from your data. docs.greatexpectations.io/
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: GE-160/GE-164/enhancement/alexsherstinsky/expect_column_min_max_to_be_between_fixture-2021_06_08-32
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-06-12
- **Last Updated**: 2021-06-12
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Great Expectations
================================================================================
*Always know what to expect from your data.*
Introduction
--------------------------------------------------------------------------------
Great Expectations helps data teams eliminate pipeline debt through data testing, documentation, and profiling.
Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams.
See [Down with Pipeline Debt!](https://medium.com/@expectgreatdata/down-with-pipeline-debt-introducing-great-expectations-862ddc46782a) for an introduction to the philosophy of pipeline testing.
Key features
--------------------------------------------------
### Expectations
Expectations are assertions for data. They are the workhorse abstraction in Great Expectations, covering all kinds of common data issues, including:
- `expect_column_values_to_not_be_null`
- `expect_column_values_to_match_regex`
- `expect_column_values_to_be_unique`
- `expect_column_values_to_match_strftime_format`
- `expect_table_row_count_to_be_between`
- `expect_column_median_to_be_between`
- ...and [many more](https://docs.greatexpectations.io/en/latest/reference/glossary_of_expectations.html)
Expectations are declarative, flexible and extensible.
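
To make "assertions for data" concrete, here is a minimal, library-free sketch of what two of the Expectations listed above check. It is not Great Expectations' implementation or API: the function bodies, the list-of-dicts data format, and the `success` / `unexpected_count` result keys are assumptions chosen for illustration only.

```python
import re


def expect_column_values_to_not_be_null(rows, column):
    """Toy check: every row has a non-None value in `column`."""
    unexpected = [row for row in rows if row.get(column) is None]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}


def expect_column_values_to_match_regex(rows, column, regex):
    """Toy check: every non-null value in `column` matches `regex`."""
    pattern = re.compile(regex)
    # Nulls are skipped here so that this check only judges present values.
    values = [row[column] for row in rows if row.get(column) is not None]
    unexpected = [v for v in values if not pattern.search(str(v))]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}


# Hypothetical sample data: one missing email and one malformed email.
rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "not-an-email"},
]

null_result = expect_column_values_to_not_be_null(rows, "email")
regex_result = expect_column_values_to_match_regex(
    rows, "email", r"^[^@\s]+@[^@\s]+$"
)
print(null_result)
print(regex_result)
```

Each check returns a result rather than raising, which is what lets a validation run collect every failure in one pass instead of stopping at the first.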
### Batteries-included data validation
Expectations are a great start, but it takes more to get to production-ready data validation. Where are Expectations stored? How do they get updated? How do you securely connect to production data systems? How do you notify team members and triage when data validation fails?
Great Expectations supports all of these use cases out of the box. Instead of building these components for yourself over weeks or months, you will be able to add production-ready validation to your pipeline in a day. This “Expectations on rails” framework plays nicely with other data engineering tools, respects your existing namespaces, and is designed for extensibility.

### Tests are docs and docs are tests
```diff
! This feature is in beta
```
Many data teams struggle to maintain up-to-date data documentation. Great Expectations solves this problem by rendering Expectations directly into clean, human-readable documentation.
Since docs are rendered from tests, and tests are run against new data as it arrives, your documentation is guaranteed to never go stale. Additional renderers allow Great Expectations to generate other types of "documentation", including Slack notifications, data dictionaries, customized notebooks, etc.
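
The idea of rendering documentation from tests can be sketched in a few lines. The example below is illustrative only and is not the library's renderer: `render_expectation` is a hypothetical helper, and the `expectation_type` / `kwargs` dictionary shape is an assumption about how an Expectation might be stored.

```python
def render_expectation(config):
    """Turn one stored expectation (a plain dict) into a readable sentence."""
    kind = config["expectation_type"]
    kwargs = config["kwargs"]
    if kind == "expect_column_values_to_not_be_null":
        column = kwargs["column"]
        return f"Column '{column}' must never be null."
    if kind == "expect_table_row_count_to_be_between":
        low, high = kwargs["min_value"], kwargs["max_value"]
        return f"The table must have between {low} and {high} rows."
    # Fallback for expectation types this toy renderer does not know about.
    return f"{kind} with arguments {kwargs}"


# Hypothetical suite: the same definitions drive both testing and docs.
suite = [
    {
        "expectation_type": "expect_column_values_to_not_be_null",
        "kwargs": {"column": "id"},
    },
    {
        "expectation_type": "expect_table_row_count_to_be_between",
        "kwargs": {"min_value": 1, "max_value": 1000},
    },
]

doc_lines = [f"- {render_expectation(config)}" for config in suite]
print("\n".join(doc_lines))
```

Because the sentences are generated from the same definitions that are executed as tests, editing an Expectation updates the documentation at the same time.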

### Automated data profiling
```diff
- This feature is experimental
```
Wouldn't it be great if your tests could write themselves? Run your data through one of Great Expectations' data profilers and it will automatically generate Expectations and data documentation. Profiling provides the double benefit of helping you explore data faster, and capturing knowledge for future documentation and testing.

Automated profiling doesn't replace domain expertise (you will almost certainly tune and augment your auto-generated Expectations over time), but it's a great way to jump-start the process of capturing and sharing domain knowledge across your team.
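
As a rough illustration of how a profiler can propose Expectations from observed data, consider the sketch below. It is an assumption-laden toy, not one of Great Expectations' profilers: `profile_column`, the list-of-dicts input, and the emitted dictionary shape are all hypothetical.

```python
def profile_column(rows, column):
    """Propose candidate expectations for `column` based on observed values."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    expectations = []

    # No missing values were observed, so propose a not-null expectation.
    if values and len(non_null) == len(values):
        expectations.append({
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": column},
        })

    # No duplicates were observed, so propose a uniqueness expectation.
    if non_null and len(set(non_null)) == len(non_null):
        expectations.append({
            "expectation_type": "expect_column_values_to_be_unique",
            "kwargs": {"column": column},
        })

    # Numeric column: propose the observed range as a starting point.
    is_numeric = non_null and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_null
    )
    if is_numeric:
        expectations.append({
            "expectation_type": "expect_column_values_to_be_between",
            "kwargs": {
                "column": column,
                "min_value": min(non_null),
                "max_value": max(non_null),
            },
        })
    return expectations


# Hypothetical sample batch with one missing age.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 51},
]

id_expectations = profile_column(rows, "id")
age_expectations = profile_column(rows, "age")
for expectation in id_expectations + age_expectations:
    print(expectation)
```

The proposed range for `age` simply mirrors the sample, which is exactly the kind of auto-generated Expectation a domain expert would later widen or tighten.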
### Pluggable and extensible
Every component of the framework is designed to be extensible: Expectations, storage, profilers, renderers for documentation, actions taken after validation, etc. This design choice gives a lot of creative freedom to developers working with Great Expectations.
Recent extensions include:
* [Renderers for data dictionaries](https://greatexpectations.io/blog/20200731_data_dictionary_plugin/)
* [BigQuery and GCS integration](https://github.com/great-expectations/great_expectations/pull/841)
* [Notifications to MatterMost](https://github.com/great-expectations/great_expectations/issues/902)
We're very excited to see what other plugins the data community comes up with!
Quick start
-------------------------------------------------------------
To see Great Expectations in action on your own data:
```
pip install great_expectations
great_expectations init
```
(We recommend deploying within a virtual environment. If you’re not familiar with pip, virtual environments, notebooks, or git, you may want to check out the [Supporting Resources](http://docs.greatexpectations.io/en/latest/reference/supporting_resources.html), which will teach you how to get up and running in minutes.)
For full documentation, visit [Great Expectations on readthedocs.io](http://great-expectations.readthedocs.io/en/latest/).
If you need help, hop into our [Slack channel](https://greatexpectations.io/slack)—there are always contributors and other users there.
Integrations
-------------------------------------------------------------------------------
Great Expectations works with the tools and systems that you're already using with your data, including:
| Integration | Notes |
|---|---|
| Pandas | Great for in-memory machine learning pipelines! |
| Spark | Good for really big data. |
| Postgres | Leading open source database |
| BigQuery | Google serverless massive-scale SQL analytics platform |
| Databricks | Managed Spark Analytics Platform |
| MySQL | Leading open source database |
| AWS Redshift | Cloud-based data warehouse |
| AWS S3 | Cloud-based blob storage |
| Snowflake | Cloud-based data warehouse |
| Apache Airflow | An open source orchestration engine |
| Other SQL Relational DBs | Most RDBMS are supported via SQLAlchemy |
| Jupyter Notebooks | The best way to build Expectations |
| Slack | Get automatic data quality notifications! |