# polars

**Repository Path**: hixuym/polars

## Basic Information

- **Project Name**: polars
- **Description**: Fast multi-threaded, hybrid-out-of-core DataFrame library in Rust | Python | Node.js
- **Primary Language**: Rust
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2023-04-09
- **Last Updated**: 2023-04-09

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README
## Polars: Blazingly fast DataFrames in Rust, Python & Node.js

Polars is a blazingly fast DataFrames library implemented in Rust using [Apache Arrow Columnar Format](https://arrow.apache.org/docs/format/Columnar.html) as the memory model.

- Lazy | eager execution
- Multi-threaded
- SIMD
- Query optimization
- Powerful expression API
- Hybrid Streaming (larger than RAM datasets)
- Rust | Python | NodeJS | ...

To learn more, read the [User Guide](https://pola-rs.github.io/polars-book/).

```python
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "A": [1, 2, 3, 4, 5],
...         "fruits": ["banana", "banana", "apple", "apple", "banana"],
...         "B": [5, 4, 3, 2, 1],
...         "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
...     }
... )

# embarrassingly parallel execution & very expressive query language
>>> df.sort("fruits").select(
...     "fruits",
...     "cars",
...     pl.lit("fruits").alias("literal_string_fruits"),
...     pl.col("B").filter(pl.col("cars") == "beetle").sum(),
...     pl.col("A").filter(pl.col("B") > 2).sum().over("cars").alias("sum_A_by_cars"),
...     pl.col("A").sum().over("fruits").alias("sum_A_by_fruits"),
...     pl.col("A").reverse().over("fruits").alias("rev_A_by_fruits"),
...     pl.col("A").sort_by("B").over("fruits").alias("sort_A_by_B_by_fruits"),
... )
shape: (5, 8)
┌──────────┬──────────┬──────────────┬─────┬─────────────┬─────────────┬─────────────┬─────────────┐
│ fruits   ┆ cars     ┆ literal_stri ┆ B   ┆ sum_A_by_ca ┆ sum_A_by_fr ┆ rev_A_by_fr ┆ sort_A_by_B │
│ ---      ┆ ---      ┆ ng_fruits    ┆ --- ┆ rs          ┆ uits        ┆ uits        ┆ _by_fruits  │
│ str      ┆ str      ┆ ---          ┆ i64 ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
│          ┆          ┆ str          ┆     ┆ i64         ┆ i64         ┆ i64         ┆ i64         │
╞══════════╪══════════╪══════════════╪═════╪═════════════╪═════════════╪═════════════╪═════════════╡
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 4           ┆ 4           │
│ "apple"  ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 7           ┆ 3           ┆ 3           │
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 5           ┆ 5           │
│ "banana" ┆ "audi"   ┆ "fruits"     ┆ 11  ┆ 2           ┆ 8           ┆ 2           ┆ 2           │
│ "banana" ┆ "beetle" ┆ "fruits"     ┆ 11  ┆ 4           ┆ 8           ┆ 1           ┆ 1           │
└──────────┴──────────┴──────────────┴─────┴─────────────┴─────────────┴─────────────┴─────────────┘
```

## Performance 🚀🚀

### Blazingly fast

Polars is very fast. In fact, it is one of the best performing solutions available. See the results in [h2oai's db-benchmark](https://h2oai.github.io/db-benchmark/).

In the [TPCH benchmarks](https://www.pola.rs/benchmarks.html) Polars is orders of magnitude faster than pandas, dask, modin and vaex on full queries (including IO).

### Lightweight

Polars is also very lightweight. It comes with zero required dependencies, and this shows in the import times:

- polars: 70ms
- numpy: 104ms
- pandas: 520ms

### Handles larger than RAM data

If you have data that does not fit into memory, Polars' lazy API can process your query (or parts of your query) in a streaming fashion. This drastically reduces memory requirements, so you might be able to process your 250GB dataset on your laptop. Collect with `collect(streaming=True)` to run the query in streaming mode. (This might be a little slower, but it is still very fast!)
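To illustrate why streaming lowers memory use, here is a minimal standard-library sketch of a batch-wise group-by sum. It is not how Polars implements its streaming engine; the helper names `iter_batches` and `streaming_sum_by_key`, the batch size, and the inline CSV data are all hypothetical and chosen only for illustration. The point is that only one small batch and the running totals are held in memory at any time, rather than the whole dataset.

```python
import csv
import io
from collections import defaultdict
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most `batch_size` items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def streaming_sum_by_key(rows, key, value, batch_size=2):
    """Group-by sum that holds one batch plus the running totals in memory."""
    totals = defaultdict(int)
    for batch in iter_batches(rows, batch_size):
        for row in batch:
            totals[row[key]] += int(row[value])
    return dict(totals)


# Hypothetical stand-in for a file that would be too large to load at once.
csv_text = "fruits,A\nbanana,1\nbanana,2\napple,3\napple,4\nbanana,5\n"
reader = csv.DictReader(io.StringIO(csv_text))

result = streaming_sum_by_key(reader, key="fruits", value="A")
print(result)  # {'banana': 8, 'apple': 7}
```

The totals match the `sum_A_by_fruits` column in the example output above (7 for apple, 8 for banana), but they are computed without materializing all rows at once.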
## Setup

### Python

Install the latest polars version with:

```sh
pip install polars
```

We also have a conda package (`conda install -c conda-forge polars`), however pip is the preferred way to install Polars.

Install Polars with all optional dependencies.

```sh
pip install 'polars[all]'
pip install 'polars[numpy,pandas,pyarrow]'  # install a subset of all optional dependencies
```

You can also install the dependencies directly.

| Tag        | Description                                                                                                                              |
| ---------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| all        | Install all optional dependencies (all of the following)                                                                                 |
| pandas     | Install with pandas for converting data to and from pandas DataFrames/Series                                                             |
| numpy      | Install with numpy for converting data to and from numpy arrays                                                                          |
| pyarrow    | Reading data formats using PyArrow                                                                                                       |
| fsspec     | Support for reading from remote file systems                                                                                             |
| connectorx | Support for reading from SQL databases                                                                                                   |
| xlsx2csv   | Support for reading from Excel files                                                                                                     |
| deltalake  | Support for reading from Delta Lake Tables                                                                                               |
| timezone   | Timezone support; only needed if (1) you are on Python < 3.9 and/or (2) you are on Windows. Otherwise no dependencies will be installed  |

Releases happen quite often (weekly / every few days) at the moment, so updating polars regularly to get the latest bugfixes / features might not be a bad idea.

### Rust

You can take the latest release from `crates.io`, or, if you want to use the latest features / performance improvements, point to the `master` branch of this repo.

```toml
polars = { git = "https://github.com/pola-rs/polars", rev = "