# csv2 **Repository Path**: ruinco/csv2 ## Basic Information - **Project Name**: csv2 - **Description**: csv2 from github - **Primary Language**: Unknown - **License**: MIT - **Default Branch**: master - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2023-06-08 - **Last Updated**: 2026-01-01 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README

csv2

## Table of Contents * [CSV Reader](#csv-reader) * [Performance Benchmark](#performance-benchmark) * [Reader API](#reader-api) * [CSV Writer](#csv-writer) * [Writer API](#writer-api) * [Compiling Tests](#compiling-tests) * [Generating Single Header](#generating-single-header) * [Contributing](#contributing) * [License](#license) ## CSV Reader ```cpp #include int main() { csv2::Reader, csv2::quote_character<'"'>, csv2::first_row_is_header, csv2::trim_policy::trim_whitespace> csv; if (csv.mmap("foo.csv")) { const auto header = csv.header(); for (const auto row: csv) { for (const auto cell: row) { // Do something with cell value // std::string value; // cell.read_value(value); } } } } ``` ### Performance Benchmark This benchmark measures the average execution time (of 5 runs after 3 warmup runs) for `csv2` to memory-map the input CSV file and iterate over every cell in the CSV. See `benchmark/main.cpp` for more details. ```bash cd benchmark g++ -I../include -O3 -std=c++11 -o main main.cpp ./main ``` #### System Details | Type | Value | | --------------- | --------------------------------------------------------------------------------------------------------- | | Processor | 11th Gen Intel(R) Core(TM) i9-11900KF @ 3.50GHz 3.50 GHz | | Installed RAM | 32.0 GB (31.9 GB usable) | | SSD | [ADATA SX8200PNP](https://www.adata.com/upload/downloadfile/Datasheet_XPG%20SX8200%20Pro_EN_20181017.pdf) | | OS | Ubuntu 20.04 LTS running on WSL in Windows 11 | | C++ Compiler | g++ (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0 | #### Results (as of 23 SEP 2022) | Dataset | File Size | Rows | Cols | Time | |:--- | ---:| ---:| ---:| ---:| | [Denver Crime Data](https://www.kaggle.com/paultimothymooney/denver-crime-data) | 111 MB | 479,100 | 19 | 0.102s | | [AirBnb Paris Listings](https://www.kaggle.com/juliatb/airbnb-paris) | 196 MB | 141,730 | 96 | 0.170s | | [2015 Flight Delays and Cancellations](https://www.kaggle.com/usdot/flight-delays) | 574 MB | 5,819,079 | 31 | 0.603s | | [StackLite: Stack Overflow questions](https://www.kaggle.com/stackoverflow/stacklite) | 870 MB | 17,203,824 | 7 | 0.911s | | [Used Cars Dataset](https://www.kaggle.com/austinreese/craigslist-carstrucks-data) | 1.4 GB | 539,768 | 25 | 0.947s | | [Title-Based Semantic Subject Indexing](https://www.kaggle.com/hsrobo/titlebased-semantic-subject-indexing) | 3.7 GB | 12,834,026 | 4 |2.867s| | [Bitcoin tweets - 16M tweets](https://www.kaggle.com/alaix14/bitcoin-tweets-20160101-to-20190329) | 4 GB | 47,478,748 | 9 | 3.290s | | [DDoS Balanced Dataset](https://www.kaggle.com/devendra416/ddos-datasets) | 6.3 GB | 12,794,627 | 85 | 6.963s | | [Seattle Checkouts by Title](https://www.kaggle.com/city-of-seattle/seattle-checkouts-by-title) | 7.1 GB | 34,892,623 | 11 | 7.698s | | [SHA-1 password hash dump](https://www.kaggle.com/urvishramaiya/have-i-been-pwnd) | 11 GB | 2,62,974,241 | 2 | 10.775s | | [DOHUI NOH scaled_data](https://www.kaggle.com/seaa0612/scaled-data) | 16 GB | 496,782 | 3213 | 16.553s | ### Reader API Here is the public API available to you: ```cpp template , class quote_character = quote_character<'"'>, class first_row_is_header = first_row_is_header, class trim_policy = trim_policy::trim_whitespace> class Reader { public: // Use this if you'd like to mmap and read from file bool mmap(string_type filename); // Use this if you have the CSV contents in std::string already bool parse(string_type contents); // Shape size_t rows() const; size_t cols() const; // Row iterator // If first_row_is_header, row iteration will start // from the second row RowIterator begin() const; RowIterator end() const; // Access the first row of the CSV Row header() const; }; ``` Here's the `Row` class: ```cpp // Row class class Row { public: // Get raw contents of the row void read_raw_value(Container& value) const; // Cell iterator CellIterator begin() const; CellIterator end() const; }; ``` and here's the `Cell` class: ```cpp // Cell class class Cell { public: // Get raw contents of the cell void read_raw_value(Container& value) const; // Get converted contents of the cell // Handles escaped content, e.g., // """foo""" => ""foo"" void read_value(Container& value) const; }; ``` ## CSV Writer This library also provides a basic `csv2::Writer` class - one that can be used to write CSV rows to file. Here's a basic usage: ```cpp #include #include #include using namespace csv2; int main() { std::ofstream stream("foo.csv"); Writer> writer(stream); std::vector> rows = { {"a", "b", "c"}, {"1", "2", "3"}, {"4", "5", "6"} }; writer.write_rows(rows); stream.close(); } ``` ### Writer API Here is the public API available to you: ```cpp template > class Writer { public: // Construct using an std::ofstream Writer(output_file_stream stream); // Use this to write a single row to file void write_row(container_of_strings row); // Use this to write a list of rows to file void write_rows(container_of_rows rows); ``` ## Compiling Tests ```bash mkdir build && cd build cmake -DCSV2_BUILD_TESTS=ON .. make cd test ./csv2_test ``` ## Generating Single Header ```bash python3 utils/amalgamate/amalgamate.py -c single_include.json -s . ``` ## Contributing Contributions are welcome, have a look at the [CONTRIBUTING.md](CONTRIBUTING.md) document for more information. ## License The project is available under the [MIT](https://opensource.org/licenses/MIT) license.