# ChartReader

**Repository Path**: esheeper/ChartReader

## Basic Information

- **Project Name**: ChartReader
- **Description**: Document chart recognition (文档图表识别)
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-02-28
- **Last Updated**: 2024-03-23

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# ChartLLM: Unlocking the Multimodal Potential of LLMs for Chart Comprehension

## Highlights

- 🏅 Easily handle unseen chart types by simply adding more training data
- 🎊 Transformer architecture automatically infers rules from center/key points
- 🏵️ Unified framework for all chart and table understanding tasks

Our solution first uses a specialized detection module built on multi-scale [Hourglass Networks](https://arxiv.org/abs/1603.06937) to locate and segment chart components such as axes, legends, and plot areas in a unified manner, without hardcoded assumptions. We then employ a structured Transformer encoder to capture the spatial, semantic, and stylistic relationships between the detected components. This allows grouping of the relevant elements into a structured tabular intermediate representation of the chart layout and content. Finally, we fine-tune the state-of-the-art [T5](https://arxiv.org/abs/1910.10683) text-to-text transformer on this representation, using special tokens to associate chart details with free-form questions across a variety of analytical tasks.

![Example Interface](./example_interface.png)

A simple UI for demonstrating our model is also available; please visit:

## Quickstart

### System Requirements

- GPU-enabled machine
- CUDA toolkit 10.0
- Python 3.11
- GCC 4.9.2 or above

### Installation

Ensure [Anaconda](https://anaconda.org) is installed on your system. Use the provided package list to create an Anaconda environment:

```shell
conda env create -f requirements.yaml
conda activate ChartLLM
```

The Microsoft COCO API is required by the data-loading code of the chart data extraction module, since the original EC400K dataset is in COCO format:

```shell
mkdir data
cd data
git clone https://github.com/cocodataset/cocoapi coco
cd coco/PythonAPI
make
```

### Data Preparation

Download the modified EC400K dataset from this [link](https://pan.baidu.com/s/1myO8-SAmLa5NVsHzmBSC5w?pwd=54tb).

> The annotation file contains three parts.
>
> The first part, `images`, contains the chart image information, with four fields per image: `file_name`, `width`, `height`, and `id`.
>
> The second part, `annotations`, contains the chart component annotations, with five fields per annotation: `image_id`, `category_id`, `bbox`, `area`, and `id`.
>
> - `image_id`: the `id` of the chart image the annotation belongs to.
> - `category_id`: the type of the component:
>   1. bars in vbar_categorical charts
>   2. lines in line charts
>   3. pies in pie charts
>   4. legends
>   5. title of the value axes
>   6. title of the entire chart
>   7. title of the category axes
> - `bbox`: the points describing the component; the layout depends on the category (see the sketch after these notes):
>   - For lines in line charts, these are the data points of the line: `[d_1_x, d_1_y, ..., d_n_x, d_n_y]`.
>   - For pies in pie charts, these are the three critical points of a sector: `[center_x, center_y, edge_1_x, edge_1_y, edge_2_x, edge_2_y]`.
>   - For bars in vbar_categorical charts and all other component types, these are the x- and y-coordinates of the top-left corner of the box, followed by the width of the box (horizontal dimension) and the height of the box (vertical dimension).
> - `area`: the area of the chart component.
> - `id`: the unique identifier of each annotation.
>
> The third part, `categories`, provides a further reference for the categories used in `annotations`, with three fields per category: `supercategory`, `id`, and `name`. Categories 1 to 3 belong to the supercategory "MainComponent"; all other categories belong to the supercategory "OtherComponents".
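For reference, below is a minimal sketch of how these annotations can be read and interpreted per category, using only the standard library. The `ANN_FILE` path is an assumption; point it at the annotation JSON from the downloaded archive.

```python
import json

# ANN_FILE is a hypothetical path -- adjust it to wherever you
# unpacked the modified EC400K annotation JSON.
ANN_FILE = "data/ec400k/annotations.json"

with open(ANN_FILE) as f:
    coco = json.load(f)

# Index images by id so each annotation can be traced back to its chart.
images = {img["id"]: img for img in coco["images"]}

for ann in coco["annotations"][:5]:
    name = images[ann["image_id"]]["file_name"]
    cid, bbox = ann["category_id"], ann["bbox"]
    if cid == 2:
        # Line: flat list of data points [d_1_x, d_1_y, ..., d_n_x, d_n_y].
        points = list(zip(bbox[0::2], bbox[1::2]))
        print(f"{name}: line with {len(points)} data points")
    elif cid == 3:
        # Pie sector: [center_x, center_y, edge_1_x, edge_1_y, edge_2_x, edge_2_y].
        center, edge1, edge2 = bbox[:2], bbox[2:4], bbox[4:6]
        print(f"{name}: sector centered at {center}")
    else:
        # Bars and all other components: [x, y, width, height].
        x, y, w, h = bbox
        print(f"{name}: category {cid} box at ({x}, {y}), size {w}x{h}")
```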
## Training

### Chart Data Extraction Part

The configuration files for keypoint detection (`KPDetection`) and for keypoint detection and grouping (`KPGrouping`) are JSON files located in `config/`.

To train the chart data extraction model, use the `train_extraction.py` script. Train the KP detection model first, for example:

```shell
python train_extraction.py \
    --cfg_file KPDetection \
    --data_dir "/root/autodl-tmp/bar/" \
    --cache_path "/root/autodl-tmp/cache/"
```

Then use the pretrained KP detection model to train the KP grouping model, for example:

```shell
python train_extraction.py \
    --cfg_file KPGrouping \
    --data_dir "/root/autodl-tmp/component_data/" \
    --pretrained_model "KPDetection_best.pkl" \
    --cache_path "/root/autodl-tmp/cache/"
```

### Chart Question Answering Part

Execute the command below, replacing the placeholder paths and adjusting hyperparameters as necessary:

```shell
torchrun \
    --nproc_per_node=1 \
    run_t5.py \
    --model_name_or_path=t5-base \
    --do_train \
    --do_eval \
    --do_predict \
    --train_file="/root/autodl-tmp/qa_data/train.csv" \
    --validation_file="/root/autodl-tmp/qa_data/val.csv" \
    --test_file="/root/autodl-tmp/qa_data/test.csv" \
    --text_column=Input \
    --summary_column=Output \
    --source_prefix="" \
    --output_dir="/root/autodl-tmp/cache/t5_output" \
    --per_device_train_batch_size=8 \
    --per_device_eval_batch_size=16 \
    --predict_with_generate=True \
    --learning_rate=0.0001 \
    --num_beams=4 \
    --num_train_epochs=30 \
    --save_steps=10000 \
    --eval_steps=2000 \
    --evaluation_strategy=steps \
    --load_best_model \
    --max_source_length=1024
```

## Evaluation

To launch the demo UI, run the `demo.py` script; make sure you have replaced all directories in the script with the correct values.

To test data extraction directly, use the `val_extraction.py` script, for example:

```shell
python val_extraction.py \
    --save_path evaluation \
    --model_type KPGrouping \
    --cache_path "/root/autodl-tmp/cache/" \
    --data_dir "/root/autodl-tmp/component_data" \
    --trained_model_iter "best"
```

## Acknowledgments

I am deeply grateful to:

- **Dr. Zhi-Qi Cheng** of Carnegie Mellon University, whose expertise and insights were invaluable in shaping my research.
- **The Center for High-Performance Computing at Shanghai Jiao Tong University**, which provided the computational resources for this paper.
- **The S.T. Yau Science Award**, for providing me with an incredible opportunity and platform to showcase my research.