# ParGo: Bridging Vision-Language with Partial and Global Views

Official PyTorch implementation of "ParGo: Bridging Vision-Language with Partial and Global Views" (AAAI 2025).

[Paper](https://arxiv.org/abs/2408.12928), [Model](https://drive.google.com/file/d/1QdAF3Vv_oZjsfdpdyPezlMLUJ8IIFEiC/view?usp=drive_link)

![ParGo](assets/Pipeline.png)

## Setup

```
cd ParGo
conda create -n ParGo_env python=3.10 -y
conda activate ParGo_env
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -r ./requirements.txt
```

## Download Models

The LLM (internlm2-7b) and the vision encoder (eva-clip-l-14-336) need to be downloaded in advance (one possible download sketch is given at the end of this README).

## Evaluation

### MME Benchmark

#### Data

Place the benchmark data in the `benchmarks` directory with the following structure:

```
├── benchmarks
│   ├── MMEBenmark
│   │   ├── images
│   │   └── Data_json
```

Each JSON file in `Data_json` contains the image name, question, and answer, e.g.:

```
10002.jpg
Does this artwork exist in the form of painting?
Yes
```

A minimal sketch for assembling such files appears at the end of this README.

#### Evaluation

Step 1: Generate the responses:

```
python3 eval/eval_mme_finetuning.py --config ./configs/MMEBench_interLM2-7B.json
```

Step 2: Calculate the score:

```
python3 eval/calculation_mme.py --results_dir ./output/internlm2-MME
```

Both steps can also be chained; see the driver sketch at the end of this README.

For other benchmarks, please follow their official instructions to construct the data files; the overall pipeline is the same as for the MME benchmark.

## Acknowledgement

This project is developed based on [MiniGPT](https://github.com/Vision-CAIR/MiniGPT-4/tree/main?tab=readme-ov-file) and [BLIP2](https://huggingface.co/docs/transformers/main/model_doc/blip-2). Sincere thanks to the contributors of these excellent codebases.

If you find our code helpful for your research, please consider citing us with this BibTeX:

```
@misc{wang2024pargobridgingvisionlanguagepartial,
      title={ParGo: Bridging Vision-Language with Partial and Global Views},
      author={An-Lan Wang and Bin Shan and Wei Shi and Kun-Yu Lin and Xiang Fei and Guozhi Tang and Lei Liao and Jingqun Tang and Can Huang and Wei-Shi Zheng},
      year={2024},
      eprint={2408.12928},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.12928},
}
```

## License

The source code and pretrained weights are licensed under [BSD-3-Clause](https://spdx.org/licenses/BSD-3-Clause.html).
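As referenced in the Download Models section, the README leaves the download method open. Below is a minimal sketch assuming the checkpoints are fetched from the Hugging Face Hub; the repo IDs and file names are assumptions, not taken from the ParGo codebase, so substitute whichever sources your configs actually point to.

```python
# Hedged sketch: one possible way to fetch the required checkpoints via
# huggingface_hub. The repo IDs and file names below are assumptions;
# replace them with the checkpoints your ParGo configs actually reference.
from huggingface_hub import snapshot_download

# LLM backbone (assumed Hugging Face repo ID).
snapshot_download(repo_id="internlm/internlm2-7b",
                  local_dir="./models/internlm2-7b")

# Vision encoder (assumed repo ID and checkpoint name for eva-clip-l-14-336).
snapshot_download(repo_id="QuanSun/EVA-CLIP",
                  local_dir="./models/eva-clip",
                  allow_patterns=["EVA02_CLIP_L_336_psz14_s6B.pt"])
```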
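As referenced in the Data subsection, here is a minimal, hedged sketch of how a `Data_json` file could be assembled. The field names (`image`, `question`, `answer`) are assumptions, since the README only lists the contents informally; check `eval/eval_mme_finetuning.py` for the schema it actually parses.

```python
# Hedged sketch: assemble one Data_json file for the MME benchmark.
# The JSON field names are assumptions, not confirmed by the ParGo code.
import json
from pathlib import Path

def build_mme_json(samples, out_path):
    """Write (image_name, question, answer) triples as a JSON list."""
    records = [
        {"image": img, "question": q, "answer": a}  # assumed field names
        for img, q, a in samples
    ]
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)

if __name__ == "__main__":
    # Single entry mirroring the README's example.
    samples = [("10002.jpg",
                "Does this artwork exist in the form of painting?",
                "Yes")]
    build_mme_json(samples, "benchmarks/MMEBenmark/Data_json/artwork.json")
```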
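As referenced in the Evaluation subsection, the two MME steps can be chained with a small driver. The sketch below only shells out to the two commands documented above, using the README's default paths; adjust them to your setup.

```python
# Hedged convenience sketch: run both documented MME evaluation steps in order.
import subprocess

def run_mme_eval(config="./configs/MMEBench_interLM2-7B.json",
                 results_dir="./output/internlm2-MME"):
    # Step 1: generate the model's responses for the benchmark questions.
    subprocess.run(
        ["python3", "eval/eval_mme_finetuning.py", "--config", config],
        check=True,
    )
    # Step 2: score the generated responses.
    subprocess.run(
        ["python3", "eval/calculation_mme.py", "--results_dir", results_dir],
        check=True,
    )

if __name__ == "__main__":
    run_mme_eval()
```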