We also have other multimodal continual instruction tuning projects that may interest you.
> [**HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model**](https://arxiv.org/pdf/2503.12941)
> Haiyang Guo, Fanhu Zeng, Ziwei Xiang, Fei Zhu, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
> [[Code]](https://github.com/Ghy0501/HiDe-LLaVA) [[Paper]](https://arxiv.org/pdf/2503.12941)

> [**Federated Continual Instruction Tuning**](https://arxiv.org/pdf/2503.12897)
> Haiyang Guo, Fanhu Zeng, Fei Zhu, Wenzhuo Liu, Da-Han Wang, Jian Xu, Xu-Yao Zhang, Cheng-Lin Liu
> [[Code]](https://github.com/Ghy0501/FCIT) [[Paper]](https://arxiv.org/pdf/2503.12897)

> [**ModalPrompt: Towards Efficient Multimodal Continual Instruction Tuning with Dual-Modality Guided Prompt**](https://arxiv.org/pdf/2410.05849)
> Fanhu Zeng, Fei Zhu, Haiyang Guo, Xu-Yao Zhang, Cheng-Lin Liu
> [[Code]](https://github.com/AuroraZengfh/ModalPrompt) [[Paper]](https://arxiv.org/pdf/2410.05849)

> [**Continual Learning for Generative AI: From LLMs to MLLMs and Beyond**](https://arxiv.org/pdf/2506.13045)
> Haiyang Guo, Fanhu Zeng, Fei Zhu, Jiayi Wang, Xukai Wang, Jingang Zhou, Hongbo Zhao, Wenzhuo Liu, Shijie Ma, Da-Han Wang, Xu-Yao Zhang, Cheng-Lin Liu
> [[Code]](https://github.com/Ghy0501/Awesome-Continual-Learning-in-Generative-Models) [[Paper]](https://arxiv.org/pdf/2506.13045)

> [**MLLM-CL: Continual Learning for Multimodal Large Language Models**](https://arxiv.org/pdf/2506.05453)
> Hongbo Zhao, Fei Zhu, Haiyang Guo, Meng Wang, Rundong Wang, Gaofeng Meng, Zhaoxiang Zhang
> [[Code]](https://github.com/bjzhb666/MLLM-CL) [[Paper]](https://arxiv.org/pdf/2506.05453)

> [**LLaVA-c: Continual Improved Visual Instruction Tuning**](https://arxiv.org/pdf/2506.08666?)
> Wenzhuo Liu, Fei Zhu, Haiyang Guo, Longhui Wei, Cheng-Lin Liu
> [[Paper]](https://arxiv.org/pdf/2506.08666?)
## News
- **[2026.1.2]** We have updated the [MCITlib](https://arxiv.org/pdf/2508.07307) paper with the latest results. Please feel free to check it out.
- **[2025.10.14]** **MCITlib-v2** has been released! The latest version includes training and testing code for **8 mainstream multimodal continual instruction tuning methods**, compatible with **2 base models** and **3 continual instruction tuning datasets**.
- **[2025.09.16]** We have released a new version of the [paper](https://arxiv.org/pdf/2508.07307) and attached the accuracy matrix of each method for reference. :tada:
- **[2025.08.12]** Initial [MCITlib](https://arxiv.org/pdf/2508.07307) paper released! :tada:
- **[2025.08.10]** Initial version of MCITlib is released. :tada:
## Methods Provided
- `LoRA-FT`: Baseline method that simply updates LoRA parameters on new tasks. [[Paper]](https://arxiv.org/pdf/2106.09685v1/1000)
- `O-LoRA`: Orthogonal subspace learning for language model continual learning. [[Paper]](https://arxiv.org/pdf/2310.14152) 
- `MoELoRA`: CoIN: A Benchmark of Continual Instruction Tuning for Multimodal Large Language Models [[Paper]](https://proceedings.neurips.cc/paper_files/paper/2024/file/6a45500d9eda640deed90d8a62742be5-Paper-Datasets_and_Benchmarks_Track.pdf) 
- `ModalPrompt`: ModalPrompt: Dual-Modality Guided Prompt for Continual Learning of Large Multimodal Models [[Paper]](https://arxiv.org/pdf/2410.05849) 
- `CL-MoE`: CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering [[Paper]](https://arxiv.org/pdf/2503.00413?) 
- `HiDe`: HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model [[Paper]](https://arxiv.org/pdf/2503.12941?) 
- `SEFE`: SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning [[Paper]](https://arxiv.org/pdf/2505.02486?) 
- `DISCO`: Federated Continual Instruction Tuning [[Paper]](https://arxiv.org/pdf/2503.12897) 
## Benchmarks
We currently report results on the [UCIT](https://github.com/Ghy0501/HiDe-LLaVA), [MLLM-DCL](https://github.com/bjzhb666/MLLM-CL) and [MLLM-ACL](https://github.com/bjzhb666/MLLM-CL) benchmarks. Please refer to the provided links to download the corresponding images and instruction sets, and organize them in the following directory structure:
```
|-- your_path
    |-- Domain_data
        |-- AD
        |-- Med
        |-- RS
        |-- Sci
        |-- Fin
    |-- Ability_data
        |-- OCR
        |-- OCR_test
        |-- Math
        |-- Math_test
        |-- APP
        |-- APP_test
        |-- VP
        |-- VP_test
    |-- UCIT
        |-- datasets
            |-- ArxivQA
            |-- CLEVR-Math
            |-- Flickr30k
            |-- IconQA
            |-- ImageNet-R
            |-- VizWiz
```
Note: You need to change the data paths in all scripts to your own paths.
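As a quick sanity check before editing any script, a minimal shell sketch along these lines (hypothetical, with `/your_path` standing in for your actual data root) can confirm that the expected benchmark folders are in place:
```
DATA_ROOT=/your_path   # replace with your actual data root

# Report any expected benchmark folder that is missing under the data root;
# extend the list with the remaining subfolders shown above as needed.
for d in Domain_data Ability_data UCIT/datasets \
         Domain_data/AD Domain_data/Med Domain_data/RS Domain_data/Sci Domain_data/Fin; do
    [ -d "$DATA_ROOT/$d" ] || echo "Missing: $DATA_ROOT/$d"
done
```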
## Models
We currently provide reproductions based on the [LLaVA-1.5-7B](https://github.com/haotian-liu/LLaVA) and [InternVL-Chat-7B](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat_llava) models. Please download them to your local directory:
```
huggingface-cli download liuhaotian/llava-v1.5-7b --local-dir /your_path/llava-v1.5-7b
huggingface-cli download openai/clip-vit-large-patch14-336 --local-dir /your_path/clip-vit-large-patch14-336
huggingface-cli download OpenGVLab/InternVL-Chat-ViT-6B-Vicuna-7B --local-dir /your_path/Internvl-chat-7b
huggingface-cli download OpenGVLab/InternViT-6B-224px --local-dir /your_path/InternViT-6B-224px
```
We also plan to extend our reproduction to other MLLM architectures in the near future.
Note: To meet the requirements of certain methods, we need to apply additional processing to the config files of the downloaded models. The details are outlined below:
1. Add `"mm_text_select_layer": -1` and `"mm_text_tower": "/your_path/clip-vit-large-patch14-336"` to the `config.py` in your local model weight paths `/your_path/llava-v1.5-7b` and `/your_path/Internvl-chat-7b`.
2. Remove `"temperature": 0.9` and `"top_p": 0.6` from the `generation_config.json` in your local model weight path.
We provide reference `config.py` and `generation_config.json` files in `examples`.
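As a rough sketch of step 2 (assuming `jq` is available; otherwise compare your files against the references in `examples` and edit them by hand), the sampling keys can be dropped from `generation_config.json` like this:
```
MODEL_DIR=/your_path/llava-v1.5-7b   # repeat for /your_path/Internvl-chat-7b

# Remove the sampling parameters from generation_config.json in place.
jq 'del(.temperature, .top_p)' "$MODEL_DIR/generation_config.json" > tmp.json \
    && mv tmp.json "$MODEL_DIR/generation_config.json"
```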
## How to run
Note: Our experiments were conducted in a CUDA 11.8 environment, and most libraries in the setup are aligned with this CUDA version. We therefore recommend running `nvcc -V` to check the CUDA version on your server; if it does not match, please install CUDA 11.8 before proceeding.
### 1. Clone this repository
```
git clone https://github.com/Ghy0501/MCITlib.git
cd MCITlib
```
### 2. Install Package
```
conda create -n MCITlib python=3.10 -y
conda activate MCITlib
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
cd LLaVA/LoRA-FT
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
```
For [flash-attn](https://github.com/Dao-AILab/flash-attention/releases), we recommend downloading the version 2.6.3 wheel that matches your CUDA and PyTorch versions from the official repository, placing it in a local directory, and installing it manually. For example:
```
pip install flash_attn-2.6.3+cu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
We also provide an `environment.yml` file to help users identify missing dependencies and version mismatches. However, due to potential library conflicts, automatic installation may fail for certain packages; we therefore recommend installing these manually based on the reported error messages and version specifications. For essential evaluation-related dependencies, please refer to the [UCIT](https://github.com/Ghy0501/HiDe-LLaVA) and [MLLM-CL](https://github.com/bjzhb666/MLLM-CL) repositories.
### 3. Modify path and parameter settings
Before running, please set all model paths to your local paths. The paths that need to be modified are listed below; don't forget to update the dataset path as well.
- Change `/mnt/haiyangguo/mywork/CL-MLLM/MCITlib_v2` to `/your_path/MCITlib`.
- Change `/mnt/haiyangguo/mywork/FCIT/pre_trained/llava-v1.5-7b` to `/your_path/llava-v1.5-7b`.
- Change `/mnt/haiyangguo/mywork/CL-MLLM/pre_trained/Internvl-chat-7b` to `/your_path/Internvl-chat-7b`.
- Change `/mnt/ShareDB_6TB/models/clip-vit-large-patch14-336` to `/your_path/clip-vit-large-patch14-336`.
- Change `/mnt/ShareDB_6TB/models/InternViT-6B-224px` to `/your_path/InternViT-6B-224px`.
- Change `/mnt/ShareDB_6TB/datasets/MLLM_CL/checkpoint` to `/your_path/checkpoint`.
After adjusting the paths, users can modify parameters such as `gpu_num` according to their actual operating environment. All parameter settings are collected in the `configs/` folder.
Note: We recommend using the `Find in Folder` command in VS Code for search and replace operations.
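If you prefer the command line, a recursive search-and-replace sketch along these lines can do the same job (the old path is one of the placeholders shipped in the repo; `/your_path` stands in for your own location):
```
cd /your_path/MCITlib

# Replace one shipped placeholder path with your local path across all files;
# repeat for each of the placeholder paths listed above.
grep -rl '/mnt/haiyangguo/mywork/FCIT/pre_trained/llava-v1.5-7b' . | \
    xargs sed -i 's#/mnt/haiyangguo/mywork/FCIT/pre_trained/llava-v1.5-7b#/your_path/llava-v1.5-7b#g'
```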
### 4. Training and Evaluation
We provide predefined training and testing hyperparameters in the config files under each method's `configs/` directory, which can be adjusted as needed. The corresponding training and testing scripts are located in the `scripts` directory. Once all paths are correctly configured, the scripts should run without issues. For example:
```
cd LLaVA/LoRA-FT
sh scripts/MCITlib/Train/train_DCL.sh
```
The program will automatically perform both training and inference. However, for ModalPrompt (LLaVA version), training and inference must be executed separately. Please refer to its [repository](https://github.com/AuroraZengfh/ModalPrompt) for detailed instructions.
## Citation
```bibtex
@article{guo2025mcitlib,
  title={MCITlib: Multimodal Continual Instruction Tuning Library and Benchmark},
  author={Guo, Haiyang and Zhu, Fei and Zhao, Hongbo and Zeng, Fanhu and Liu, Wenzhuo and Ma, Shijie and Wang, Da-Han and Zhang, Xu-Yao},
  journal={arXiv preprint arXiv:2508.07307},
  year={2025}
}
```
## Acknowledgments
We thank the following repositories for providing helpful functions used in our work.
- [LLaVA](https://github.com/haotian-liu/LLaVA)
- [CoIN](https://github.com/zackschen/CoIN)
- [O-LoRA](https://github.com/cmnfriend/O-LoRA)
- [ModalPrompt](https://github.com/AuroraZengfh/ModalPrompt)
- [CL-MoE](https://github.com/ECNU-ICALK/CL-MoE)
- [HiDe-LLaVA](https://github.com/Ghy0501/HiDe-LLaVA)
- [SEFE](https://github.com/jinpeng0528/SEFE)
- [FCIT](https://github.com/Ghy0501/FCIT)
## Contact
If you have any questions or suggestions for new features, please open an issue or contact the author, Haiyang Guo (guohaiyang2023@ia.ac.cn).