# MLLMSeg: Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder

[![hf_paper](https://img.shields.io/badge/πŸ€—-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2508.04107)
[![arXiv](https://img.shields.io/badge/Arxiv-2508.04107-b31b1b.svg?logo=arXiv)](http://arxiv.org/abs/2508.04107)
[![Python](https://img.shields.io/badge/Python-3.9-blue.svg)](https://www.python.org/downloads/)
[![PyTorch](https://img.shields.io/badge/PyTorch-2.5.1-red.svg)](https://pytorch.org/)
[![Transformers](https://img.shields.io/badge/Transformers-4.37.2-green.svg)](https://huggingface.co/docs/transformers/)

## πŸ“‹ Overview

- [πŸš€ Quick Start](#-quick-start)
- [πŸ“Š Performance Metrics](#-performance-metrics)
- [πŸ‘€ Visualization](#-visualization)
- [πŸ“¦ Checkpoints](#-checkpoints)
- [🀝 Contributing](#-contributing)
- [πŸ“„ License](#-license)
- [πŸ™ Acknowledgments](#-acknowledgments)

## πŸ‘€ Todo

- [ ] Release demo of MLLMSeg
- [x] Release model checkpoints

## πŸš€ Quick Start

### Installation

```bash
conda create -n mllmseg python==3.10.18 -y
conda activate mllmseg
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
# If you encounter problems while installing datasets, install pyarrow first:
# conda install -c conda-forge pyarrow
pip install -r requirements.txt
pip install flash-attn==2.3.6 --no-build-isolation  # Note: building flash-attn requires a GPU
```

### Data Preparation

Referring segmentation datasets: [refCOCO](https://web.archive.org/web/20220413011718/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip), [refCOCO+](https://web.archive.org/web/20220413011656/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip), [refCOCOg](https://web.archive.org/web/20220413012904/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip), [refCLEF](https://web.archive.org/web/20220413011817/https://bvisionweb1.cs.unc.edu/licheng/referit/data/refclef.zip) ([saiapr_tc-12](https://web.archive.org/web/20220515000000/http://bvisionweb1.cs.unc.edu/licheng/referit/data/images/saiapr_tc-12.zip))

Generalized referring segmentation dataset: [gRefCOCO](https://github.com/henghuiding/gRefCOCO). Add its expression and annotation JSON files to the `refer_seg` sub-directory, as shown in the tree below.

```text
|-- datasets
β”‚   β”œβ”€β”€ refer_seg
β”‚   β”‚   β”œβ”€β”€ grefcoco
β”‚   β”‚   β”œβ”€β”€ images
β”‚   β”‚   β”‚   β”œβ”€β”€ saiapr_tc-12
β”‚   β”‚   β”‚   └── mscoco
β”‚   β”‚   β”‚       └── images
β”‚   β”‚   β”‚           └── train2014
β”‚   β”‚   β”œβ”€β”€ refclef
β”‚   β”‚   β”œβ”€β”€ refcoco
β”‚   β”‚   β”œβ”€β”€ refcoco+
β”‚   β”‚   └── refcocog
```

### Model Training

```bash
# Train RES
bash scripts/train_mllmseg.sh
# Train GRES
bash scripts/train_mllmseg_gres.sh
```

### Model Testing

```bash
# Test RES
bash scripts/test_mllmseg.sh
# Test GRES
bash scripts/test_mllmseg_gres.sh
```

### Merge LoRA

```bash
# Before training GRES, first merge the LoRA parameters of the RES model.
python tools/merge_lora_mllmseg.py
```
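For reference, merging LoRA adapters into a base model with the `peft` library typically looks like the sketch below. This is only an illustration of the general pattern, not the repository's script (`tools/merge_lora_mllmseg.py` holds the actual logic), and the `base_path`, `lora_path`, and `out_path` values are hypothetical placeholders.

```python
# Minimal sketch of a LoRA merge with `peft`; paths below are placeholders,
# not the repository's real checkpoint locations.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base_path = "OpenGVLab/InternVL2_5-8B"        # assumed base MLLM
lora_path = "work_dirs/mllmseg_res_lora"      # assumed RES LoRA checkpoint
out_path = "work_dirs/mllmseg_res_merged"     # where the merged model is saved

# Load the frozen base model, attach the trained LoRA adapters, then fold the
# adapter weights into the base weights so that downstream (GRES) training can
# start from a plain, adapter-free checkpoint.
base = AutoModel.from_pretrained(
    base_path, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, lora_path)
merged = model.merge_and_unload()

merged.save_pretrained(out_path)
AutoTokenizer.from_pretrained(base_path, trust_remote_code=True).save_pretrained(out_path)
```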
## πŸ“¦ Checkpoints

Our checkpoints are available on [Baidu Netdisk](https://pan.baidu.com/s/1KzEsHkge47jwSRwAOQ98Xw?pwd=8ycs). You can also download them from Hugging Face:

| Base Model | RES Model | GRES Model |
|------|------|------|
| InternVL2_5_1B | [MLLMSeg_InternVL2_5_1B_RES](https://huggingface.co/jcwang0602/MLLMSeg_InternVL2_5_1B_RES) | - |
| InternVL2_5_2B | [MLLMSeg_InternVL2_5_2B_RES](https://huggingface.co/jcwang0602/MLLMSeg_InternVL2_5_2B_RES) | - |
| InternVL2_5_4B | [MLLMSeg_InternVL2_5_4B_RES](https://huggingface.co/jcwang0602/MLLMSeg_InternVL2_5_4B_RES) | - |
| InternVL2_5_8B | [MLLMSeg_InternVL2_5_8B_RES](https://huggingface.co/jcwang0602/MLLMSeg_InternVL2_5_8B_RES) | [MLLMSeg_InternVL2_5_8B_GRES](https://huggingface.co/jcwang0602/MLLMSeg_InternVL2_5_8B_GRES) |

## πŸ“Š Performance Metrics

### Referring Expression Segmentation

### Referring Expression Comprehension

### Generalized Referring Expression Segmentation

## πŸ‘€ Visualization

### Referring Expression Segmentation

### Referring Expression Comprehension

### Generalized Referring Expression Segmentation

## πŸ™ Acknowledgments

This code is developed on top of [InternVL](https://github.com/OpenGVLab/InternVL), [GSVA](https://github.com/LeapLabTHU/GSVA), and [EEVG](https://github.com/chenwei746/EEVG).

## βœ‰οΈ Contact

Email: jcwang@stu.ecnu.edu.cn. Any kind of discussion is welcome!

---

## πŸ“– Citation

If our work is useful for your research, please consider citing:

```bibtex
@misc{wang2025unlockingpotentialmllmsreferring,
    title={Unlocking the Potential of MLLMs in Referring Expression Segmentation via a Light-weight Mask Decoder},
    author={Jingchao Wang and Zhijian Wu and Dingjiang Huang and Yefeng Zheng and Hong Wang},
    year={2025},
    eprint={2508.04107},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2508.04107},
}
```

## ✨ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=jcwang0602/MLLMSeg&type=Date)](https://star-history.com/#jcwang0602/MLLMSeg&Date)