# RSGPT
**RSGPT: A Remote Sensing Vision Language Model and Benchmark**
[Yuan Hu](https://scholar.google.com.sg/citations?user=NFRuz4kAAAAJ&hl=zh-CN), Jianlong Yuan, [Congcong Wen](https://wencc.xyz), Xiaonan Lu, [Xiang Li☨](https://xiangli.ac.cn)
☨corresponding author
This is an ongoing project. We are working on increasing the dataset size.
## Related Projects
**RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model**
[Congcong Wen*](https://wencc.xyz/), Yiting Lin*, Xiaokang Qu, Nan Li, Yong Liao, Hui Lin, [Xiang Li](https://xiangli.ac.cn)
**FedRSCLIP: Federated learning for remote sensing scene classification using vision-language models**
Hui Lin*, Chao Zhang*, Danfeng Hong, Kexin Dong, and [Congcong Wen☨](https://wencc.xyz)
**RS-MoE: A Vision–Language Model With Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering**
Hui Lin*, Danfeng Hong*, Shuhang Ge*, Chuyao Luo, Kai Jiang, Hao Jin, and [Congcong Wen☨](https://wencc.xyz)
**VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding**
Xiang Li, Jian Ding, Mohamed Elhoseiny
**Vision-language models in remote sensing: Current progress and future trends**
[Xiang Li*☨](https://xiangli.ac.cn), [Congcong Wen*](https://wencc.xyz/), [Yuan Hu*](https://scholar.google.com.sg/citations?user=NFRuz4kAAAAJ&hl=zh-CN), Zhenghang Yuan, [Xiao Xiang Zhu](https://www.professoren.tum.de/en/zhu-xiaoxiang)
**RS-CLIP: Zero Shot Remote Sensing Scene Classification via Contrastive Vision-Language Supervision**
[Xiang Li](https://xiangli.ac.cn), [Congcong Wen](https://wencc.xyz/), [Yuan Hu](https://scholar.google.com.sg/citations?user=NFRuz4kAAAAJ&hl=zh-CN), Nan Zhou
## :fire: Updates
* **[2025.05.08]** We release the code for training and testing RSGPT.
* **[2024.12.18]** We release the [manual scoring results](https://drive.google.com/file/d/1e3joLIiWfUgena17Dx8wZPWGNjs7vGua/view?usp=sharing) for RSIEval.
* **[2024.06.19]** We release VRSBench, a Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding. VRSBench contains 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs. Check out the [VRSBench Project Page](https://vrsbench.github.io/).
* **[2024.05.23]** We release the RSICap dataset. Please fill out this [form](https://docs.google.com/forms/d/1h5ydiswunM_EMfZZtyJjNiTMpeOzRwooXh73AOqokzU/edit) to get both RSICap and RSIEval dataset.
* **[2023.11.10]** We release our survey on vision-language models in remote sensing: [RSVLM](https://arxiv.org/pdf/2305.05726.pdf).
* **[2023.10.22]** The RSICap dataset and code will be released upon paper acceptance.
* **[2023.10.22]** We release the evaluation dataset RSIEval. Please fill out this [form](https://docs.google.com/forms/d/1h5ydiswunM_EMfZZtyJjNiTMpeOzRwooXh73AOqokzU/edit) to get the RSIEval dataset.
## Dataset
* RSICap: 2,585 image-text pairs with high-quality human-annotated captions.
* RSIEval: 100 high-quality human-annotated captions with 936 open-ended visual question-answer pairs.
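Both splits pair remote sensing images with human-written text. As a rough illustration, a minimal loader for such image-caption annotations might look like the sketch below — the JSON schema and the field names `filename` and `caption` are assumptions for illustration only; check the released files for the actual format:

```python
import json

def load_caption_pairs(annotation_path):
    """Load (image_filename, caption) pairs from a JSON annotation file.

    Hypothetical schema: a list of records, each with 'filename' and
    'caption' keys. Adjust the key names to match the released data.
    """
    with open(annotation_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    return [(r["filename"], r["caption"]) for r in records]
```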
## Code
Our finetuning recipe is borrowed from [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4). The model itself is built by finetuning [InstructBLIP](https://github.com/salesforce/LAVIS/blob/main/projects/instructblip/README.md) on our RSICap dataset.
## 🚀 Installation
Set up a conda environment using the provided `environment.yml` file:
### Step 1: Create the environment
```bash
conda env create -f environment.yml
```
### Step 2: Activate the environment
```bash
conda activate rsgpt
```
## Training
```bash
torchrun --nproc_per_node=8 train.py --cfg-path train_configs/rsgpt_train.yaml
```
## Testing
Test image captioning:
```bash
python test.py --cfg-path eval_configs/rsgpt_eval.yaml --gpu-id 0 --out-path rsgpt/output --task ic
```
Test visual question answering:
```bash
python test.py --cfg-path eval_configs/rsgpt_eval.yaml --gpu-id 0 --out-path rsgpt/output --task vqa
```
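The VQA task produces free-form text answers. One simple way to score such output offline is exact-match accuracy after light normalization — note this metric is an illustration, not necessarily the evaluation protocol used in the paper:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of predictions that match the reference answer after
    lowercasing, stripping whitespace, and dropping a trailing period.
    """
    def normalize(ans):
        ans = ans.strip().lower()
        return ans[:-1] if ans.endswith(".") else ans

    matches = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return matches / len(references) if references else 0.0
```

Open-ended answers often differ from the reference in harmless ways (articles, synonyms), so stricter or softer metrics may be preferable depending on the benchmark.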
## Acknowledgement
+ [MiniGPT-4](https://github.com/Vision-CAIR/MiniGPT-4). A popular open-source vision-language model.
+ [InstructBLIP](https://github.com/salesforce/LAVIS/blob/main/projects/instructblip/README.md). The model architecture of RSGPT follows InstructBLIP. Don't forget to check out this great open-source work if you haven't seen it before!
+ [Lavis](https://github.com/salesforce/LAVIS). This repository is built upon Lavis!
+ [Vicuna](https://github.com/lm-sys/FastChat). Vicuna's language ability with only 13B parameters is amazing, and it is open-source!
## Citation
If you're using RSGPT in your research or applications, please cite using this BibTeX:
```bibtex
@article{hu2025rsgpt,
title={Rsgpt: A remote sensing vision language model and benchmark},
author={Hu, Yuan and Yuan, Jianlong and Wen, Congcong and Lu, Xiaonan and Liu, Yu and Li, Xiang},
journal={ISPRS Journal of Photogrammetry and Remote Sensing},
volume={224},
pages={272--286},
year={2025},
publisher={Elsevier}
}
```