# HealthGPT

**Repository Path**: jiangzhijie628/HealthGPT

## Basic Information

- **Project Name**: HealthGPT
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 1
- **Created**: 2025-09-17
- **Last Updated**: 2025-09-18

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

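Tianwei Lin¹, Wenqiao Zhang¹, Sijing Li¹, Yuqian Yuan¹, Binhe Yu², Haoyuan Li³, Wanggui He³, Hao Jiang³, Mengze Li⁴, Xiaohui Song¹, Siliang Tang¹, Jun Xiao¹, Hui Lin¹, Yueting Zhuang¹, Beng Chin Ooi⁵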

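¹Zhejiang University, ²University of Electronic Science and Technology of China, ³Alibaba, ⁴The Hong Kong University of Science and Technology, ⁵National University of Singapore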

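## 🌟 Overview

Welcome to **HealthGPT!** 🚀 **HealthGPT** is an advanced medical large vision-language model with a unified framework that integrates medical visual comprehension and generation. This project proposes **heterogeneous low-rank adaptation (H-LoRA)** and a **three-stage learning strategy**, which enable a pre-trained large language model to follow both visual comprehension and visual generation instructions effectively.

# 🔥 News

- **[2025.03.20]** We upgraded our dedicated comprehension model to [**HealthGPT-XL32**](https://huggingface.co/lintw/HealthGPT-XL32), which is based on Qwen2.5-32B-Instruct. **The enhanced model significantly outperforms HealthGPT-L14, scoring 70.4 versus 66.4.**
- **[2025.03.06]** We released the VL-Health dataset.
- **[2025.02.26]** We released the UI/UX for inference.
- **[2025.02.17]** We released the pre-trained weights and inference scripts on HuggingFace.

### TODO

- [x] Release inference code.
- [x] Release the pre-trained weights of the model.
- [x] Release the inference UI/UX.
- [x] Release the VL-Health dataset.
- [ ] Release training scripts.
- [ ] Build the website.

### 📚 Task Classification and Support

**HealthGPT** supports **7** types of medical comprehension tasks and **5** types of medical generation tasks, outperforming recent unified visual models and medical-specific models.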

Example Image

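### 🏗️ Architecture

The HealthGPT architecture integrates **hierarchical visual perception** with **H-LoRA**. A task-specific hard router selects the visual features and the H-LoRA plugin to use, and the model generates text and visual outputs autoregressively.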
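To make the idea of a task-specific hard router concrete, here is a minimal sketch in plain Python. It is not the HealthGPT implementation and does not reproduce the paper's H-LoRA formulation: it uses ordinary LoRA adapters, one per task, and picks exactly one by task name. The names `LoRAPlugin` and `HardRoutedLinear`, the task keys, and the toy matrices are all hypothetical; the `alpha / r` scaling is the usual LoRA convention and is assumed here.

```python
# Illustrative only: plain LoRA adapters chosen by a hard, task-keyed router.
# All names below are hypothetical and not taken from the HealthGPT code base.
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class LoRAPlugin:
    down: Matrix   # shape (r, d_in): projects the input down to rank r
    up: Matrix     # shape (d_out, r): projects back up to the output size
    alpha: float   # the low-rank update is scaled by alpha / r

    def delta(self, x: Vector) -> Vector:
        r = len(self.down)
        scale = self.alpha / r
        return [scale * y for y in matvec(self.up, matvec(self.down, x))]


class HardRoutedLinear:
    """Frozen base weight plus exactly one adapter, selected by task name."""

    def __init__(self, weight: Matrix, plugins: Dict[str, LoRAPlugin]):
        self.weight = weight
        self.plugins = plugins

    def forward(self, x: Vector, task: str) -> Vector:
        base = matvec(self.weight, x)
        update = self.plugins[task].delta(x)  # hard routing: no mixing of plugins
        return [b + u for b, u in zip(base, update)]


# Toy example: a 2x2 identity base weight and two rank-1 plugins.
layer = HardRoutedLinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    plugins={
        "comprehension": LoRAPlugin(down=[[1.0, 0.0]], up=[[1.0], [0.0]], alpha=2.0),
        "generation": LoRAPlugin(down=[[0.0, 1.0]], up=[[0.0], [1.0]], alpha=1.0),
    },
)
print(layer.forward([1.0, 2.0], "comprehension"))
print(layer.forward([1.0, 2.0], "generation"))
```

The same input yields different outputs depending on the task key, because only the selected plugin contributes to the update while the base weight stays shared.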

Example Image

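## 🛠️ Getting Started

We have released the model in two configurations, **HealthGPT-M3** and **HealthGPT-L14**, to suit different requirements and resource availability:

- HealthGPT-M3: a smaller version optimized for speed and reduced memory usage.
- HealthGPT-L14: a larger version designed for higher performance and more complex tasks.

### Installation

**1. Prepare the environment**

First, clone our repository and create the Python environment for running HealthGPT with the following commands:

```
# clone our project
git clone https://github.com/DCDmllm/HealthGPT.git
cd HealthGPT

# prepare python environment
conda create -n HealthGPT python=3.10
conda activate HealthGPT
pip install -r requirements.txt
```

**2. Prepare the pre-trained weights**

HealthGPT uses `clip-vit-large-patch14-336` as the visual encoder, and uses `Phi-3-mini-4k-instruct` and `phi-4` as the pre-trained LLM base models for HealthGPT-M3 and HealthGPT-L14, respectively. Please download the corresponding weights:

|Model Type|Model Name|Download|
|:-:|:-:|:-:|
|ViT|`clip-vit-large-patch14-336`|[Download](https://huggingface.co/openai/clip-vit-large-patch14-336)|
|Base Model (HealthGPT-M3)|`Phi-3-mini-4k-instruct`|[Download](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)|
|Base Model (HealthGPT-L14)|`Phi-4`|[Download](https://huggingface.co/microsoft/phi-4)|
|Base Model (Qwen2.5-32B-Instruct)|`Qwen2.5-32B-Instruct`|[Download](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)|

For medical visual generation tasks, please follow the [official VQGAN guide](https://github.com/CompVis/taming-transformers) and download the `VQGAN OpenImages (f=8), 8192` model weights from the `"Overview of pretrained models"` section. Below is the direct link to the corresponding VQGAN pre-trained weights:

|Model Name|Download|
|:-:|:-:|
|VQGAN OpenImages (f=8), 8192, GumbelQuantization|[Download](https://heibox.uni-heidelberg.de/d/2e5662443a6b4307b470/?p=%2F&mode=list)|

After downloading, place the `last.ckpt` and `model.yaml` files in the `taming_transformers/ckpt` directory.

**3. Prepare the H-LoRA and adapter weights**

HealthGPT enhances the base model's medical visual comprehension and generation capabilities by training a small number of H-LoRA parameters together with adapter layers that align vision and text. We have currently released some of the weights from the training process, which support the `medical visual question answering` and `open-world visual reconstruction` tasks. The corresponding weights are available here: [Download](https://huggingface.co/lintw/HealthGPT-M3).

We will soon release the full weights for HealthGPT-L14, along with the H-LoRA weights for medical generation tasks. Stay tuned!

## ⚡ Inference

### Medical Visual Question Answering

To run inference with HealthGPT, follow these steps:

1. Download the necessary files:
   - Make sure you have downloaded all the required model weights and resources.
2. Update the script paths:
   - Open the script located at `llava/demo/com_infer.sh`.
   - Modify the following variables to point to the paths where you stored the downloaded files:
     - MODEL_NAME_OR_PATH: path or identifier of the base model.
     - VIT_PATH: path to the Vision Transformer model weights.
     - HLORA_PATH: path to the [HLORA weights](https://huggingface.co/lintw/HealthGPT-M3/blob/main/com_hlora_weights.bin) file for visual comprehension.
     - FUSION_LAYER_PATH: path to your [fusion layer weights](https://huggingface.co/lintw/HealthGPT-M3/blob/main/fusion_layer_weights.bin) file.
3. Run the script:
   - Execute the script in a terminal to start inference:

```
cd llava/demo
bash com_infer.sh
```

You can also run the Python command directly by specifying the paths and parameters in the terminal. This approach makes it easy to change the image or the question as needed:

```
python3 com_infer.py \
    --model_name_or_path "microsoft/Phi-3-mini-4k-instruct" \
    --dtype "FP16" \
    --hlora_r "64" \
    --hlora_alpha "128" \
    --hlora_nums "4" \
    --vq_idx_nums "8192" \
    --instruct_template "phi3_instruct" \
    --vit_path "openai/clip-vit-large-patch14-336/" \
    --hlora_path "path/to/your/local/com_hlora_weights.bin" \
    --fusion_layer_path "path/to/your/local/fusion_layer_weights.bin" \
    --question "Your question" \
    --img_path "path/to/image.jpg"
```

- Customize the question and image: modify the `--question` and `--img_path` parameters to ask a different question or analyze a different image.

Correspondingly, the visual question answering task for `HealthGPT-L14` can be run with the following Python command:

```
python3 com_infer_phi4.py \
    --model_name_or_path "microsoft/Phi-4" \
    --dtype "FP16" \
    --hlora_r "32" \
    --hlora_alpha "64" \
    --hlora_nums "4" \
    --vq_idx_nums "8192" \
    --instruct_template "phi4_instruct" \
    --vit_path "openai/clip-vit-large-patch14-336/" \
    --hlora_path "path/to/your/local/com_hlora_weights_phi4.bin" \
    --question "Your question" \
    --img_path "path/to/image.jpg"
```

The `com_hlora_weights_phi4.bin` weights can be downloaded [here](https://huggingface.co/lintw/HealthGPT-L14).

### Image Reconstruction

Similarly, set `HLORA_PATH` to point to the [`gen_hlora_weights.bin`](https://huggingface.co/lintw/HealthGPT-M3/blob/main/gen_hlora_weights.bin) file and configure the other model paths. You can then run the image reconstruction task with the following script:

```
cd llava/demo
bash gen_infer.sh
```

You can also run the following Python command directly:

```
python3 gen_infer.py \
    --model_name_or_path "microsoft/Phi-3-mini-4k-instruct" \
    --dtype "FP16" \
    --hlora_r "256" \
    --hlora_alpha "512" \
    --hlora_nums "4" \
    --vq_idx_nums "8192" \
    --instruct_template "phi3_instruct" \
    --vit_path "openai/clip-vit-large-patch14-336/" \
    --hlora_path "path/to/your/local/gen_hlora_weights.bin" \
    --fusion_layer_path "path/to/your/local/fusion_layer_weights.bin" \
    --question "Reconstruct the image." \
    --img_path "path/to/image.jpg" \
    --save_path "path/to/save.jpg"
```

## Server

**An interactive chat UI based on Gradio that accepts text + image input and returns text or an image depending on the selected mode.**

### 📌 Project Introduction

This project is a **Gradio** front-end interface that lets users:

- **Analyze an image (comprehension task)**: input text + image, output **text**
- **Generate an image (generation task)**: input text + image, output **image**

### 📦 Installing Dependencies

This project runs on Python and requires `Gradio` and `Pillow`.

```bash
pip install gradio pillow
```

### ▶️ Running the Project

Run the following command in a terminal:

```bash
python app.py
```

Once it is running, the terminal prints the Gradio access address (for example, http://127.0.0.1:5010), which you can open in a browser.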
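The two-mode behaviour described above (text + image in, text or image out) can be sketched as follows. This is a hypothetical outline, not the repository's `app.py`: the function names `make_respond`, `build_demo`, and `main`, the mode labels, and the placeholder back ends are assumptions made for illustration. The port matches the example address above. `gradio` is imported lazily so the dispatch logic can be exercised without the UI installed.

```python
# Hypothetical sketch of a two-mode Gradio front end; not the project's app.py.
from typing import Any, Callable, Optional, Tuple

ANALYZE = "Analyze Image"    # comprehension task: text + image -> text
GENERATE = "Generate Image"  # generation task: text + image -> image


def make_respond(
    analyze: Callable[[str, Any], str],
    generate: Callable[[str, Any], Any],
) -> Callable[[str, str, Any], Tuple[Optional[str], Any]]:
    """Build a handler that dispatches to one back end based on the mode."""

    def respond(mode: str, text: str, image: Any) -> Tuple[Optional[str], Any]:
        if image is None:
            return "Please upload an image.", None
        if mode == ANALYZE:
            return analyze(text, image), None
        if mode == GENERATE:
            return None, generate(text, image)
        raise ValueError(f"unknown mode: {mode!r}")

    return respond


def build_demo(respond: Callable[[str, str, Any], Tuple[Optional[str], Any]]):
    """Wire the handler into a Gradio Blocks layout."""
    import gradio as gr  # lazy import: only needed when the UI is built

    with gr.Blocks() as demo:
        mode = gr.Radio([ANALYZE, GENERATE], value=ANALYZE, label="Mode")
        text = gr.Textbox(label="Prompt")
        image = gr.Image(type="pil", label="Input image")
        out_text = gr.Textbox(label="Text output")
        out_image = gr.Image(label="Image output")
        run = gr.Button("Run")
        run.click(respond, inputs=[mode, text, image], outputs=[out_text, out_image])
    return demo


def main() -> None:
    # Placeholder back ends; replace them with real model calls.
    respond = make_respond(
        analyze=lambda question, img: f"(placeholder answer to: {question})",
        generate=lambda question, img: img,
    )
    build_demo(respond).launch(server_name="127.0.0.1", server_port=5010)
```

Calling `main()` starts the server. Keeping the mode dispatch in `respond` separate from the layout makes it possible to test the routing without launching a browser session.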

Example Image
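The three inference commands in the Inference section differ only in the script name, base model, H-LoRA rank and alpha, instruction template, and whether `--fusion_layer_path` and `--save_path` are passed. The helper below is a small sketch that assembles those argument lists from presets; the preset names and the `build_command` function are hypothetical conveniences, while the flag names and values are copied from the commands in this README.

```python
# Hypothetical helper that rebuilds the README's inference commands from presets.
import shlex
from typing import Dict, List, Optional

# Values copied from the commands shown in the Inference section.
PRESETS: Dict[str, Dict[str, str]] = {
    "m3-comprehension": {
        "script": "com_infer.py",
        "model": "microsoft/Phi-3-mini-4k-instruct",
        "hlora_r": "64",
        "hlora_alpha": "128",
        "template": "phi3_instruct",
    },
    "l14-comprehension": {
        "script": "com_infer_phi4.py",
        "model": "microsoft/Phi-4",
        "hlora_r": "32",
        "hlora_alpha": "64",
        "template": "phi4_instruct",
    },
    "m3-generation": {
        "script": "gen_infer.py",
        "model": "microsoft/Phi-3-mini-4k-instruct",
        "hlora_r": "256",
        "hlora_alpha": "512",
        "template": "phi3_instruct",
    },
}


def build_command(
    preset: str,
    hlora_path: str,
    question: str,
    img_path: str,
    fusion_layer_path: Optional[str] = None,
    save_path: Optional[str] = None,
) -> List[str]:
    """Return the argument list for one of the README's inference commands."""
    cfg = PRESETS[preset]
    cmd = [
        "python3", cfg["script"],
        "--model_name_or_path", cfg["model"],
        "--dtype", "FP16",
        "--hlora_r", cfg["hlora_r"],
        "--hlora_alpha", cfg["hlora_alpha"],
        "--hlora_nums", "4",
        "--vq_idx_nums", "8192",
        "--instruct_template", cfg["template"],
        "--vit_path", "openai/clip-vit-large-patch14-336/",
        "--hlora_path", hlora_path,
    ]
    if fusion_layer_path is not None:
        cmd += ["--fusion_layer_path", fusion_layer_path]
    cmd += ["--question", question, "--img_path", img_path]
    if save_path is not None:
        cmd += ["--save_path", save_path]
    return cmd


example = build_command(
    "m3-generation",
    hlora_path="path/to/your/local/gen_hlora_weights.bin",
    question="Reconstruct the image.",
    img_path="path/to/image.jpg",
    fusion_layer_path="path/to/your/local/fusion_layer_weights.bin",
    save_path="path/to/save.jpg",
)
print(shlex.join(example))  # shell-quoted form, ready to paste into a terminal
```

The returned list can be passed to `subprocess.run(cmd, cwd="llava/demo")` if you prefer to launch inference from Python; that step is left out here because it requires the downloaded weights.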

## 🔗 Citation

If you find this work useful, please consider giving this repository a star and citing our paper as follows:

```
@misc{lin2025healthgptmedicallargevisionlanguage,
      title={HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation},
      author={Tianwei Lin and Wenqiao Zhang and Sijing Li and Yuqian Yuan and Binhe Yu and Haoyuan Li and Wanggui He and Hao Jiang and Mengze Li and Xiaohui Song and Siliang Tang and Jun Xiao and Hui Lin and Yueting Zhuang and Beng Chin Ooi},
      year={2025},
      eprint={2502.09838},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.09838},
}
```

## 🤝 Acknowledgments

Our project is built on the following repositories:

- [LLaVA](https://github.com/haotian-liu/LLaVA): Large Language and Vision Assistant
- [LLaVA++](https://github.com/mbzuai-oryx/LLaVA-pp): Extending Visual Capabilities with LLaMA-3 and Phi-3
- [Taming Transformers](https://github.com/CompVis/taming-transformers): Taming Transformers for High-Resolution Image Synthesis

## ⚖️ License

This repository is licensed under the [Apache License 2.0](LICENSE).