# InternLM-XComposer2 (浦语·灵笔2)

[InternLM-XComposer2-VL 🤗](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b) | [Technical Report 📄](https://arxiv.org/abs/2401.16420)
[English](./README.md) | [简体中文](./README_CN.md)
Thanks to the community for providing InternLM-XComposer2 online demos on Hugging Face and OpenXLab.
> [**ShareGPT4V**](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/ShareGPT4V): **Improving Large Multi-modal Models with Better Captions**
> [**DualFocus**](https://github.com/InternLM/InternLM-XComposer/tree/main/projects/DualFocus): **Integrating Macro and Micro Perspectives in Multi-modal Large Language Models**
**InternLM-XComposer2** is a groundbreaking vision-language large model built on the [InternLM2](https://github.com/InternLM/InternLM/tree/main) large language model. It offers exceptional text-image composition and image understanding capabilities and excels across a range of applications:
- **Free-form interleaved text-image composition:** InternLM-XComposer2 understands **free-form text-image instructions, including outlines, detailed article requirements, and reference images**, producing customized articles in which the text and images complement each other for an immersive reading experience.
- **Accurate vision-language question answering:** Backed by extensive vision-language knowledge, InternLM-XComposer2 answers challenging visual questions accurately, with impressive recognition, perception, detailed captioning, and visual reasoning abilities.
- **Outstanding performance:** Built on InternLM2-7B, InternLM-XComposer2 leads same-scale multimodal models by a wide margin on 13 multimodal benchmarks and surpasses GPT-4V and Gemini Pro on 6 of them.
We release two variants:

- **InternLM-XComposer2-VL-7B** 🤗: trained on top of InternLM2-7B for multimodal benchmarks and visual question answering. It is currently the strongest vision-language model built on a 7B-scale LLM base, leading as many as 13 multimodal benchmarks.
- **InternLM-XComposer2-7B** 🤗: further fine-tuned to support free-form interleaved text-image composition from free-form instructions.

Please refer to the [Technical Report](https://arxiv.org/abs/2401.16420) for more method details.
## Model Zoo

| Model | Usage | Hugging Face | ModelScope | Release Date |
| ----- | ----- | ------------ | ---------- | ------------ |
| **InternLM-XComposer2** | Text-image composition | [🤗internlm-xcomposer2-7b](https://huggingface.co/internlm/internlm-xcomposer2-7b) | [internlm-xcomposer2-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b/summary) | 2024-01-26 |
| **InternLM-XComposer2-VL** | Benchmark, VQA | [🤗internlm-xcomposer2-vl-7b](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b) | [internlm-xcomposer2-vl-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b/summary) | 2024-01-26 |
| **InternLM-XComposer2-4bit** | Text-image composition | [🤗internlm-xcomposer2-7b-4bit](https://huggingface.co/internlm/internlm-xcomposer2-7b-4bit) | [internlm-xcomposer2-7b-4bit](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-7b-4bit/summary) | 2024-02-06 |
| **InternLM-XComposer2-VL-4bit** | Benchmark, VQA | [🤗internlm-xcomposer2-vl-7b-4bit](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b-4bit) | [internlm-xcomposer2-vl-7b-4bit](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-vl-7b-4bit/summary) | 2024-02-06 |
| **InternLM-XComposer** | Text-image composition, VQA | [🤗internlm-xcomposer-7b](https://huggingface.co/internlm/internlm-xcomposer-7b) | [internlm-xcomposer-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-7b/summary) | 2023-09-26 |
| **InternLM-XComposer-4bit** | Text-image composition, VQA | [🤗internlm-xcomposer-7b-4bit](https://huggingface.co/internlm/internlm-xcomposer-7b-4bit) | [internlm-xcomposer-7b-4bit](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-7b-4bit/summary) | 2023-09-26 |
| **InternLM-XComposer-VL** | Benchmark | [🤗internlm-xcomposer-vl-7b](https://huggingface.co/internlm/internlm-xcomposer-vl-7b) | [internlm-xcomposer-vl-7b](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer-vl-7b/summary) | 2023-09-26 |
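The Hugging Face checkpoints ship custom modeling code, so loading them requires `trust_remote_code=True`. Below is a minimal inference sketch for the VL variant, assuming the `chat` helper and the `<ImageHere>` image placeholder exposed by the remote code; the exact signature may differ between releases, so consult the model card if it does not match.

```python
import torch
from transformers import AutoModel, AutoTokenizer

torch.set_grad_enabled(False)  # inference only

ckpt = 'internlm/internlm-xcomposer2-vl-7b'  # VQA/benchmark variant from the table above
model = AutoModel.from_pretrained(ckpt, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)

# '<ImageHere>' marks where the image is spliced into the prompt
# (assumed placeholder convention; check the model card for the exact token).
query = '<ImageHere>Please describe this image in detail.'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query, image='./example.jpg',
                             history=[], do_sample=False)
print(response)
```

Note that the `-4bit` checkpoints are quantized and are not expected to load through plain `AutoModel`; see their respective model cards for the quantization-aware loading path.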
## Evaluation
We evaluate InternLM-XComposer2-VL on 13 multimodal benchmarks: [MathVista](https://mathvista.github.io/), [MMMU](https://mmmu-benchmark.github.io/), [AI2D](https://prior.allenai.org/projects/diagram-understanding), [MME](https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation), [MMBench](https://opencompass.org.cn/leaderboard-multimodal), [MMBench-CN](https://opencompass.org.cn/leaderboard-multimodal), [SEED-Bench](https://huggingface.co/spaces/AILab-CVC/SEED-Bench_Leaderboard), [QBench](https://github.com/Q-Future/Q-Bench/tree/master/leaderboards#overall-leaderboards), [HallusionBench](https://github.com/tianyi-lab/HallusionBench), [ChartQA](https://github.com/vis-nlp/ChartQA), [MM-Vet](https://github.com/yuweihao/MM-Vet), [LLaVA-in-the-wild](https://github.com/haotian-liu/LLaVA), and [POPE](https://github.com/AoiDragon/POPE).

To reproduce the evaluation results, please refer to the [evaluation details](./evaluation/README.md).
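Most of these benchmarks are multiple-choice or yes/no protocols. As a toy illustration only (this is not the repository's evaluation code, which lives under `./evaluation/`), a POPE-style yes/no scorer reduces to:

```python
def yes_no_accuracy(predictions, labels):
    """Fraction of yes/no predictions matching the reference labels.

    Toy illustration of POPE-style scoring only; see ./evaluation/README.md
    for the actual evaluation pipeline.
    """
    def norm(answer):
        # Map free-form answers such as 'Yes.' or 'no, it is not' to yes/no.
        return 'yes' if answer.strip().lower().startswith('yes') else 'no'

    hits = sum(norm(p) == norm(g) for p, g in zip(predictions, labels))
    return hits / max(len(labels), 1)

print(yes_no_accuracy(['Yes.', 'no', 'yes'], ['yes', 'no', 'no']))  # ~0.667
```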
### Comparison with closed-source multimodal APIs and open-source SOTA models
| Method | MathVista | AI2D | MMMU | MME | MMB | MMBCN | SEEDI | LLaVAW | QBenchT | MM-Vet | HallB | ChartQA |
| ------------------------------------ | --------- | ------ | ----- | ------ | ------ | ------ | ----- | ------ | ------- | ------ | ------ | -------- |
| Open-source Previous SOTA (model) | SPH-MOE | Monkey | Yi-VL | WeMM | L-Int2 | L-Int2 | SPH-2 | CogVLM | Int-XC | CogVLM | Monkey | CogAgent |
| Open-source Previous SOTA (size) | 8x7B | 10B | 34B | 6B | 20B | 20B | 17B | 17B | 8B | 30B | 10B | 18B |
| Open-source Previous SOTA (score) | 42.3 | 72.6 | 45.9 | 2066.6 | 75.1 | 73.7 | 74.8 | 73.9 | 64.4 | 56.8 | 58.4 | 68.4 |
| GPT-4V | 49.9 | 78.2 | 56.8 | 1926.5 | 77.0 | 74.4 | 69.1 | 93.1 | 74.1 | 67.7 | 65.8 | 78.5 |
| Gemini-Pro | 45.2 | 73.9 | 47.9 | 1933.3 | 73.6 | 74.3 | 70.7 | 79.9 | 70.6 | 64.3 | 63.9 | 74.1 |
| QwenVL-Plus | 43.3 | 75.9 | 46.5 | 2183.3 | 67.0 | 70.7 | 72.7 | 73.7 | 68.9 | 55.7 | 56.4 | 78.1 |
| Ours | 57.6 | 78.7 | 42.0 | 2242.7 | 79.6 | 77.6 | 75.9 | 81.8 | 72.5 | 51.2 | 60.3 | 72.6 |
### Comparison with open-source models
| Method | LLM | MathVista | MMMU | MMEP | MMEC | MMB | MMBCN | SEEDI | LLaVAW | QBenchT | MM-Vet | HallB | POPE |
| ------------ | ------------ | --------- | ---- | ------- | ----- | ---- | ----- | ----- | ------ | ------- | ------ | ----- | ---- |
| BLIP-2 | FLAN-T5 | - | 35.7 | 1,293.8 | 290.0 | - | - | 46.4 | 38.1 | - | 22.4 | - | - |
| InstructBLIP | Vicuna-7B | 25.3 | 30.6 | - | - | 36.0 | 23.7 | 53.4 | 60.9 | 55.9 | 26.2 | 53.6 | 78.9 |
| IDEFICS-80B | LLaMA-65B | 26.2 | 24.0 | - | - | 54.5 | 38.1 | 52.0 | 56.9 | - | 39.7 | 46.1 | - |
| Qwen-VL-Chat | Qwen-7B | 33.8 | 35.9 | 1,487.5 | 360.7 | 60.6 | 56.7 | 58.2 | 67.7 | 61.7 | 47.3 | 56.4 | - |
| LLaVA | Vicuna-7B | 23.7 | 32.3 | 807.0 | 247.9 | 34.1 | 14.1 | 25.5 | 63.0 | 54.7 | 26.7 | 44.1 | 80.2 |
| LLaVA-1.5 | Vicuna-13B | 26.1 | 36.4 | 1,531.3 | 295.4 | 67.7 | 63.6 | 68.2 | 70.7 | 61.4 | 35.4 | 46.7 | 85.9 |
| ShareGPT4V | Vicuna-7B | 25.8 | 36.6 | 1,567.4 | 376.4 | 68.8 | 62.2 | 69.7 | 72.6 | - | 37.6 | 49.8 | - |
| CogVLM-17B | Vicuna-7B | 34.7 | 37.3 | - | - | 65.8 | 55.9 | 68.8 | 73.9 | - | 54.5 | 55.1 | - |
| LLaVA-XTuner | InternLM2-20B | 24.6 | 39.4 | - | - | 75.1 | 73.7 | 70.2 | 63.7 | - | 37.2 | 47.7 | - |
| Monkey-10B | Qwen-7B | 34.8 | 40.7 | 1,522.4 | 401.4 | 72.4 | 67.5 | 68.9 | 33.5 | - | 33.0 | 58.4 | - |
| InternLM-XC | InternLM-7B | 29.5 | 35.6 | 1,528.4 | 391.1 | 74.4 | 72.4 | 66.1 | 53.8 | 64.4 | 35.2 | 57.0 | - |
| Ours | InternLM2-7B | 57.6 | 43.0 | 1,712.0 | 530.7 | 79.6 | 77.6 | 75.9 | 81.8 | 72.5 | 51.2 | 59.1 | 87.7 |
## Requirements

- Python 3.8 or above
- PyTorch 1.12 or above (2.0 or above recommended)
- CUDA 11.4 or above recommended (for GPU users)
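A quick sanity check against these requirements, as a minimal sketch (the version thresholds are taken directly from the list above):

```python
import sys

import torch

# Python 3.8+ is required.
assert sys.version_info >= (3, 8), 'Python 3.8 or above is required'

# PyTorch 1.12+ is required; 2.0+ is recommended.
print('PyTorch version:', torch.__version__)

# CUDA 11.4+ is recommended for GPU users.
print('CUDA available:', torch.cuda.is_available())
print('CUDA version:', torch.version.cuda)
```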