# InternVL Family: Closing the Gap to Commercial Multimodal Models with Open-Source Suites - A Pioneering Open-Source Alternative to GPT-4o
[\[Blog\]](https://internvl.github.io/blog/) [\[FAQs\]](https://internvl.readthedocs.io/en/latest/tutorials/faqs.html) [\[Chat Demo\]](https://internvl.opengvlab.com/) [\[HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[Document\]](https://internvl.readthedocs.io/en/latest/) [\[API\]](https://internlm.intern-ai.org.cn/api/document) [\[Quick Start\]](#quick-start-with-huggingface)
[\[InternVL2.5 Report\]](https://huggingface.co/papers/2412.05271) [\[Mini-InternVL Paper\]](https://arxiv.org/abs/2410.16261) [\[InternVL2 Blog\]](https://internvl.github.io/blog/2024-07-02-InternVL-2.0/) [\[InternVL 1.5 Paper\]](https://huggingface.co/papers/2404.16821) [\[InternVL 1.0 Paper\]](https://huggingface.co/papers/2312.14238)
[\[2.0 Interpretation (in Chinese)\]](https://zhuanlan.zhihu.com/p/706547971) [\[1.5 Interpretation (in Chinese)\]](https://zhuanlan.zhihu.com/p/699439759) [\[1.0 Interpretation (in Chinese)\]](https://zhuanlan.zhihu.com/p/702946079)
[Switch to the Chinese version](/README_zh.md)


## News
- `2024/12/20`: We release [InternVL2.5-MPO](https://internvl.github.io/blog/2024-12-20-InternVL-2.5-MPO/), which is finetuned with [Mixed Preference Optimization](https://huggingface.co/papers/2411.10442) on [MMPR-v1.1](https://huggingface.co/datasets/OpenGVLab/MMPR-v1.1). **The resulting models outperform their counterparts without MPO by an average of 2 points across all model scales on the OpenCompass leaderboard.** These models are available at [HF link](https://huggingface.co/collections/OpenGVLab/internvl25-mpo-6753fed98cd828219b12f849).
- `2024/12/17`: [InternVL2/2.5](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/internvl2) is supported in [PaddleMIX](https://github.com/PaddlePaddle/PaddleMIX) by the Paddle Team.
- `2024/12/05`: We release [InternVL2.5](https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c), an advanced multimodal large language model (MLLM) series with parameter scales ranging from 1B to 78B. [InternVL2_5-78B](https://huggingface.co/OpenGVLab/InternVL2_5-78B) is the first open-source MLLM to achieve over **70%** on the **MMMU benchmark**, matching the performance of leading closed-source commercial models like GPT-4o. These models are available at [HF link](https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c).
- `2024/11/14`: We introduce [MMPR](https://huggingface.co/datasets/OpenGVLab/MMPR), a high-quality, large-scale multimodal reasoning preference dataset, and [MPO](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/internvl2.0_mpo), an effective preference optimization algorithm. The resulting model, [InternVL2-8B-MPO](https://huggingface.co/OpenGVLab/InternVL2-8B-MPO), achieves an accuracy of 67.0 on MathVista. Please refer to our [paper](https://arxiv.org/abs/2411.10442), [project page](https://internvl.github.io/blog/2024-11-14-InternVL-2.0-MPO/) and [document](https://internvl.readthedocs.io/en/latest/internvl2.0/preference_optimization.html) for more details.
- `2024/10/21`: We release the Mini-InternVL series. These models achieve impressive performance with minimal size: the 4B model achieves 90% of the performance with just 5% of the model size. For more details, please check our [project page](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/mini_internvl) and [document](https://internvl.readthedocs.io/en/latest/internvl2.0/domain_adaptation.html).
- `2024/08/01`: The [Chartmimic](https://chartmimic.github.io/) team evaluated the InternVL2 series models on their benchmark. The InternVL2-26B and 76B models achieved the top two performances among open-source models, with the InternVL2 76B model surpassing GeminiProVision and exhibiting comparable results to Claude-3-opus.
- `2024/08/01`: InternVL2-Pro achieved the SOTA performance among open-source models on the [CharXiv](https://charxiv.github.io/#leaderboard) dataset, surpassing many closed-source models such as GPT-4V, Gemini 1.5 Flash, and Claude 3 Sonnet.
- `2024/07/24`: The [MLVU](https://github.com/JUNJIE99/MLVU) team evaluated InternVL-1.5 on their benchmark. The average performance on the multiple-choice task was 50.4%, while the performance on the generative tasks was 4.02. The performance on the multiple-choice task ranked #1 among all open-source MLLMs.
- `2024/07/04`: We release the [InternVL2 series](https://huggingface.co/collections/OpenGVLab/internvl-20-667d3961ab5eb12c7ed1463e). InternVL2-Pro achieved a 62.0% accuracy on the MMMU benchmark, matching the performance of leading closed-source commercial models like GPT-4o.
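Several of the updates above center on preference optimization (MPO, trained on preference pairs from MMPR). As a rough illustration of the DPO-style pairwise objective this line of work builds on, here is a minimal sketch in pure Python; the function name and the `beta` value are illustrative, not InternVL's actual implementation.

```python
import math

def pairwise_preference_loss(pi_chosen, pi_rejected,
                             ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss: -log sigmoid(beta * margin), where the margin
    compares policy vs. reference log-probs of the chosen and rejected
    responses. Lower loss = policy prefers the chosen response more."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy assigns the chosen response a higher relative log-probability than the rejected one, the margin grows and the loss falls; at a zero margin the loss is log 2.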
| Model | Date | HF Link | MS Link | Note |
| :---- | :--- | :------ | :------ | :--- |
| Mini-InternVL-Chat-4B-V1-5 | 2024.05.28 | link | link | 16% of the model size, 90% of the performance |
| Mini-InternVL-Chat-2B-V1-5 | 2024.05.19 | link | link | 8% of the model size, 80% of the performance |
| InternVL-Chat-V1-5 | 2024.04.18 | link | link | supports 4K images; very strong OCR; approaches the performance of GPT-4V and Gemini Pro on benchmarks such as MMMU, DocVQA, ChartQA, and MathVista |
| InternVL-Chat-V1-2-Plus | 2024.02.21 | link | link | more SFT data and stronger performance |
| InternVL-Chat-V1-2 | 2024.02.11 | link | link | scales the LLM up to 34B |
| InternVL-Chat-V1-1 | 2024.01.24 | link | link | adds Chinese support and stronger OCR |
| InternVL-Chat-19B | 2023.12.25 | link | link | English multimodal dialogue |
| InternVL-Chat-13B | 2023.12.25 | link | link | English multimodal dialogue |
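The "4K image" note for InternVL-Chat-V1-5 refers to its dynamic high-resolution strategy: the input image is split into a grid of fixed-size tiles (448 pixels in the 1.5 release) whose aspect ratio best matches the image. The sketch below illustrates the grid-selection idea only; the function name, tile size, and tile budget are illustrative and simplified relative to the repository's preprocessing code.

```python
def pick_tile_grid(width, height, max_tiles=12):
    """Choose a (cols, rows) tile grid whose aspect ratio best matches
    the image, subject to cols * rows <= max_tiles. Each tile is then
    resized to a fixed resolution (e.g. 448x448) for the vision encoder."""
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles + 1):
            if cols * rows > max_tiles:
                continue  # stay within the tile budget
            diff = abs(cols / rows - aspect)
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
    return best
```

For example, a square image maps to a single tile, while a 2:1 panorama maps to a 2x1 grid, so wide or tall high-resolution inputs keep their detail instead of being squashed into one square crop.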
#### CLIP-like Model (InternVL 1.0-2.5)