# ScienceQA: Science Question Answering

Data and code for the NeurIPS 2022 paper "[Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering](http://lupantech.github.io/papers/neurips22_scienceqa.pdf)".

For more details, please refer to the project page with dataset exploration and visualization tools: https://scienceqa.github.io.

:bell: If you have any questions or suggestions, please don't hesitate to let us know. You can email [Pan Lu](https://lupantech.github.io/) at UCLA (lupantech@gmail.com), comment on [Twitter](https://twitter.com/lupantech/status/1570828580346802178), or post an issue on this repository.

## 🔥 News 🔥

- **[2023.12.29]** 🚨 We have a major update featuring over 100 recent models! We appreciate your contributions and feedback. 🎉
- **[2023.05.04]** **ScienceQA** is featured in the leaked internal Google document "[We Have No Moat, And Neither Does OpenAI](https://www.semianalysis.com/p/google-we-have-no-moat-and-neither)", which highlights the advancements and impact of **ScienceQA** within the AI research community. 💯
- **[2023.05.03]** In April, our **ScienceQA** dataset was downloaded **1,421** times from [HuggingFace Datasets](https://huggingface.co/datasets/derek-thomas/ScienceQA), showcasing its growing popularity in the community. [[Link](https://raw.githubusercontent.com/lupantech/ScienceQA/main/assets/huggingface_2023.05.03.png)]
- **[2023.04.19]** [Chameleon](https://chameleon-llm.github.io/): developed by UCLA and Microsoft, this innovative project achieves a new SOTA in the few-shot setting, reaching an impressive **86.54%**. :star:
- **[2023.04.17]** [LLaVA](https://llava-vl.github.io/): a collaborative effort by UW–Madison and Microsoft, this groundbreaking work sets a new SOTA at **92.53%**. :star:
- **[2023.04.01]** Our work is accepted by the [CVPR 2023 O-DRUM Workshop](https://asu-apg.github.io/odrum/).
- **[2023.04.01]** Our work is covered by [Towards AI](https://towardsai.net/p/machine-learning/chain-of-thought-reasoning-by-c-jarnach-medium?amp=1).
- **[2023.04.01]** Our **ScienceQA** dataset was downloaded **377** times in March from [HuggingFace Datasets](https://huggingface.co/datasets/derek-thomas/ScienceQA).
- **[2023.03.30]** The **ScienceQA** dataset is now included at [OpenDataLab](https://opendatalab.org.cn/ScienceQA).
- **[2023.03.28]** The **ScienceQA** dataset has served as the primary benchmark for [LLaMA-Adapter](https://github.com/ZrrSkywalker/LLaMA-Adapter), developed by Shanghai AI Laboratory, UCLA, and CUHK. :star:
- **[2023.02.24]** The **ScienceQA** dataset is now included at [HuggingFace Datasets](https://huggingface.co/datasets/derek-thomas/ScienceQA). :star:
- **[2023.02.13]** Pan Lu gave an oral presentation of our work at the [AAAI 2023 KnowledgeNLP Workshop](https://knowledge-nlp.github.io/aaai2023/publications.html).
- **[2023.02.05]** Our work is covered by [MarkTechPost](https://www.marktechpost.com/2023/02/05/a-new-artificial-intelligence-research-proposes-multimodal-chain-of-thought-reasoning-in-language-models-that-outperforms-gpt-3-5-by-16-75-17-%E2%86%92-91-68-on-scienceqa/).
- **[2023.02.02]** The **ScienceQA** dataset has served as the primary benchmark for the new generation of multimodal reasoning systems, [Multimodal-CoT](https://github.com/amazon-science/mm-cot), developed by [Amazon Science](https://www.amazon.science/).
- **[2022.11.29]** Pan Lu presented our work as a poster at [NeurIPS 2022](https://nips.cc/Conferences/2022).
- **[2022.11.20]** Our work is covered by [Geek Culture | Medium](https://medium.com/geekculture/neurips-2022-the-first-multi-modal-science-question-answering-science-qa-dataset-with-detailed-45b8d628b301).
- **[2022.11]** Our work is now included at [Papers with Code](https://paperswithcode.com/dataset/scienceqa).
- **[2022.09.22]** Our work is accepted to [NeurIPS 2022](https://nips.cc/Conferences/2022). 🎉
- **[2022.09.20]** Our work is featured in [Deep AI](https://deepai.org/publication/learn-to-explain-multimodal-reasoning-via-thought-chains-for-science-question-answering).

## Star History

[Star History Chart](https://star-history.com/#lupantech/ScienceQA&Date)

## :fire: Leaderboard :fire:

Evaluation of different methods on the **test** split (whole: 4,241 examples; mini: 1,000 examples). The accuracies across various question categories and the overall average are reported below.

You are invited to contribute your results on the ScienceQA test split! Please send your result scores to [this email](mailto:lupantech@gmail.com) or open a new issue at the [github repository](https://github.com/lupantech/ScienceQA/issues).

⚠️⚠️⚠️ Caveat: the leaderboard data is collected manually from existing papers. It may contain errors, ambiguities due to differing interpretations, and gaps where papers omit the relevant information, so make sure to double-check the data before using it. Please contact us at [this email](mailto:lupantech@gmail.com) if you find any errors or have any suggestions. We appreciate your contributions and feedback.

The interactive leaderboard is available at https://scienceqa.github.io/leaderboard.html.
| **#** | **Model** | **Method** | **Learning** | **#Size** | **#P** | **Link** | **Date** | **NAT** | **SOC** | **LAN** | **TXT** | **IMG** | **NO** | **G1-6** | **G7-12** | **Avg** |
| ----- | ----------------------------- | :--------: | :----------: | :-------: | :----: | :----------------------------------------------------------: | :------: | :-----: | :-----: | :-----: | :-----: | :-----: | :----: | :------: | :-------: | :-------: |
| * | **Human Performance** | - | - | - | - | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 90.23 | 84.97 | 87.48 | 89.60 | 87.50 | 88.10 | 91.59 | 82.42 | **88.40** |
| * | **Random Chance** | - | - | - | - | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 40.28 | 46.13 | 29.25 | 47.45 | 40.08 | 33.66 | 39.35 | 40.67 | **39.83** |
| 1 | **Multimodal-T-SciQ_Large 🥇** | LLM | Fine-tune | 738M | 738M | [Link](https://arxiv.org/abs/2305.03453) | 23-05-05 | 96.89 | 95.16 | 95.55 | 96.53 | 94.70 | 96.79 | 96.44 | 95.72 | **96.18** |
| 2 | **MC-CoT_F-Large 🥈** | VLM | Fine-tune | 783M | - | [Link](https://arxiv.org/abs/2311.14109) | 23-11-23 | 97.47 | 90.44 | 93.18 | 96.97 | 93.75 | 94.49 | 95.30 | 94.13 | **94.88** |
| 3 | **Honeybee (Vicuna-13B) 🥉** | VLM | Fine-tune | 13B | - | [Link](https://arxiv.org/abs/2312.06742) | 23-12-11 | 95.20 | 96.29 | 91.18 | 94.48 | 93.75 | 93.17 | 95.04 | 93.21 | **94.39** |
| 4 | **Enigma-COT_Large** | LLM | Fine-tune | 793M | 793M | [Link](https://arxiv.org/abs/2307.12626) | 23-07-24 | 97.51 | 84.70 | 94.73 | 96.68 | 91.37 | 95.89 | 94.46 | 93.47 | **94.11** |
| 5 | **MC-CoT_Large** | VLM | Fine-tune | 738M | - | [Link](https://arxiv.org/abs/2311.14109) | 23-11-23 | 95.47 | 89.99 | 91.82 | 95.11 | 92.66 | 93.24 | 94.27 | 91.76 | **93.37** |
| 6 | **DPMM-CoT_Large** | VLM | Fine-tune | 738M | 738M | [Link](https://arxiv.org/abs/2312.08762) | 23-12-14 | 95.52 | 90.33 | 91.36 | 95.50 | 93.26 | 92.68 | 93.28 | 93.47 | **93.35** |
| 7 | **LLaVA (GPT-4 judge)** | VLM | Fine-tune | 13B | 13B | [Link](https://arxiv.org/abs/2304.08485) | 23-04-17 | 91.56 | 96.74 | 91.09 | 90.62 | 88.99 | 93.52 | 92.73 | 92.16 | **92.53** |
| 8 | **CoMD (Vicuna-7B)** | VLM | Fine-tune | 7B | - | [Link](https://arxiv.org/abs/2311.08213) | 23-11-14 | 91.83 | 95.95 | 88.91 | 90.91 | 89.94 | 91.08 | 92.47 | 90.97 | **91.94** |
| 9 | **Multimodal-T-SciQ_Base** | LLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2305.03453) | 23-05-05 | 91.52 | 91.45 | 92.45 | 91.94 | 90.33 | 92.26 | 92.11 | 91.10 | **91.75** |
| 10 | **Multimodal-CoT_Large** | VLM | Fine-tune | 738M | 738M | [Link](https://arxiv.org/abs/2302.00923) | 23-02-02 | 95.91 | 82.00 | 90.82 | 95.26 | 88.80 | 92.89 | 92.44 | 90.31 | **91.68** |
| 11 | **PILL (LLaMA-7B)** | VLM | Fine-tune | 7B | 45M | [Link](https://arxiv.org/abs/2311.02126) | 23-11-03 | 90.36 | 95.84 | 89.27 | 89.39 | 88.65 | 91.71 | 92.11 | 89.65 | **91.23** |
| 12 | **LLaVA (ViT-L/16-224)** | VLM | Fine-tune | 13B | - | [Link](https://arxiv.org/abs/2312.01987) | 23-12-04 | - | - | - | - | - | - | - | - | **91.2** |
| 13 | **DPMM-CoT_Base** | VLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2312.08762) | 23-12-14 | 92.72 | 87.85 | 89.91 | 92.72 | 90.48 | 91.29 | 91.45 | 90.11 | **90.97** |
| 14 | **LLaVA** | VLM | Fine-tune | 13B | 13B | [Link](https://arxiv.org/abs/2304.08485) | 23-04-17 | 90.36 | 95.95 | 88.00 | 89.49 | 88.00 | 90.66 | 90.93 | 90.90 | **90.92** |
| 15 | **LaVIN-13B** | VLM | Fine-tune | 13B | 5.4M | [Link](https://arxiv.org/abs/2305.15023) | 23-05-24 | 89.88 | 94.49 | 89.82 | 88.95 | 87.61 | 91.85 | 91.45 | 89.72 | **90.83** |
| 16 | **MC-CoT_F-Base** | VLM | Fine-tune | 248M | - | [Link](https://arxiv.org/abs/2311.14109) | 23-11-23 | 93.56 | 83.58 | 90.73 | 94.13 | 89.24 | 90.94 | 90.93 | 90.38 | **90.73** |
| 17 | **MC-CoT_Base** | VLM | Fine-tune | 223M | - | [Link](https://arxiv.org/abs/2311.14109) | 23-11-23 | 91.87 | 84.59 | 93.00 | 92.28 | 88.30 | 92.75 | 90.64 | 90.64 | **90.64** |
| 18 | **LLaMA-SciTune** | VLM | Zero-shot | 13B | - | [Link](https://arxiv.org/abs/2307.01139) | 23-07-03 | 89.30 | 95.61 | 87.00 | 93.08 | 86.67 | 91.75 | 84.37 | 91.30 | **90.03** |
| 19 | **LaVIN-7B** | VLM | Fine-tune | 7B | 3.8M | [Link](https://arxiv.org/abs/2305.15023) | 23-05-24 | 89.25 | 94.94 | 85.24 | 88.51 | 87.46 | 88.08 | 90.16 | 88.07 | **89.41** |
| 20 | **Flan-T5-XL (LoRA)** | LLM | Fine-tune | - | - | [Link](https://leejayyoon.github.io/paper/InstructBLIP_PEFT.pdf) | 23-11-03 | - | - | - | - | - | - | - | - | **89.29** |
| 21 | **Chat-UniVi** | VLM | Fine-tune | 7B | - | [Link](https://arxiv.org/abs/2311.08046) | 23-11-14 | 88.50 | 93.03 | 85.91 | 88.51 | 85.97 | 88.15 | 88.88 | 88.60 | **88.78** |
| 22 | **DDCoT (T5)** | PLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2310.16436) | 23-10-25 | 88.72 | 86.84 | 84.91 | 87.59 | 83.34 | 88.08 | 88.58 | 85.10 | **87.34** |
| 23 | **LG-VQA (CLIP)** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.20159) | 23-10-31 | - | - | - | - | - | - | - | - | **87.22** |
| 24 | **Chameleon (GPT-4)** | Tool-LLM | Few-shot | 1T+ | - | [Link](https://arxiv.org/abs/2304.09842) | 23-04-19 | 89.83 | 74.13 | 89.82 | 88.27 | 77.64 | 92.13 | 88.03 | 83.72 | **86.54** |
| 25 | **LG-VQA (BLIP-2)** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.20159) | 23-10-31 | - | - | - | - | - | - | - | - | **86.32** |
| 26 | **LLaMA-SciTune** | VLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2307.01139) | 23-07-03 | 84.50 | 94.15 | 82.91 | 88.35 | 83.64 | 88.74 | 85.05 | 85.60 | **86.11** |
| 27 | **Enigma-COT_Base** | LLM | Fine-tune | 229M | 229M | [Link](https://arxiv.org/abs/2307.12626) | 23-07-24 | 88.28 | 78.74 | 85.64 | 88.51 | 84.28 | 86.90 | 85.43 | 85.89 | **85.59** |
| 28 | **LLaMA-Adapter** | VLM | Fine-tune | 6B | 1.2M | [Link](https://arxiv.org/abs/2303.16199) | 23-03-28 | 84.37 | 88.30 | 84.36 | 83.72 | 80.32 | 86.90 | 85.83 | 84.05 | **85.19** |
| 29 | **Multimodal-CoT_Base** | VLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2302.00923) | 23-02-02 | 87.52 | 77.17 | 85.82 | 87.88 | 82.90 | 86.83 | 84.65 | 85.37 | **84.91** |
| 30 | **IMMO SL+RL** | VLM | Fine-tune | 7B | 5M | [Link](https://arxiv.org/abs/2308.09970) | 23-08-19 | - | - | - | - | - | - | - | - | **84.8** |
| 31 | **CoT GPT-4** | LLM | Few-shot | 1T+ | - | [Link](https://arxiv.org/abs/2304.09842) | 23-04-19 | 85.48 | 72.44 | 90.27 | 82.65 | 71.49 | 92.89 | 86.66 | 79.04 | **83.99** |
| 32 | **HoT-T5_Large** | LLM | Fine-tune | 738M | 738M | [Link](https://arxiv.org/abs/2308.06207) | 23-08-11 | 84.46 | 79.08 | 84.64 | 82.89 | 75.81 | 88.15 | 83.88 | 82.47 | **83.38** |
| 33 | **HoT-T5_Base** | LLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2308.06207) | 23-08-11 | 82.46 | 78.07 | 82.00 | 81.18 | 75.20 | 85.09 | 81.86 | 80.62 | **81.42** |
| 34 | **DDCoT (ChatGPT)** | LLM | Zero-shot | 175B | - | [Link](https://arxiv.org/abs/2310.16436) | 23-10-25 | 80.15 | 76.72 | 82.82 | 78.89 | 72.53 | 85.02 | 82.86 | 75.21 | **80.15** |
| 35 | **Chameleon (ChatGPT)** | Tool-LLM | Few-shot | 175B+ | - | [Link](https://arxiv.org/abs/2304.09842) | 23-04-19 | 81.62 | 70.64 | 84.00 | 79.77 | 70.80 | 86.62 | 81.86 | 76.53 | **79.93** |
| 36 | **CoT GPT-3 + Doc** | Tool-LLM | Zero-shot | 173B | - | [Link](https://arxiv.org/abs/2308.00675) | 23-08-01 | - | - | - | - | - | - | - | - | **79.91** |
| 37 | **UnifiedQA-T-SciQ_Base** | LLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2305.03453) | 23-05-05 | 76.56 | 88.99 | 80.45 | 72.90 | 73.84 | 83.47 | 81.09 | 75.19 | **79.41** |
| 38 | **CoT ChatGPT** | LLM | Few-shot | 175B | - | [Link](https://arxiv.org/abs/2304.09842) | 23-04-19 | 78.82 | 70.98 | 83.18 | 77.37 | 67.92 | 86.13 | 80.72 | 74.03 | **78.31** |
| 39 | **DDCoT (GPT-3)** | LLM | Zero-shot | 175B | - | [Link](https://arxiv.org/abs/2310.16436) | 23-10-25 | 78.60 | 73.90 | 80.45 | 77.27 | 69.96 | 82.93 | 80.65 | 73.50 | **78.09** |
| 40 | **LaVIN-13B** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **77.54** |
| 41 | **CoT GPT-3 (ALE)** | LLM | Few-shot | 173B | - | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 75.44 | 70.87 | 78.09 | 74.68 | 67.43 | 79.93 | 78.23 | 69.68 | **75.17** |
| 42 | **LaVIN-7B** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **75.11** |
| 43 | **CoT GPT-3 (AE)** | LLM | Few-shot | 173B | - | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 76.60 | 65.92 | 77.55 | 75.51 | 66.09 | 79.58 | 78.49 | 67.63 | **74.61** |
| 44 | **BLIP-2** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **74.17** |
| 45 | **CoT UnifiedQA** | PLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 71.00 | 76.04 | 78.91 | 66.42 | 66.53 | 81.81 | 77.06 | 68.82 | **74.11** |
| 46 | **GPT-3 (0-shot)** | LLM | Zero-shot | 173B | - | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 75.04 | 66.59 | 78.00 | 74.24 | 65.74 | 79.58 | 76.36 | 69.87 | **74.04** |
| 47 | **GPT-3 (2-shot)** | LLM | Few-shot | 173B | - | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 74.64 | 69.74 | 76.00 | 74.44 | 67.28 | 77.42 | 76.80 | 68.89 | **73.97** |
| 48 | **InstructBLIP** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **73.33** |
| 49 | **UnifiedQA** | PLM | Fine-tune | 223M | 223M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 68.16 | 69.18 | 74.91 | 63.78 | 61.38 | 77.84 | 72.98 | 65.00 | **70.12** |
| 50 | **ChatGPT** | LLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **69.41** |
| 51 | **MetaCLIP** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2312.06709) | 23-12-10 | - | - | - | - | - | - | - | - | **68.77** |
| 52 | **OpenCLIP** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2312.06709) | 23-12-10 | - | - | - | - | - | - | - | - | **67.53** |
| 53 | **Flan-T5-XXL** | LLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **67.43** |
| 54 | **SAM** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2312.06709) | 23-12-10 | - | - | - | - | - | - | - | - | **67.08** |
| 55 | **DINOv2** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2312.06709) | 23-12-10 | - | - | - | - | - | - | - | - | **64.60** |
| 56 | **VisualBERT** | VLM | Fine-tune | 111M | 111M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 59.33 | 69.18 | 61.18 | 62.71 | 62.17 | 58.54 | 62.96 | 59.92 | **61.87** |
| 57 | **Patch-TRM** | VLM | Fine-tune | 90M | 90M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 65.19 | 46.79 | 65.55 | 66.96 | 55.28 | 64.95 | 58.04 | 67.50 | **61.42** |
| 58 | **ViLT** | VLM | Fine-tune | 113M | 113M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 60.48 | 63.89 | 60.27 | 63.20 | 61.38 | 57.00 | 60.72 | 61.90 | **61.14** |
| 59 | **DFAF** | VQA-NN | Fine-tune | 74M | 74M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 64.03 | 48.82 | 63.55 | 65.88 | 54.49 | 64.11 | 57.12 | 67.17 | **60.72** |
| 60 | **Chat-UniVi** | VLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2311.08046) | 23-11-14 | 58.61 | 61.08 | 61.82 | 57.33 | 58.25 | 61.39 | 62.04 | 56.23 | **59.96** |
| 61 | **BAN** | VQA-NN | Fine-tune | 112M | 112M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 60.88 | 46.57 | 66.64 | 62.61 | 52.60 | 65.51 | 56.83 | 63.94 | **59.37** |
| 62 | **Top-Down** | VQA-NN | Fine-tune | 70M | 70M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 59.50 | 54.33 | 61.82 | 62.90 | 54.88 | 59.79 | 57.27 | 62.16 | **59.02** |
| 63 | **MiniGPT4** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **58.70** |
| 64 | **LLaMA2-13B** | LLM | Zero-shot | 13B | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **55.78** |
| 65 | **DDCoT (MiniGPT-4)** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.16436) | 23-10-25 | 57.37 | 62.32 | 46.82 | 65.91 | 56.72 | 48.57 | - | - | **55.67** |
| 66 | **QVix** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2312.01598) | 23-12-04 | - | - | - | - | - | - | - | - | **55.0** |
| 67 | **MCAN** | VQA-NN | Fine-tune | 95M | 95M | [Link](https://arxiv.org/abs/2209.09513) | 22-09-20 | 56.08 | 46.23 | 58.09 | 59.43 | 51.17 | 55.40 | 51.65 | 59.72 | **54.54** |
| 68 | **LLaMA-Adapter-V2** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **54.44** |
| 69 | **VLIS** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09767) | 23-10-15 | - | - | - | 53.1 | 49.3 | 49.1 | - | - | **50.2** |
| 70 | **LLaVA-13B** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **47.74** |
| 71 | **VPGTrans** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **47.00** |
| 72 | **MiniGPT-4** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.16436) | 23-10-25 | 43.83 | 48.59 | 43.36 | 55.01 | 42.84 | 41.67 | - | - | **44.71** |
| 73 | **LLaMA1-13B** | LLM | Zero-shot | 13B | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **43.33** |
| 74 | **LLaMA2-7B** | LLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **43.08** |
| 75 | **LLaVA-7B** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **41.10** |
| 76 | **OpenFlamingo** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **39.27** |
| 77 | **Lynx** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **38.28** |
| 78 | **mPLUG-Owl** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **37.93** |
| 79 | **MultiGPT** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **36.29** |
| 80 | **LLaMA1-7B** | LLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **36.19** |
| 81 | **Fromage** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2310.09036) | 23-10-13 | - | - | - | - | - | - | - | - | **34.51** |
| - | **MMICL** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2309.07915) | 23-09-14 | - | - | - | - | 82.10 | - | - | - | **-** |
| - | **VILA-13B (Llama-2-13B)** | VLM | - | 13B | - | [Link](https://arxiv.org/abs/2312.07533) | 23-12-12 | - | - | - | - | 73.7 | - | - | - | **-** |
| - | **LLaVA+SIRI** | VLM | - | 13B | - | [Link](https://arxiv.org/abs/2311.17331) | 23-11-29 | - | - | - | - | 72.0 | - | - | - | **-** |
| - | **LLaVA-1.5 (Vicuna-13B)** | VLM | Zero-shot | 13B | - | [Link](https://arxiv.org/abs/2310.03744) | 23-10-05 | - | - | - | - | 71.6 | - | - | - | **-** |
| - | **InstructBLIP** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2309.07915) | 23-09-14 | - | - | - | - | 71.30 | - | - | - | **-** |
| - | **ShareGPT4V (Vicuna-7B)** | VLM | - | 7B | - | [Link](https://arxiv.org/abs/2311.12793) | 23-11-23 | - | - | - | - | 68.4 | - | - | - | **-** |
| - | **Qwen-VL-Chat (Qwen-7B)** | VLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2310.03744) | 23-10-05 | - | - | - | - | 68.2 | - | - | - | **-** |
| - | **Multimodal Bard** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2308.03729) | 23-08-07 | - | - | - | - | 68 | - | - | - | **-** |
| - | **Qwen-VL (Qwen-7B)** | VLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2310.03744) | 23-10-05 | - | - | - | - | 67.1 | - | - | - | **-** |
| - | **LLaVA-1.5 (Vicuna-7B)** | VLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2310.03744) | 23-10-05 | - | - | - | - | 66.8 | - | - | - | **-** |
| - | **Video-LLaVA (Vicuna-7B)** | VLM | - | 7B | - | [Link](https://arxiv.org/abs/2311.10122) | 23-11-16 | - | - | - | - | 66.4 | - | - | - | **-** |
| - | **InstructBLIP+MoCLE** | VLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2312.12379) | 23-12-19 | - | - | - | - | 63.9 | - | - | - | **-** |
| - | **Otter** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2309.07915) | 23-09-14 | - | - | - | - | 63.10 | - | - | - | **-** |
| - | **InstructBLIP (Vicuna-13B)** | VLM | Zero-shot | 13B | - | [Link](https://arxiv.org/abs/2310.03744) | 23-10-05 | - | - | - | - | 63.1 | - | - | - | **-** |
| - | **BLIP-2 (Vicuna-13B)** | VLM | Zero-shot | 13B | - | [Link](https://arxiv.org/abs/2310.03744) | 23-10-05 | - | - | - | - | 61 | - | - | - | **-** |
| - | **InstructBLIP (Vicuna-7B)** | VLM | Zero-shot | 7B | - | [Link](https://arxiv.org/abs/2310.03744) | 23-10-05 | - | - | - | - | 60.5 | - | - | - | **-** |
| - | **Ying-VLM** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2309.07915) | 23-09-14 | - | - | - | - | 55.70 | - | - | - | **-** |
| - | **Shikra** | VLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2309.07915) | 23-09-14 | - | - | - | - | 45.80 | - | - | - | **-** |
| - | **CaCo-CoT (Claude)** | LLM | Few-shot | - | - | [Link](https://arxiv.org/abs/2308.11914) | 23-08-23 | - | - | - | 90.8 | - | - | - | - | **-** |
| - | **CaCo-CoT (Claude)** | LLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2308.11914) | 23-08-23 | - | - | - | 89.9 | - | - | - | - | **-** |
| - | **CoT Claude** | LLM | Few-shot | - | - | [Link](https://arxiv.org/abs/2308.11914) | 23-08-23 | - | - | - | 89.5 | - | - | - | - | **-** |
| - | **CaCo-CoT (ChatGPT)** | LLM | Few-shot | - | - | [Link](https://arxiv.org/abs/2308.11914) | 23-08-23 | - | - | - | 88.6 | - | - | - | - | **-** |
| - | **Claude** | LLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2308.11914) | 23-08-23 | - | - | - | 86.8 | - | - | - | - | **-** |
| - | **CaCo-CoT (ChatGPT)** | LLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2308.11914) | 23-08-23 | - | - | - | 86.5 | - | - | - | - | **-** |
| - | **CoT Claude** | LLM | Zero-shot | - | - | [Link](https://arxiv.org/abs/2308.11914) | 23-08-23 | - | - | - | 86.5 | - | - | - | - | **-** |
Some notations used in the table:

- **Model names:**
  - **Q**: question
  - **C**: context
  - **M**: multiple options
  - **A**: answer
  - **AE**: answer with explanation
  - **ALE**: answer with lecture and explanation
- **Method types:**
  - **VQA-NN**: standard neural network for visual question answering
  - **PLM**: pre-trained language model
  - **LLM**: large language model
  - **VLM**: (large) vision-language model / large multimodal model / multimodal large language model
  - **Tool-LLM**: tool-augmented large language model
- **Learning:**
  - **Zero-shot**: the model is evaluated in a zero-shot setting on ScienceQA
  - **Few-shot**: the model is evaluated in a few-shot setting on ScienceQA
  - **Fine-tune**: the model is fine-tuned on ScienceQA
  - **-**: not available
- **#Size**: total number of parameters in the model
- **#P**: number of trainable parameters when fine-tuned on ScienceQA
- **Accuracies for different question sets** (a sketch of how these category-wise scores can be aggregated from per-question results appears at the end of this README):
  - **NAT**: questions of the natural science subject
  - **SOC**: questions of the social science subject
  - **LAN**: questions of the language science subject
  - **TXT**: questions with a text context
  - **IMG**: questions with an image context
  - **NO**: questions with no context
  - **G1-6**: questions in the grade group 1-6
  - **G7-12**: questions in the grade group 7-12
  - **Avg**: all questions (the average accuracy over the whole test split)

## :world_map: About ScienceQA

We present **Science Question Answering (ScienceQA)**, a new benchmark that consists of 21,208 multimodal multiple-choice questions with a diverse set of *science* topics and annotations of their answers with corresponding *lectures* and *explanations*. The lecture and explanation provide general external knowledge and specific reasons, respectively, for arriving at the correct answer.
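For quick experimentation, the dataset can be loaded from the [HuggingFace Datasets](https://huggingface.co/datasets/derek-thomas/ScienceQA) mirror linked above. The snippet below is a minimal, unofficial sketch; the field names (`question`, `choices`, `answer`, `lecture`, `solution`, `image`) follow the dataset card on the Hub and should be verified against it before use.

```python
# Minimal, unofficial sketch: load ScienceQA from HuggingFace Datasets.
# Assumes the `datasets` library is installed (pip install datasets) and that
# the field names match the dataset card at
# https://huggingface.co/datasets/derek-thomas/ScienceQA -- verify before use.
from datasets import load_dataset

ds = load_dataset("derek-thomas/ScienceQA")  # splits: train / validation / test
print(ds["test"].num_rows)                   # the full test split has 4,241 examples

ex = ds["test"][0]
print(ex["question"])   # question text
print(ex["choices"])    # list of answer options
print(ex["answer"])     # index of the correct option
print(ex["lecture"])    # general external knowledge
print(ex["solution"])   # specific explanation (chain of thought)
print(ex["image"])      # image for questions with an image context, else None
```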
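The category columns in the leaderboard (NAT/SOC/LAN, TXT/IMG/NO, G1-6/G7-12, Avg) are plain accuracy averages over the corresponding question subsets. The following is a hedged sketch, not the official evaluation script, of how such category-wise accuracies could be aggregated from per-question predictions; the metadata field names (`subject`, `hint`, `image`, `grade`) and the grade encoding are assumptions based on the dataset description above.

```python
# Hedged sketch (not the official evaluation script) of aggregating the
# leaderboard's category-wise accuracies. Metadata field names ('subject',
# 'hint', 'image', 'grade') are assumptions; adapt them to the actual data.
from collections import defaultdict

SUBJECT_TO_COL = {
    "natural science": "NAT",
    "social science": "SOC",
    "language science": "LAN",
}

def category_accuracies(records):
    """records: iterable of dicts with a boolean 'correct' plus metadata."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        cols = ["Avg", SUBJECT_TO_COL[r["subject"]]]
        if r.get("hint"):                  # TXT: question has a text context
            cols.append("TXT")
        if r.get("image") is not None:     # IMG: question has an image context
            cols.append("IMG")
        if not r.get("hint") and r.get("image") is None:
            cols.append("NO")              # NO: question has no context at all
        # Grade groups G1-6 vs. G7-12; grades assumed encoded as "grade1"..."grade12".
        cols.append("G1-6" if r["grade"] in {f"grade{i}" for i in range(1, 7)} else "G7-12")
        for c in cols:
            totals[c] += 1
            hits[c] += int(r["correct"])
    return {c: 100.0 * hits[c] / totals[c] for c in totals}
```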