# visual-chatgpt

**Repository Path**: dutf/visual-chatgpt

## Basic Information

- **Project Name**: visual-chatgpt
- **Description**: visual-chatgpt
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2023-03-25
- **Last Updated**: 2023-03-25

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Visual ChatGPT

**Visual ChatGPT** connects ChatGPT and a series of Visual Foundation Models to enable **sending** and **receiving** images during chatting.

See our paper: [Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models](https://arxiv.org/abs/2303.04671)

## Demo

## System Architecture

*(System architecture figure)*

## Quick Start

```
# create a new environment
conda create -n visgpt python=3.8

# activate the new environment
conda activate visgpt

# prepare the basic environments
pip install -r requirement.txt

# download the visual foundation models
bash download.sh

# prepare your private OpenAI key
export OPENAI_API_KEY={Your_Private_Openai_Key}

# create a folder to save images
mkdir ./image

# Start Visual ChatGPT!
python visual_chatgpt.py
```

## GPU memory usage

Here we list the GPU memory usage of each visual foundation model. You can modify ``self.tools`` to load fewer visual foundation models and save GPU memory (see the sketch at the end of this README):

| Foundation Model  | GPU Memory Usage (MB) |
|-------------------|-----------------------|
| ImageEditing      | 6667                  |
| ImageCaption      | 1755                  |
| T2I               | 6677                  |
| canny2image       | 5540                  |
| line2image        | 6679                  |
| hed2image         | 6679                  |
| scribble2image    | 6679                  |
| pose2image        | 6681                  |
| BLIPVQA           | 2709                  |
| seg2image         | 5540                  |
| depth2image       | 6677                  |
| normal2image      | 3974                  |
| InstructPix2Pix   | 2795                  |

## Acknowledgement

We appreciate the open-source work of the following projects:

[Hugging Face](https://github.com/huggingface)
[LangChain](https://github.com/hwchase17/langchain)
[Stable Diffusion](https://github.com/CompVis/stable-diffusion)
[ControlNet](https://github.com/lllyasviel/ControlNet)
[InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix)
[CLIPSeg](https://github.com/timojl/clipseg)
[BLIP](https://github.com/salesforce/BLIP)
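
## Example: Trimming `self.tools`

The sketch below illustrates the memory-saving tip from the GPU memory table by keeping only an image-captioning tool and a text-to-image tool. The wrapper class names (`ImageCaptioning`, `T2I`), their constructor arguments, and their `inference` methods are assumptions about how `visual_chatgpt.py` is organized rather than a documented API; adapt the names to the definitions that actually appear in the file.

```python
# Hedged sketch: build a trimmed tool list so only two foundation models
# occupy GPU memory. ImageCaptioning and T2I are assumed to be wrapper
# classes defined in visual_chatgpt.py; check the file for the real names.
from langchain.agents import Tool
from visual_chatgpt import ImageCaptioning, T2I  # assumed module-level classes

i2t = ImageCaptioning(device="cuda:0")  # ImageCaption row: ~1755 MB
t2i = T2I(device="cuda:0")              # T2I row: ~6677 MB

# Each entry in the tool list corresponds to one loaded model, so keeping
# this list short is what actually reduces GPU memory usage.
tools = [
    Tool(name="Get Photo Description", func=i2t.inference,
         description="useful when you want to know what is inside the photo"),
    Tool(name="Generate Image From User Input Text", func=t2i.inference,
         description="useful when you want to generate an image from a user description"),
]
# Use this list wherever self.tools is assembled in the ConversationBot
# constructor; the other foundation models are then never instantiated.
```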