# LLM-Web-Service

**Repository Path**: shangruobing/LLM-Web-Service

## Basic Information

- **Project Name**: LLM-Web-Service
- **Description**: LLM-Web-Service deploys various open-source Large Language Models (LLMs) with Flask.
- **Primary Language**: Python
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 1
- **Forks**: 0
- **Created**: 2023-12-13
- **Last Updated**: 2024-05-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: llm

## README

# LLM Web Service

LLM-Web-Service deploys various open-source Large Language Models (LLMs) with Flask.

# Introduction

We provide a simple browser interface and RESTful APIs for you to chat with LLMs.

> This repository doesn't provide the download of LLMs.
> You can download them from their official homepages or Hugging Face.

## API

For each model, we provide RESTful APIs. Use the `Ping` API to test the connection to the service, and the `Chat` API to chat with the LLM.

- `GET` http://127.0.0.1:5000/api/llm/ping
- `POST` http://127.0.0.1:5000/api/llm/chat

## Browser Interface

We provide a simple browser interface for chatting with LLMs. You can access it by visiting http://127.0.0.1:5000.

## LangServe

You can launch LangServe from the `serve.py` file. You can access it by visiting http://127.0.0.1:8000/llm/playground.

# Quick Start

## Requirement

LLMs have many parameters, so you need at least one GPU to run them. The table below shows the GPU usage of each model on our experimental devices.

| LLM | Device | GPU Usage |
|:----------------------------------------------------------------------------:|:-------------------:|:---------:|
| [Llama-2-7b-chat](https://huggingface.co/meta-llama/Llama-2-7b-chat) | NVIDIA RTX 4090 24G | 16G |
| [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b) | NVIDIA RTX 4090 24G | 12G |
| [Baichuan2-13B-Chat](https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat) | NVIDIA A100 80G | 55G |
| [Qwen-14B-Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) | NVIDIA A100 80G | 55G |
| [InternLM-chat-20b-4bit](https://huggingface.co/internlm/internlm-chat-20b) | NVIDIA A100 80G | 78G |

## Install

```shell
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

## Launch

1. Assume your model is named `Chatbot`.
2. Create a folder named `Chatbot_deployment` in the `weights` folder.
3. Place the model weight files in the `Chatbot_deployment` folder.
4. Write the model loading code in `model.py`.
5. Configure the model name in `config.py`.
6. Execute the following command to launch the service.

```shell
# Launch the service
python main.py
# or run detached in the background
nohup python main.py > log.txt 2>&1 &
```

## Use with Shell

```shell
curl http://127.0.0.1:5000/api/llm/ping

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"question": "Hello!"}' \
  http://127.0.0.1:5000/api/llm/chat
```

## Use with Python

```python
import json

import requests


def ping():
    print("ping")
    url = "http://127.0.0.1:5000/api/llm/ping"
    response = requests.get(url)
    print(response.text)


def chat():
    print("chat")
    url = "http://127.0.0.1:5000/api/llm/chat"
    headers = {"Content-Type": "application/json"}
    data = {"question": "Hello!"}
    response = requests.post(url, headers=headers, data=json.dumps(data))
    print(response.text)


if __name__ == '__main__':
    ping()
    chat()
```

## Coding Instruction

You need to implement the `AbstractModel` class from `core/model.py` and write the model loading code in `model.py`.
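The `AbstractModel` base class itself is not shown in this README. Judging from how the `ChatModel` example below uses it, `core/model.py` plausibly defines something like the following sketch; the constructor wiring is an assumption on our part, not the repository's confirmed code.

```python
from abc import ABC, abstractmethod


class AbstractModel(ABC):
    """Base class every deployed model implements (assumed shape)."""

    def __init__(self):
        self.model = None
        self.tokenizer = None
        # Subclasses populate self.model and self.tokenizer here.
        self._load_model()

    @abstractmethod
    def _load_model(self):
        """Load weights and tokenizer, assigning self.model / self.tokenizer."""

    @abstractmethod
    def chat_with_model(self, question, history):
        """Answer `question` given `history`; return (message, updated_history)."""
```

For a ChatGLM-style model, the concrete implementation in `model.py` could then look like the example below.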
```python
from transformers import AutoTokenizer, AutoModel

from core.model import AbstractModel

# Path to the downloaded model weights, e.g. weights/Chatbot_deployment
MODEL_PATH = "Your-Weight-Path"


class ChatModel(AbstractModel):
    def _load_model(self):
        # trust_remote_code is required because ChatGLM-style repositories
        # ship their own modeling code alongside the weights.
        tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
        model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True, device='cuda:0')
        model = model.eval()
        self.model = model
        self.tokenizer = tokenizer

    def chat_with_model(self, question, history):
        # Delegate to the model's built-in chat helper, which returns the
        # answer together with the updated conversation history.
        message, history = self.model.chat(self.tokenizer, question, history)
        return message, history
```
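The `serve.py` entry point mentioned in the LangServe section is likewise not shown in this README. A minimal sketch that would expose a playground at http://127.0.0.1:8000/llm/playground might look like the following; wrapping `ChatModel` in a `RunnableLambda` is our assumption, not the repository's actual code.

```python
import uvicorn
from fastapi import FastAPI
from langchain_core.runnables import RunnableLambda
from langserve import add_routes

from model import ChatModel  # assumed import path, per the Launch steps

chat_model = ChatModel()


def answer(question: str) -> str:
    # Stateless single-turn call; a real serve.py may thread history through.
    message, _history = chat_model.chat_with_model(question, history=[])
    return message


app = FastAPI(title="LLM Web Service")
# Mounts /llm/invoke, /llm/stream, /llm/playground, etc.
add_routes(app, RunnableLambda(answer), path="/llm")

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```

With this running, the interactive playground is served at http://127.0.0.1:8000/llm/playground, matching the URL given in the LangServe section above.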