liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching

⚠️ DEPRECATION WARNING: LiteLLM is our new home. You can find the LiteLLM Proxy there. Thank you for checking us out! ❤️

Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models


Deploy on Railway


Usage

Step 1: Put your API keys in .env. Copy .env.template to .env and fill in the relevant keys (e.g. OPENAI_API_KEY="sk-..").
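
For example, copy the template and fill in whichever keys you plan to use (values below are placeholders; the authoritative list of variables lives in .env.template, and LITELLM_PROXY_MASTER_KEY is optional, see Running Locally):

cp .env.template .env

# .env
OPENAI_API_KEY="sk-.."
LITELLM_PROXY_MASTER_KEY="sk-litellm-master-key"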

Step 2: Test your proxy. Start the proxy server:

$ cd litellm-proxy && python3 main.py 

Make your first call

import openai 

# Point the OpenAI SDK (pre-1.0 interface) at the proxy.
# The api_key here is the proxy's master key, not a provider key.
openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey"}])

print(response)

What does liteLLM proxy do

  • Make /chat/completions requests for 50+ LLM models (Azure, OpenAI, Replicate, Anthropic, Hugging Face)

    Example: for model, use claude-2, gpt-3.5, gpt-4, command-nightly, stabilityai/stablecode-completion-alpha-3b-4k

    {
      "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
      "messages": [
        {
          "content": "Hello, whats the weather in San Francisco??",
          "role": "user"
        }
      ]
    }
    
  • Consistent Input/Output Format

    • Call all models using the OpenAI format - completion(model, messages)
    • Text responses will always be available at ['choices'][0]['message']['content']
  • Error Handling Using Model Fallbacks (if GPT-4 fails, try llama2)

  • Logging - Log Requests, Responses and Errors to Supabase, Posthog, Mixpanel, Sentry, LLMonitor, Traceloop, Helicone (any of the supported providers here: https://docs.litellm.ai/docs/)

    Example: Logs sent to Supabase

  • Token Usage & Spend - Track Input + Completion tokens used + Spend/model

  • Caching - Implementation of Semantic Caching

  • Streaming & Async Support - Return generators to stream text responses (see the sketch after this list)
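
A minimal sketch of both the consistent output format and streaming, reusing the client setup from the usage example above. It assumes the proxy mirrors the pre-1.0 OpenAI SDK interface and streaming chunk format:

import openai

openai.api_key = "sk-litellm-master-key"
openai.api_base = "http://0.0.0.0:8080"

# Same request shape for every backing model; the reply text is always
# at ['choices'][0]['message']['content'].
for model in ["gpt-3.5-turbo", "claude-2"]:
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": "Hey"}],
    )
    print(model, "->", response["choices"][0]["message"]["content"])

# stream=True returns a generator of chunks (assumed OpenAI streaming
# format, with incremental text under 'delta').
stream = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
    stream=True,
)
for chunk in stream:
    print(chunk["choices"][0]["delta"].get("content", ""), end="")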

API Endpoints

/chat/completions (POST)

This endpoint is used to generate chat completions for 50+ supported LLM models. Use llama2, GPT-4, Claude 2, etc.

Input

This API endpoint accepts all inputs as raw JSON and expects the following fields:

  • model (string, required): ID of the model to use for chat completions. See all supported models here: https://docs.litellm.ai/docs/ (e.g. gpt-3.5-turbo, gpt-4, claude-2, command-nightly, stabilityai/stablecode-completion-alpha-3b-4k)
  • messages (array, required): A list of messages representing the conversation context. Each message should have a role (system, user, assistant, or function), content (message text), and name (for function role).
  • Additional optional parameters: temperature, functions, function_call, top_p, n, stream. See the full list of supported inputs here: https://docs.litellm.ai/docs/ (a second example body using some of these follows the claude-2 example below)

Example JSON body

For claude-2

{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
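
And a variant that sets a couple of the optional parameters listed above (values are illustrative):

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ],
  "temperature": 0.2,
  "stream": true
}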

Making an API request to the Proxy Server

import requests
import json

# TODO: use your URL
url = "http://localhost:5000/chat/completions"

payload = json.dumps({
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
})
headers = {
  'Content-Type': 'application/json'
}
response = requests.post(url, headers=headers, data=payload)
print(response.text)

Output [Response Format]

All responses from the server are returned in the following format, for every LLM model (a short parsing sketch follows the example). More info on output here: https://docs.litellm.ai/docs/

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
        "role": "assistant"
      }
    }
  ],
  "created": 1691790381,
  "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 16,
    "total_tokens": 57
  }
}
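
Because the shape is the same for every model, downstream code can read the reply and token usage from fixed paths. A minimal sketch, reusing the response object from the requests example above:

data = response.json()

# Reply text is always at choices[0].message.content
reply = data["choices"][0]["message"]["content"]

# Token usage fields back the usage & spend tracking
usage = data["usage"]
print(reply)
print(f"prompt={usage['prompt_tokens']} completion={usage['completion_tokens']} total={usage['total_tokens']}")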

Installation & Usage

Running Locally

  1. Clone liteLLM repository to your local machine:
    git clone https://github.com/BerriAI/liteLLM-proxy
    
  2. Install the required dependencies using pip
    pip install -r requirements.txt
    
  3. (optional) Set your LiteLLM proxy master key
    os.environ['LITELLM_PROXY_MASTER_KEY'] = "YOUR_LITELLM_PROXY_MASTER_KEY"
    or
    set LITELLM_PROXY_MASTER_KEY in your .env file
    
  4. Set your LLM API keys
    os.environ['OPENAI_API_KEY'] = "YOUR_API_KEY"
    or
    set OPENAI_API_KEY in your .env file
    
  5. Run the server:
    python main.py
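
Put together, the steps above amount to the following (assuming you keep your keys in a .env file at the repo root):

git clone https://github.com/BerriAI/liteLLM-proxy
cd liteLLM-proxy
pip install -r requirements.txt
cp .env.template .env   # fill in OPENAI_API_KEY, LITELLM_PROXY_MASTER_KEY, ...
python main.py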
    

Deploying

  1. Quick Start: Deploy on Railway


  2. GCP, AWS, Azure: this project includes a Dockerfile, so you can build and deploy a Docker image on your provider of choice (see the sketch below)
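
For example, a typical build-and-run sequence; the image name is arbitrary and the port should match whatever main.py binds (8080 in the usage example above):

docker build -t litellm-proxy .
docker run --env-file .env -p 8080:8080 litellm-proxy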

Support / Talk with founders

Roadmap

  • Support hosted db (e.g. Supabase)
  • Easily send data to places like Posthog and Sentry.
  • Add a hot-cache for project spend logs - enables fast checks for user + project limits
  • Implement user-based rate-limiting
  • Spending controls per project - expose key creation endpoint
  • Need to store a keys db -> mapping created keys to their alias (i.e. project name)
  • Easily add new models as backups / as the entry-point (add this to the available model list)
The MIT License

Copyright (c) Yujong Lee

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
