⚠️ DEPRECATION WARNING: LiteLLM is our new home. You can find the LiteLLM Proxy there. Thank you for checking us out! ❤️
Step 1: Put your API keys in .env
Copy the .env.template and put in the relevant keys (e.g. OPENAI_API_KEY="sk-...")
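For reference, a minimal .env might look like the sketch below; the values are placeholders, and you only need keys for the providers you actually call:
# Placeholder values - replace with your own credentials
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-ant-..."
LITELLM_PROXY_MASTER_KEY="sk-litellm-master-key"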
Step 2: Test your proxy
Start your proxy server
$ cd litellm-proxy && python3 main.py
Make your first call
import openai  # note: this example uses the pre-v1 SDK (openai<1.0)

openai.api_key = "sk-litellm-master-key"  # your proxy master key
openai.api_base = "http://0.0.0.0:8080"   # point the SDK at your proxy
response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hey"}])
print(response)
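If you are on the v1 Python SDK (openai>=1.0), the equivalent call looks like the sketch below; the v1 SDK replaced api_base with base_url and module-level calls with a client object:
from openai import OpenAI

# Same proxy address and master key as above, using the v1-style client
client = OpenAI(api_key="sk-litellm-master-key", base_url="http://0.0.0.0:8080")
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
)
print(response.choices[0].message.content)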
Make /chat/completions requests for 50+ LLM models: Azure, OpenAI, Replicate, Anthropic, Hugging Face
Example: for model, use claude-2, gpt-3.5, gpt-4, command-nightly, or stabilityai/stablecode-completion-alpha-3b-4k
{
"model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
"messages": [
{
"content": "Hello, whats the weather in San Francisco??",
"role": "user"
}
]
}
Consistent Input/Output Format - call every model with completion(model, messages) and read the text response from ['choices'][0]['message']['content']
Error Handling Using Model Fallbacks - if GPT-4 fails, try llama2 (see the sketch after this list)
Logging - Log Requests, Responses and Errors to Supabase, Posthog, Mixpanel, Sentry, LLMonitor, Traceloop, Helicone (any of the supported providers here: https://docs.litellm.ai/docs/)
Example: Logs sent to Supabase
Token Usage & Spend - Track Input + Completion tokens used + Spend/model
Caching - Implementation of Semantic Caching
Streaming & Async Support - Return generators to stream text responses
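A client-side approximation of the fallback behavior described above, as a minimal sketch: chat_with_fallbacks is a hypothetical helper (not part of the proxy API), and the model names are illustrative.
import requests

def chat_with_fallbacks(url, messages, models=("gpt-4", "llama2")):
    # Try each model in order and return the first successful response
    for model in models:
        try:
            resp = requests.post(url, json={"model": model, "messages": messages}, timeout=60)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            continue  # this model failed; fall through to the next one
    raise RuntimeError("all fallback models failed")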
/chat/completions (POST)
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude2, etc.
This API endpoint accepts all inputs in raw JSON and expects the following inputs:
model (string, required): ID of the model to use for chat completions. See all supported models here: https://docs.litellm.ai/docs/ - e.g. gpt-3.5-turbo, gpt-4, claude-2, command-nightly, stabilityai/stablecode-completion-alpha-3b-4k
messages (array, required): A list of messages representing the conversation context. Each message should have a role (system, user, assistant, or function), content (message text), and name (for the function role).
Optional inputs include temperature, functions, function_call, top_p, n, and stream. See the full list of supported inputs here: https://docs.litellm.ai/docs/
For claude-2:
{
"model": "claude-2",
"messages": [
{
"content": "Hello, whats the weather in San Francisco??",
"role": "user"
}
]
}
import requests

# TODO: use your URL
url = "http://localhost:5000/chat/completions"

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {
            "content": "Hello, whats the weather in San Francisco??",
            "role": "user"
        }
    ]
}

# requests serializes the payload to JSON and sets the Content-Type header
response = requests.post(url, json=payload)
print(response.text)
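Because stream is among the supported inputs, streaming responses can be consumed like the sketch below. This assumes the proxy relays OpenAI-style server-sent events (lines prefixed with "data: ", delta chunks, and a final [DONE] marker); verify the wire format against your proxy version:
import json
import requests

# TODO: use your URL
url = "http://localhost:5000/chat/completions"
payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hey"}],
    "stream": True,
}

with requests.post(url, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue  # skip SSE keep-alive blank lines
        text = line.decode("utf-8")
        if text.startswith("data: "):  # SSE framing (assumed)
            text = text[len("data: "):]
        if text.strip() == "[DONE]":
            break
        delta = json.loads(text)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)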
All responses from the server are returned in the following format (for all LLM models). More info on output here: https://docs.litellm.ai/docs/
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
"role": "assistant"
}
}
],
"created": 1691790381,
"id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
"model": "gpt-3.5-turbo-0613",
"object": "chat.completion",
"usage": {
"completion_tokens": 41,
"prompt_tokens": 16,
"total_tokens": 57
}
}
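Since the format is identical across models, response handling can be written once. A short sketch, continuing from the requests example above:
data = response.json()

# The text response is always at the same path, for every model
print(data["choices"][0]["message"]["content"])

# Token usage is reported on every response and can drive spend tracking
usage = data["usage"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])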
git clone https://github.com/BerriAI/liteLLM-proxy
pip install -r requirements.txt
os.environ['LITELLM_PROXY_MASTER_KEY'] = "YOUR_LITELLM_PROXY_MASTER_KEY"
or
set LITELLM_PROXY_MASTER_KEY in your .env file
os.environ['OPENAI_API_KEY'] = "YOUR_API_KEY"
or
set OPENAI_API_KEY in your .env file
python main.py
Quick Start: Deploy on Railway
GCP, AWS, Azure
This project includes a Dockerfile, allowing you to build and deploy a Docker image to your provider of choice.
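For example, a build-and-run sketch; the port mapping and environment variables are assumptions, so adjust them to match the Dockerfile and your providers:
docker build -t litellm-proxy .
docker run -p 8080:8080 \
  -e LITELLM_PROXY_MASTER_KEY="YOUR_LITELLM_PROXY_MASTER_KEY" \
  -e OPENAI_API_KEY="YOUR_API_KEY" \
  litellm-proxy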