You can run vqa_demo.py to implement image Q&A: it uses MiniGPT-4 to generate answers and GPTCache to cache them.
Note that you need to make sure minigpt4 and gptcache are installed successfully, and move the vqa_demo.py file into the MiniGPT-4 directory.
$ python vqa_demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
The above command uses the exact-match cache, i.e. the map cache management method. When you ask the same image and question again, it hits the cache directly and returns the answer quickly.
If you want to use the similar-search cache, you can run the following command to set map to False, which will use sqlite3 and faiss to manage the cache and search it for similar images and questions. You can also set dir to your workspace directory.
$ python vqa_demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0 --dir /path/to/workspace --no-map
embedding function
Please note that not all data managers are compatible with an embedding function. The default implementation below does nothing and simply returns the input, so only the map data manager can be configured to use it.
def to_embeddings(data, **kwargs):
    return data
When creating an Embedding object, the model will be loaded. It is important to remember to pass the dimension to the data manager.
from gptcache import cache, Config
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.embedding import Onnx
onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()
The usage of the following models is similar to the above.
from gptcache.embedding import OpenAI
openai = OpenAI()
# openai.dimension
# openai.to_embeddings
from gptcache.embedding import Huggingface
huggingface = Huggingface()
# huggingface.dimension
# huggingface.to_embeddings
from gptcache.embedding import Cohere
cohere = Cohere()
# cohere.dimension
# cohere.to_embeddings
from gptcache.embedding import SBERT
sbert = SBERT()
# sbert.dimension
# sbert.to_embeddings
from gptcache.embedding import FastText
fast_text = FastText()
# fast_text.dimension
# fast_text.to_embeddings
from gptcache.embedding import PaddleNLP
paddlenlp = PaddleNLP()
# paddlenlp.dimension
# paddlenlp.to_embeddings
The function has two parameters: the preprocessed string and parameters reserved for user customization. To acquire these parameters, a similar method to the one above is used: kwargs.get("embedding_func", {}). A custom embedding function is sketched after the default implementation below.
def to_embeddings(data, **kwargs):
    return data
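As a minimal sketch (not part of GPTCache), a custom embedding function might read user-supplied options from kwargs; the "normalize" option and the toy character-code embedding below are hypothetical:
import numpy as np

def my_embedding_func(data, **kwargs):
    # User-defined options arrive via kwargs; "normalize" is a made-up example key.
    options = kwargs.get("embedding_func", {})
    normalize = options.get("normalize", False)
    # Toy embedding: character codes of the first 8 characters, zero-padded.
    codes = [float(ord(c)) for c in data[:8]]
    codes += [0.0] * (8 - len(codes))
    vec = np.array(codes, dtype="float32")
    if normalize and np.linalg.norm(vec) > 0:
        vec = vec / np.linalg.norm(vec)
    return vec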
import numpy as np
import cohere

class Cohere:
    def __init__(self, model: str = "large", api_key: str = None, **kwargs):
        self.co = cohere.Client(api_key)
        self.model = model
        if model in self.dim_dict():
            self.__dimension = self.dim_dict()[model]
        else:
            self.__dimension = None

    def to_embeddings(self, data):
        if not isinstance(data, list):
            data = [data]
        response = self.co.embed(texts=data, model=self.model)
        embeddings = response.embeddings
        return np.array(embeddings).astype('float32').squeeze(0)

    @property
    def dimension(self):
        # Lazily determine the dimension for models not in dim_dict.
        if not self.__dimension:
            foo_emb = self.to_embeddings("foo")
            self.__dimension = len(foo_emb)
        return self.__dimension

    @staticmethod
    def dim_dict():
        return {
            "large": 4096,
            "small": 1024,
        }
Note that if you intend to use a model, it should be wrapped in a class. The model is loaded when the object is created, to avoid unnecessary loading when it is not in use; this also ensures the model is not loaded multiple times during program execution.
data manager
MapDataManager (default)
Stores all data in a map data structure, using the question as the key.
from gptcache.manager import get_data_manager
from gptcache import cache
data_manager = get_data_manager()
cache.init(data_manager=data_manager)
cache.set_openai_key()
Cache storage and vector store
The user's question and answer data can be stored in a general database such as SQLite or MySQL, while the vector obtained through the question text embedding is stored in a separate vector database.
from gptcache import cache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
import numpy as np
d = 8
def mock_embeddings(data, **kwargs):
    return np.random.random((d, )).astype('float32')
cache_base = CacheBase('sqlite')
vector_base = VectorBase('faiss', dimension=d)
data_manager = get_data_manager(cache_base, vector_base)
cache.init(
    embedding_func=mock_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()
Supported general databases
Supported vector databases
Custom Store
First, you need to implement two interfaces, namely CacheStorage and VectorBase, and then create the corresponding data manager through the get_data_manager method.
Reference: CacheStorage sqlalchemy VectorBase Faiss
from gptcache import cache
from gptcache.manager import get_data_manager
# CustomCacheStore and CustomVectorStore are your own implementations of the two interfaces
data_manager = get_data_manager(cache_base=CustomCacheStore(), vector_base=CustomVectorStore())
cache.init(data_manager=data_manager)
similarity evaluation interface
Exact match between two questions; currently only available for the map data manager.
from gptcache import cache
from gptcache.similarity_evaluation.exact_match import ExactMatchEvaluation
cache.init(
    similarity_evaluation=ExactMatchEvaluation(),
)
cache.set_openai_key()
Using the search distance to evaluate the similarity of a sentence pair.
from gptcache import cache
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
cache.init(
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()
Using an ONNX model to evaluate the similarity of a sentence pair.
from gptcache import cache
from gptcache.similarity_evaluation.onnx import OnnxModelEvaluation
cache.init(
    similarity_evaluation=OnnxModelEvaluation(),
)
cache.set_openai_key()
Using the NumPy norm to evaluate the similarity of a sentence pair.
from gptcache import cache
from gptcache.similarity_evaluation.np import NumpyNormEvaluation
cache.init(
    similarity_evaluation=NumpyNormEvaluation(),
)
cache.set_openai_key()
Custom similarity evaluation
To meet the requirements, you will need to implement the SimilarityEvaluation interface, which consists of two methods: evaluation and range.
Reference: similarity evaluation dir
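A minimal sketch of such an implementation, assuming SimilarityEvaluation can be imported from gptcache.similarity_evaluation and that evaluation receives the source request and the cached record as dictionaries containing a question key:
from gptcache.similarity_evaluation import SimilarityEvaluation

class KeywordEvaluation(SimilarityEvaluation):
    """Toy evaluation: full score when the two questions share a word."""

    def evaluation(self, src_dict, cache_dict, **kwargs):
        src_words = set(src_dict.get("question", "").lower().split())
        cache_words = set(cache_dict.get("question", "").lower().split())
        return 1.0 if src_words & cache_words else 0.0

    def range(self):
        # The minimum and maximum score this evaluation can return.
        return 0.0, 1.0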
cache_enable_func: determines whether to use the cache. Its args and kwargs represent the original request parameters; if the function returns True, the cache is enabled. You can use this function to ensure that the cache is not enabled when the question is too long, since the likelihood of a cache hit is low in such cases (see the sketch after the default implementation below).
def cache_all(*args, **kwargs):
    return True
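As a sketch of the long-question case (the length threshold and the assumption that the question arrives in an OpenAI-style messages keyword argument are illustrative):
def enable_cache_for_short_questions(*args, **kwargs):
    # Skip the cache when the last message is very long; such questions
    # are unlikely to repeat verbatim.
    messages = kwargs.get("messages", [])
    question = messages[-1]["content"] if messages else ""
    return len(question) <= 200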
pre_embedding_func: extracts key information from the request and preprocesses it to ensure that the input information for the encoder module's embedding function is simple and accurate.
The data parameter represents the original request dictionary object, while the kwargs parameter is used to obtain user-defined parameters. By using kwargs.get("pre_embedding_func", {}), users can pass additional parameters at a certain stage of the process.
For example, it may extract only the last message in the message array of the OpenAI request body, or the first and last messages in the array.
def last_content(data, **kwargs):
    return data.get("messages")[-1]["content"]

def all_content(data, **kwargs):
    s = ""
    messages = data.get("messages")
    for i, message in enumerate(messages):
        if i == len(messages) - 1:
            s += message["content"]
        else:
            s += message["content"] + "\n"
    return s
config: includes cache-related configurations, which currently consist of log_time_func and similarity_threshold. log_time_func logs the time consumed by the embedding and search functions.
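A sketch of passing these settings, assuming log_time_func is called with the stage name and the elapsed time in seconds:
from gptcache import Config

def log_time(func_name, delta_time):
    # Invoked for timed stages such as embedding and search.
    print(f"{func_name} took {delta_time:.4f}s")

config = Config(log_time_func=log_time, similarity_threshold=0.8)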
next_cache: points to the next cache object, which is useful for implementing multi-level caches.
from gptcache import cache, Cache
from gptcache.manager import get_data_manager
bak_cache = Cache()
bak_data_file = "data_map_bak.txt"
bak_cache.init(data_manager=get_data_manager(data_path=bak_data_file))
cache.init(data_manager=get_data_manager(), next_cache=bak_cache)
import openai
from gptcache import Cache, Config
from gptcache.embedding import Onnx
from gptcache.manager import get_data_manager, CacheBase, VectorBase

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
one_cache = Cache()
one_cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    evaluation_func=pair_evaluation,  # pair_evaluation: a user-defined similarity evaluation function
    config=Config(
        similarity_threshold=1,
    ),
)
question = "what do you think about chatgpt"
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": question}
    ],
    cache_obj=one_cache  # answer this request with the dedicated cache object
)
question = "what do you think about chatgpt"
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": question}
    ],
    cache_context={
        # extra kwargs handed to each stage of the cache workflow
        "pre_embedding_func": {},
        "embedding_func": {},
        "search_func": {},
        "evaluation_func": {},
        "save_func": {},
    }
)
question = "what do you think about chatgpt"
openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": question}
    ],
    cache_skip=True  # skip the cache search for this request
)
For more details, please refer to: API Reference
A session can isolate the context of each connection, and it can also filter the results after recall; if none satisfy the check, the request is re-sent rather than the cached result being returned directly.
First we need to initialize the cache:
from gptcache import cache
cache.init()
cache.set_openai_key()
Then we can set the session parameter for each request.
with method
import openai
from gptcache.session import Session

with Session() as session:
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
            {
                'role': 'user',
                'content': "what's github"
            }],
        session=session
    )
The with method will delete the session data directly on exit. If you don't want to delete the data in your session, you can run the following code but without session.drop().
If you are using LangChainLLMs, you can set the session for the current LLM, e.g. llm = LangChainLLMs(llm=OpenAI(temperature=0), session=session). Or you can run the llm with a specific session, e.g. llm = LangChainLLMs(llm=OpenAI(temperature=0)), and run llm(question, session=session).
You can customize the name of the session, and the check_hit_func method to check whether a hit is satisfactory; the method has four parameters:
cur_session_id: the name of the current session
cache_session_ids: a list of session names that cached the same content if you are using map as the data management method; otherwise, a list of session names for similar questions with the same answer
cache_questions: a list containing the one question identical to the one you asked if you use map as the data management method; otherwise, a list of questions similar to yours with the same answer, corresponding one-to-one with cache_session_ids
cache_answer: the content of the cached answer
The default check_hit_func returns cur_session_id not in cache_session_ids, which means the answer returned cannot come from the same session.
In the following code, my_check_hit_func is defined to check whether the cached answer contains "GitHub": if it does, it returns True and GPTCache continues with the subsequent evaluation operations; if it does not, it returns False and the request is re-run.
import openai
from gptcache.session import Session

def my_check_hit_func(cur_session_id, cache_session_ids, cache_questions, cache_answer):
    if "GitHub" in cache_answer:
        return True
    return False

session = Session(name="my-session", check_hit_func=my_check_hit_func)
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',
    messages=[
        {
            'role': 'user',
            'content': "what's github"
        }],
    session=session
)
# session.drop()  # Optional
And you can also run data_manager.list_sessions to list all the sessions.
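A quick sketch, assuming the initialized cache object exposes its data manager and that list_sessions returns an iterable of session identifiers:
from gptcache import cache

for session_id in cache.data_manager.list_sessions():
    print(session_id)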
GPTCache now supports building a server with caching and conversation capabilities. You can start a customized GPTCache service within a few lines.
Once you have GPTCache installed, you can start the server with the following command:
$ gptcache_server -s [HOST] -p [PORT] -d [CACHE_DIRECTORY] -f [CACHE_CONFIG_FILE]
The args are optional; for example, the cache directory defaults to the gptcache_data folder.
GPTCache server configuration
You can configure the server via a YAML file; here is an example config yaml:
embedding:
    onnx
embedding_config:
    # Set embedding model params here
storage_config:
    data_dir:
        gptcache_data
    manager:
        sqlite,faiss
    vector_params:
        # Set vector storage related params here
evaluation:
    distance
evaluation_config:
    # Set evaluation metric kws here
pre_function:
    get_prompt
post_function:
    first
config:
    similarity_threshold: 0.8
    # Set other config here
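For example, with this file saved as gptcache.yml, the server can be started as:
$ gptcache_server -s 0.0.0.0 -p 8000 -f gptcache.yml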
Use the docker to start the GPTCache server
Also, you can start the service in a docker container:
$ docker pull zilliz/gptcache:latest
$ docker run -p 8000:8000 -it zilliz/gptcache:latest
Or start the server in the container with a custom config file (e.g. a locally built image tagged gptcache:v0):
$ docker run -p 8000:8000 -it gptcache:v0 gptcache_server -s 0.0.0.0 -p 8000 -f gptcache.yml
Interact with the server
GPTCache supports two ways of interaction with the server:
put the data to cache
curl -X 'POST' \
'http://localhost:8000/put' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "Hi",
"answer": "Hi back"
}'
get the data from the cache
curl -X 'POST' \
'http://localhost:8000/get' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "Hi"
}'
>>> from gptcache.client import Client
>>> client = Client(uri="http://localhost:8000")
>>> client.put("Hi", "Hi back")
200
>>> client.get("Hi")
'Hi back'
The benchmark script covers the Sqlite + Faiss + ONNX combination.
Test data source: randomly scrape some information from a webpage (origin), and then let ChatGPT produce corresponding data (similar).
positive: searching with similar returns the same result as origin
negative: searching with similar returns a different result from origin
data file: mock_data.json
similarity evaluation func: pair_evaluation (search distance)
threshold | average time | positive | negative | fail count
---|---|---|---|---
0.95 | 0.12s | 425 | 25 | 549
0.9 | 0.23s | 804 | 77 | 118
0.8 | 0.26s | 904 | 92 | 3