To see the need for horizontal scaling of the cache, we first need to understand how GPTCache works by default, using an in-memory cache. Let's look at a high-level breakdown of the steps involved; the diagram below shows the flow in more detail.
The diagram above depicts how, for a given query, the search operation determines whether a matching cache entry exists. It happens in the following steps:

1. The incoming query is converted into an embedding by the configured embedding model (for example, ONNX).
2. The vector store is searched for embeddings similar to the query's embedding.
3. The similarity evaluator scores the candidates; if a candidate passes the similarity threshold, it is a cache hit, and the stored answer is returned from scalar storage.
4. On a cache miss, the query is forwarded to the LLM, and the response is written back into the cache.
5. The Eviction Manager keeps track of entries and evicts them once the cache exceeds its configured capacity.
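For reference, here is a minimal sketch of this default single-node setup. It assumes the same manager_factory API used in the Redis example later in this post, with SQLite and Faiss as the scalar and vector stores; when no eviction manager is specified, GPTCache falls back to the in-memory one.

from gptcache import Cache
from gptcache.embedding import Onnx
from gptcache.manager import manager_factory

onnx = Onnx()

# Default single-node setup: SQLite stores the questions and answers,
# Faiss stores the embeddings, and eviction bookkeeping lives in this
# process's memory, invisible to any other node.
data_manager = manager_factory(
    "sqlite,faiss",
    vector_params={"dimension": onnx.dimension},
)

cache = Cache()
cache.init(data_manager=data_manager)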
Although the in-memory eviction manager works great for a single-node deployment, it won't work in a multi-node deployment, since cache information is not shared across nodes: each node evicts entries based only on its own local view of the cache.
In the diagrams above, you can observe that the only difference between the two flows is the Eviction Manager.
The Distributed Eviction Manager uses a distributed cache database, such as Redis, to maintain cache information.
Now that the cache is maintained in a distributed manner, the cache information is shared and available across all nodes. This allows a multi-node GPTCache deployment to scale horizontally.
The diagram below depicts how a multi-node GPTCache deployment can be configured to enable horizontal scaling.
The following example shows how to use GPTCache with Redis as the eviction manager.
from gptcache import Cache
from gptcache.embedding import Onnx
from gptcache.manager import manager_factory

onnx = Onnx()

# "redis,faiss" selects Redis for scalar storage and Faiss for vector
# storage; eviction_manager="redis" moves eviction bookkeeping into
# Redis so it is shared across all nodes.
data_manager = manager_factory(
    "redis,faiss",
    eviction_manager="redis",
    scalar_params={"url": "redis://localhost:6379"},
    vector_params={"dimension": onnx.dimension},
    eviction_params={
        "maxmemory": "100mb",     # memory cap for cached entries
        "policy": "allkeys-lru",  # evict least-recently-used keys first
        "ttl": 1,                 # time-to-live for cached entries
    },
)

cache = Cache()
cache.init(data_manager=data_manager)

question = "What is github?"
answer = "Online platform for version control and code collaboration."

# Compute the question's embedding and seed the cache with one entry.
embedding = onnx.to_embeddings(question)
cache.import_data([question], [answer], [embedding])
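As a quick check that the seeded entry is retrievable, the sketch below uses GPTCache's adapter API (gptcache.adapter.api.get). Note that serving lookups requires the cache to know how to embed incoming queries, so embedding_func is passed to init here; that argument is an addition to the example above, and the default similarity evaluator is assumed.

from gptcache.adapter.api import get

# Re-initialize with an embedding function so lookups can embed the
# incoming query the same way the seeded entry was embedded.
cache.init(data_manager=data_manager, embedding_func=onnx.to_embeddings)

# Should return the seeded answer for the identical question.
print(get("What is github?", cache_obj=cache))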
The GPTCache server can be configured in a similar way using a YAML configuration file.
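For illustration, here is a hypothetical server configuration that mirrors the manager_factory arguments above. The key names under storage_config are assumptions based on the Python API, so check the GPTCache server documentation for the exact schema.

embedding: onnx
storage_config:
  manager: redis,faiss
  eviction_manager: redis        # assumed to mirror the Python kwarg
  scalar_params:
    url: redis://localhost:6379
  vector_params:
    dimension: 768               # must match the embedding model's output
  eviction_params:
    maxmemory: 100mb
    policy: allkeys-lru
    ttl: 1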