# LMcache
**Repository Path**: liuk311/LMcache
## Basic Information
- **Project Name**: LMcache
- **Description**: 1111111111111111111
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: dev
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-06-12
- **Last Updated**: 2025-06-13
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
Redis for LLMs - Infinite and Ultra-Fast
----
LMCache is an **LLM** serving engine extension to **reduce TTFT** and **increase throughput**, especially under long-context scenarios. By storing the KV caches of reusable texts across various locations, including (GPU, CPU DRAM, Local Disk), LMCache reuses the KV caches of **_any_** reused text (not necessarily prefix) in **_any_** serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay.
By combining LMCache with vLLM, LMCache achieves 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
Try LMCache with pre-built vllm docker images [here](https://docs.lmcache.ai/developer_guide/docker_file.html).
# 🚀 Performance snapshot

# 💻 Installation and Quickstart
Please refer to our detailed documentation for [LMCache V1](https://docs.lmcache.ai/getting_started/installation.html#install-from-source-v1) and [LMCache V0](https://docs.lmcache.ai/getting_started/installation.html#install-from-source-v0)
# Interested in Connecting?
Fill out the interest form or [drop an email](contact@lmcache.ai), and our team will reach out to you!
[Google Form](https://forms.gle/mQfQDUXbKfp2St1z7)
# 🛣️ News and Milestones
- [x] LMCache V1 with vLLM integration with following features is live 🔥
* High performance CPU KVCache offloading
* Disaggregated prefill
* P2P KVCache sharing
- [x] LMCache is supported in the [vLLM production stack ecosystem](https://github.com/vllm-project/production-stack/tree/main)
- [x] User and developer documentation
- [x] Stable support for non-prefix KV caches
- [x] Support installation through pip install and integrate with latest vLLM
- [x] First release of LMCache
# 📖 Blogs and documentations
Our latest [blog posts](https://lmcache.github.io) and the [documentation](https://docs.lmcache.ai/) pages are available online
# Community meeting
The community meeting for LMCache is hosted weekly.
Meeting Details:
- Tuesdays at 9:00 AM PT – [Add to Calendar](https://drive.google.com/file/d/15Xz8-LtpBQ5QgR7KrorOOyfuohCFQmwn/view?usp=drive_link)
- Tuesdays at 6:30 PM PT – [Add to Calendar](https://drive.google.com/file/d/1WMZNFXV24kWzprDjvO-jQ7mOY7whqEdG/view?usp=drive_link)
Meetings **alternate weekly** between the two times. All are welcome to join!
## Contributing
We welcome and value any contributions and collaborations. Please check out [CONTRIBUTING.md](CONTRIBUTING.md) for how to get involved.
## Citation
If you use LMCache for your research, please cite our papers:
```
@inproceedings{liu2024cachegen,
title={Cachegen: Kv cache compression and streaming for fast large language model serving},
author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
pages={38--56},
year={2024}
}
@article{cheng2024large,
title={Do Large Language Models Need a Content Delivery Network?},
author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
journal={arXiv preprint arXiv:2409.13761},
year={2024}
}
@article{yao2024cacheblend,
title={CacheBlend: Fast Large Language Model Serving with Cached Knowledge Fusion},
author={Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
journal={arXiv preprint arXiv:2405.16444},
year={2024}
}
```
## License
This project is licensed under Apache License 2.0. See the [LICENSE](LICENSE) file for details.