The first open-source Knowledge Delivery Network (KDN), serving LLM applications up to 8x faster and at up to 8x lower cost.
Authors: Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang
LMCache scales effortlessly: because KV caches are shared across serving instances, any GPU can serve any request, eliminating the need for complex cache-aware GPU request routing.
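To make this concrete, here is a minimal sketch of the routing this enables, assuming several OpenAI-compatible vLLM endpoints each backed by the same LMCache store. The URLs, model name, and payload are illustrative, not part of LMCache's API.

```python
import itertools
import requests

# Plain round-robin over serving nodes: since every node can reuse KV caches
# from the shared LMCache backend, no cache-aware (sticky) routing is needed.
ENDPOINTS = itertools.cycle([
    "http://gpu-node-0:8000/v1/completions",
    "http://gpu-node-1:8000/v1/completions",
])

def complete(prompt: str) -> str:
    resp = requests.post(next(ENDPOINTS), json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # hypothetical deployment
        "prompt": prompt,
        "max_tokens": 128,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["text"]
```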
Our novel compression techniques reduce the cost of storing and delivering KV caches.
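As rough intuition for why compression pays off, the toy sketch below quantizes KV tensors to 8 bits per value before storage, halving their size relative to fp16. This is a generic illustration under assumed tensor shapes, not LMCache's actual codec, which is considerably more sophisticated.

```python
import torch

def quantize_kv(kv: torch.Tensor):
    # kv: [layers, 2 (K/V), tokens, heads, head_dim]; scale each channel so
    # values fit into int8, halving storage relative to fp16.
    scale = kv.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    return (kv / scale).round().to(torch.int8), scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

kv = torch.randn(4, 2, 256, 8, 64)           # toy cache, fp32 for the demo
q, scale = quantize_kv(kv)
err = (kv - dequantize_kv(q, scale)).abs().max()
print(f"max quantization error: {err:.4f}")  # small relative to |kv|
```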
Our streaming and decompression methods minimize cache-loading latency, keeping time-to-first-token low.
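Much of the latency win comes from pipelining: the next chunk of a cache can be fetched while the current one is decompressed, so transfer and decoding overlap. Below is a minimal sketch of that idea; `fetch_chunk` and `decompress` are hypothetical stand-ins, not LMCache functions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(i: int) -> bytes:          # stand-in for a network read
    return zlib.compress(bytes([i % 256]) * 4096)

def decompress(blob: bytes) -> bytes:      # stand-in for KV-cache decoding
    return zlib.decompress(blob)

def load_kv_cache(num_chunks: int) -> list:
    chunks = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_chunk, 0)
        for i in range(num_chunks):
            blob = future.result()
            if i + 1 < num_chunks:           # prefetch the next chunk...
                future = pool.submit(fetch_chunk, i + 1)
            chunks.append(decompress(blob))  # ...while decoding this one
    return chunks

print(len(load_kv_cache(8)))  # 8
```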
LMCache integrates seamlessly with popular LLM serving engines such as vLLM and TGI.
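With vLLM, the integration is a configuration change rather than new serving code. The sketch below follows one published example; connector names and config fields have varied across LMCache and vLLM releases, so treat it as an assumption and consult the LMCache docs for your versions.

```python
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Route vLLM's KV-cache loads and stores through LMCache. The connector name
# ("LMCacheConnectorV1") matches one documented release and may differ in yours.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_both",  # this instance both saves and loads caches
    ),
)

outputs = llm.generate(["<long shared context> ... question"],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```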
LMCache enhances the quality of LLM inference through offline content upgrades.