The first open-source Knowledge Delivery Network (KDN), accelerating LLM applications by up to 8x at 8x lower cost.
LMCache scales effortlessly, eliminating the need for complex GPU request routing.
Our novel compression techniques reduce the cost of storing and delivering KV caches.
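LMCache's actual codec is not reproduced here; as a rough, hypothetical illustration of why compression cuts KV-cache storage cost, the PyTorch sketch below quantizes an fp16 KV tensor to int8, halving its footprint before any further encoding is applied. All shapes and function names are invented for the example.

```python
# Illustrative only: a real KV-cache codec is more sophisticated, but simple
# quantization already shows why KV caches compress well.
import torch

def quantize_kv(kv: torch.Tensor, bits: int = 8):
    """Symmetric per-tensor quantization of a KV-cache tensor."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for int8
    scale = kv.abs().max().clamp(min=1e-8) / qmax   # one scale per tensor
    q = torch.clamp(torch.round(kv / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

# A toy KV tensor: (num_tokens, num_heads, head_dim), stored in fp16.
kv = torch.randn(1024, 8, 128, dtype=torch.float16)
q, scale = quantize_kv(kv.float())
# The int8 copy is half the size; the single fp32 scale adds 4 bytes.
print(f"fp16: {kv.numel() * 2} bytes -> int8: {q.numel()} bytes")
```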
Our unique streaming and decompression methods minimize cache-delivery latency, so responses stay fast.
LMCache integrates seamlessly with popular LLM serving engines such as vLLM and TGI; a sketch of the vLLM setup follows.
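A minimal sketch of plugging LMCache into vLLM, assuming a recent vLLM build with LMCache connector support and `lmcache` installed. The connector name, config fields, and model are assumptions that may differ across versions; consult the LMCache docs for your release.

```python
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Route vLLM's KV-cache loads and stores through LMCache.
ktc = KVTransferConfig(
    kv_connector="LMCacheConnectorV1",  # connector name in recent vLLM builds
    kv_role="kv_both",                  # this instance both saves and loads KV
)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
          kv_transfer_config=ktc)

# Repeated prefixes (long documents, chat history) can now be served from the
# cache instead of being re-prefilled on the GPU.
out = llm.generate(["Summarize the attached report: ..."],
                   SamplingParams(max_tokens=128))
print(out[0].outputs[0].text)
```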
LMCache also enhances the quality of LLM inference through offline content upgrades.