
Accelerating the Future of AI, One Cache at a Time

The first open-source Knowledge Delivery Network (KDN), making LLM applications up to 8x faster at up to 8x lower cost.

Prompt Caching
Enable fast, uninterrupted interactions with AI chatbots and document processing tools by caching long conversational histories for quick retrieval.
[Demo: chatbot response without LMCache (slow) vs. with LMCache (Xx faster)]
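To make the prompt-caching idea concrete, here is a minimal, self-contained Python sketch of prefix-keyed caching. It is not LMCache's actual API: the "KV cache" is a stand-in string, and compute_kv only simulates the expensive prefill over a long conversation history. The hit/miss logic is what matters: repeated turns over the same history skip the prefill.

```python
import hashlib
import time

# Toy illustration of prompt caching; NOT LMCache's real API.
# The "KV cache" is a stand-in string, and compute_kv simulates the
# expensive prefill over a long conversation history.

kv_store: dict[str, str] = {}

def compute_kv(history: str) -> str:
    """Stand-in for prefilling the conversation history (the slow step)."""
    time.sleep(0.5)  # pretend this is GPU prefill time
    return f"kv<{len(history)} chars>"

def answer(history: str, new_message: str) -> str:
    key = hashlib.sha256(history.encode()).hexdigest()
    kv = kv_store.get(key)
    if kv is None:  # first turn over this history: pay the prefill cost once
        kv = compute_kv(history)
        kv_store[key] = kv
    return f"reply({new_message!r}) using {kv}"

history = "user: please summarize this long contract ... " * 100

t0 = time.perf_counter(); answer(history, "What about clause 7?")
cold = time.perf_counter() - t0
t0 = time.perf_counter(); answer(history, "And clause 9?")
warm = time.perf_counter() - t0
print(f"cold turn: {cold:.2f}s, warm turn: {warm:.4f}s")  # warm skips prefill
```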

Fast RAG
Enhance the speed and accuracy of RAG queries by dynamically combining stored KV caches from various text chunks, perfect for enterprise search engines and AI-based document processing.
[Demo: RAG query response without LMCache (slow) vs. with LMCache (Xx faster)]
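The same saving, sketched for RAG: each chunk's cache is computed once offline, and a query combines the stored per-chunk caches instead of re-encoding every retrieved chunk. Again a toy with hypothetical names and string stand-ins for KV tensors, not LMCache's actual cache-blending algorithm.

```python
import time

# Toy sketch of KV cache reuse for RAG; not LMCache's real blending logic,
# which operates on attention key/value tensors. Per-chunk "caches" are
# computed once offline and combined at query time.

chunk_cache: dict[str, str] = {}

def encode_chunk(chunk_id: str, text: str) -> None:
    """Stand-in for prefilling one text chunk (done once, offline)."""
    time.sleep(0.3)  # pretend this is per-chunk prefill time
    chunk_cache[chunk_id] = f"kv[{chunk_id}]"

def answer_query(query: str, retrieved_ids: list[str]) -> str:
    # Reuse each retrieved chunk's stored cache; only the short query
    # still needs fresh computation.
    combined = " + ".join(chunk_cache[cid] for cid in retrieved_ids)
    return f"answer({query!r}) over {combined}"

# Offline: encode the corpus once.
for cid in ("doc1", "doc2", "doc3"):
    encode_chunk(cid, f"contents of {cid}")

# Online: each query assembles stored caches, whatever the retrieval order.
print(answer_query("What is the refund policy?", ["doc3", "doc1"]))
```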

Advantages of LMCache

Scalability

By sharing KV caches across serving instances, LMCache scales without the need for complex cache-aware GPU request routing.

Cost Efficiency

Our novel compression techniques reduce the cost of storing and delivering KV caches.

Speed

Our unique streaming and decompression methods minimize latency, ensuring fast responses.

Cross-Platform

Seamless integration with popular LLM serving engines such as vLLM and TGI; see the integration sketch at the end of this section.

Quality

LMCache enhances the quality of LLM inference through offline upgrades of cached content.
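As a concrete example of the vLLM integration mentioned under Cross-Platform, the sketch below follows the general pattern from LMCache's documentation for vLLM's offline API. The exact identifiers (KVTransferConfig, "LMCacheConnectorV1", kv_role="kv_both") differ across vLLM and LMCache versions, so treat them as assumptions and check the current docs.

```python
# Sketch of routing vLLM's KV caches through LMCache. Identifier names
# follow LMCache's documented integration but vary by version; treat them
# as assumptions rather than a fixed API.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Ask vLLM to save and load KV caches via the LMCache connector.
ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # any vLLM-supported model
    kv_transfer_config=ktc,
)

# A long shared prefix (document, chat history): the first request prefills
# and stores its KV cache, and later requests sharing the prefix reuse it.
long_prefix = "..."
params = SamplingParams(temperature=0.0, max_tokens=64)
out = llm.generate([long_prefix + "\nQ: Summarize the above."], params)
print(out[0].outputs[0].text)
```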