slshen – LMCache

LMCache Multi-node P2P CPU Memory Sharing & Control: From Experimental Feature to Production

Baolong Mao (Tencent), Chunxiao Zheng (Tencent), Weishu Deng (Tensormesh), Darren Peng (Tensormesh), Samuel Shen (Tensormesh) What is P2P and what does it promise? In this blog post, we will go over: Most production vLLM deployments run multiple identical instances behind a load balancer. Each instance builds its own KV cache only from the traffic it […]

Author: slshen

LMCache Multi-node P2P CPU Memory Sharing & Control: From Experimental Feature to Production