DeepSeek V4 explained, and why it matters to your wallet

DeepSeek V4 is an open-weight model that delivers state-of-the-art intelligence while potentially offering a much lower token price than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that?

Prerequisite: attention, KV caches, and why the KV cache is the key factor in token pricing

To know why DeepSeek V4 can […]

LMCache: A Journey

GTC wrapped up a month ago. Our open-source KV cache management library, LMCache, was shown in Jensen Huang's keynote, spotlighted by NVIDIA SVP Kevin Deierling, and mentioned in talk after talk by industry partners. I was also invited to speak at the first-ever industry KV cache tutorial. At one point, someone even came to our […]