CacheBlend

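CacheBlend is a technique that lets you graft together KV caches that were computed separately for different pieces of text, so that a new prompt reusing those pieces can skip most of their prefill even when they do not form an exact shared prefix. To keep output quality close to a full prefill, it recomputes the KV entries for only a small subset of tokens.¹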
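As a rough mental model, grafting plus selective recompute can be sketched like this. It is a toy in plain Python, not the paper's implementation: each token's KV entry is just a small list of floats, and `reference` is a hypothetical stand-in for what a full prefill would have produced. A real system would not have that reference for free and has to estimate which tokens deviate most. The names `graft`, `pick_tokens_to_recompute`, and `blend` are mine.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def graft(chunk_caches):
    """Concatenate per-chunk caches (lists of per-token vectors) in prompt order."""
    return [list(vec) for cache in chunk_caches for vec in cache]


def pick_tokens_to_recompute(grafted, reference, ratio):
    """Return sorted indices of the tokens whose cached entries deviate most."""
    budget = max(1, round(ratio * len(grafted)))
    ranked = sorted(
        range(len(grafted)),
        key=lambda i: l2(grafted[i], reference[i]),
        reverse=True,
    )
    return sorted(ranked[:budget])


def blend(grafted, reference, ratio):
    """Overwrite only the chosen tokens; every other token keeps its cached entry."""
    chosen = pick_tokens_to_recompute(grafted, reference, ratio)
    blended = [list(vec) for vec in grafted]
    for i in chosen:
        # Stand-in for actually recomputing this token's KV entry in context.
        blended[i] = list(reference[i])
    return blended, chosen


# Two chunks cached independently (hypothetical values).
chunk_a = [[1.0, 0.0], [0.0, 1.0]]
chunk_b = [[0.5, 0.5], [1.0, 1.0]]
# What a full prefill over chunk_a + chunk_b would give (hypothetical values).
reference = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [1.0, 1.1]]

grafted = graft([chunk_a, chunk_b])
blended, chosen = blend(grafted, reference, ratio=0.25)
print(chosen)
```

With a 25% budget over four tokens, only the single most-deviating token is overwritten and the other three keep their cached values, which is where the prefill savings would come from.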

It specifically targets use cases like RAG, where huge chunks of context get grafted into a prompt without any particularly strong semantic connection to the rest of the prompt. That is, RAG results are positionally independent within the prompt, so their keys and values should also be positionally independent when it comes to computing attention.
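One way to see why position need not be baked into a cached key: assuming the model uses rotary position embeddings (RoPE), which this note does not state, a key cached at one position can be moved to another by applying one more rotation, and the query-key score depends only on the relative offset. The sketch below is a minimal pure-Python RoPE written for illustration. `rope` and `dot` are my own helpers, not any library's API, and this only addresses position, not the missing attention between chunks.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair of vec by the angle pos * base**(-i/d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


k = [0.3, -1.2, 0.7, 0.5]  # hypothetical un-rotated key
q = [1.0, 0.2, -0.4, 0.9]  # hypothetical un-rotated query

# A key cached at position 3 can be shifted to position 100 by rotating 97 more.
cached_at_3 = rope(k, 3)
shifted = rope(cached_at_3, 97)
direct = rope(k, 100)
same_key = all(abs(a - b) < 1e-9 for a, b in zip(shifted, direct))

# The attention score depends only on the offset between query and key (6 here).
score_near = dot(rope(q, 10), rope(k, 4))
score_far = dot(rope(q, 110), rope(k, 104))
same_score = abs(score_near - score_far) < 1e-9

print(same_key, same_score)
```

Rotations about the same axis compose by adding angles, so re-positioning a cached key costs one cheap rotation instead of a fresh prefill of the chunk.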

¹ [2405.16444] CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion

Glenn's Digital Garden
