Slide
Quantize What Counts: More for Keys, Less for Values
ACL 2026 presentation on Quantize What Counts: More for Keys, Less for Values, explaining key-value norm disparity, key-prioritized quantization, and practical KV-cache compression guidance.
3 items tagged with "KV Cache"
ACL 2026 presentation on Quantize What Counts: More for Keys, Less for Values, explaining key-value norm disparity, key-prioritized quantization, and practical KV-cache compression guidance.
A geometry-driven mixed-precision KV-cache quantization poster showing that keys carry more information than values, so key-favored bit allocation preserves accuracy while reducing memory.
Key-favored KV-cache quantization for LLMs: theory shows keys have larger norms and should get more bits; empirics show 4b-K/2b-V preserves up to 98.3% accuracy while cutting memory.