Quantization


6 items tagged with "Quantization"

Serving Reasoning LLMs Efficiently and Reliably [No Anime]

July 6, 2026

Serving reasoning LLMs efficiently and reliably: lossless DFloat11 compression, KV-cache quantization, and Bayes@N evaluation and ranking under test-time scaling.

Slide

Serving Reasoning LLMs Efficiently and Reliably

July 6, 2026

Serving reasoning LLMs efficiently and reliably: lossless DFloat11 compression, KV-cache quantization, and Bayes@N evaluation and ranking under test-time scaling.

Slide

Quantize What Counts: More for Keys, Less for Values

June 12, 2026

ACL 2026 presentation of "Quantize What Counts: More for Keys, Less for Values", explaining the key-value norm disparity, key-prioritized quantization, and practical guidance for KV-cache compression.

Poster

Quantize What Counts: More for Keys, Less for Values

Mohsen Hariri, Alan Luo, Weicong Chen, Tianyi Zhang, Qifan Wang, Xiaotian Han, Vipin Chaudhary

June 4, 2026

Poster on geometry-driven mixed-precision KV-cache quantization, showing that keys carry more information than values, so allocating more bits to keys than to values preserves accuracy while reducing memory.

Slide

Test-Time Scaling Under Budget

November 21, 2025

M.Sc. Thesis in Computer Science

Paper

Quantize What Counts: More for Keys, Less for Values ☝️🔑👇🔢

Mohsen Hariri, Alan Luo, Weicong Chen, Shaochen Zhong, Tianyi Zhang, Qifan Wang, Xia Hu, Xiaotian Han, Vipin Chaudhary

October 20, 2025

Key-favored KV-cache quantization for LLMs: the theory shows that keys have larger norms than values and should therefore receive more bits; experiments show that 4-bit keys with 2-bit values (4b-K/2b-V) preserve up to 98.3% accuracy while cutting memory.
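The bit-allocation argument can be illustrated with a toy sketch. It assumes simple per-token min-max (asymmetric uniform) quantization and synthetic Gaussian tensors in which keys are given a 4x larger scale than values to stand in for the norm gap. The helper names are hypothetical, and this is not the paper's quantizer or evaluation. Under the same 6-bit budget per key/value element pair, it compares 4b-K/2b-V against the reverse allocation and computes the payload relative to a 16-bit KV cache, ignoring scale and zero-point metadata.

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Hypothetical helper: per-row (per-token) min-max quantization, returned dequantized."""
    levels = 2 ** bits - 1
    lo = x.min(axis=-1, keepdims=True)
    hi = x.max(axis=-1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / levels           # guard against constant rows
    q = np.clip(np.round((x - lo) / scale), 0, levels)    # integer codes in [0, levels]
    return q * scale + lo

def kv_error(keys, values, key_bits: int, value_bits: int) -> float:
    """Sum of absolute (Frobenius) reconstruction errors of K and V."""
    k_err = np.linalg.norm(keys - quantize_dequantize(keys, key_bits))
    v_err = np.linalg.norm(values - quantize_dequantize(values, value_bits))
    return float(k_err + v_err)

def kv_bytes(num_tokens: int, head_dim: int, key_bits: int, value_bits: int) -> float:
    """Payload bytes for one head's K and V cache (no scale/zero-point overhead)."""
    return num_tokens * head_dim * (key_bits + value_bits) / 8

rng = np.random.default_rng(0)
tokens, head_dim = 128, 64
# Synthetic stand-ins (assumption): keys get a larger scale than values to mimic the norm gap.
keys = rng.normal(scale=4.0, size=(tokens, head_dim))
values = rng.normal(scale=1.0, size=(tokens, head_dim))

key_favored = kv_error(keys, values, key_bits=4, value_bits=2)
value_favored = kv_error(keys, values, key_bits=2, value_bits=4)
ratio = kv_bytes(tokens, head_dim, 4, 2) / kv_bytes(tokens, head_dim, 16, 16)

print(f"4b-K/2b-V error: {key_favored:.1f}   2b-K/4b-V error: {value_favored:.1f}")
print(f"4b-K/2b-V payload vs. 16-bit KV cache: {ratio:.4f}")  # (4 + 2) / (16 + 16) = 0.1875
```

Because min-max quantization error grows with each tensor's range, spending the extra bits on the larger-norm tensor (the keys here) lowers the combined absolute error at the same total bit budget.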