70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
NeurIPS 2025 presentation on Dynamic-Length Float (DFloat11/DF11): a lossless format that Huffman-codes the exponent bits of BFloat16 weights, so each weight takes about 11 bits on average instead of 16. This cuts model size by roughly 30% with bit-for-bit identical outputs, and a GPU kernel makes inference on the compressed weights fast.
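
To make the "about 11 bits" figure concrete, here is a minimal sketch of the underlying idea: build a Huffman code over the 8-bit exponent field and keep the sign and mantissa bits uncompressed. It is not the DFloat11 implementation and does not model the GPU kernel. The synthetic Gaussian weights, the truncating BFloat16 conversion, and the helper names `bf16_fields` and `huffman_code_lengths` are all assumptions made for illustration.

```python
import heapq
import random
import struct
from collections import Counter


def bf16_fields(x):
    """Split a float into BFloat16 (sign, exponent, mantissa) fields.

    Assumption: BFloat16 is taken as the top 16 bits of the float32
    encoding (truncation rather than round-to-nearest).
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bits16 = bits32 >> 16
    sign = bits16 >> 15
    exponent = (bits16 >> 7) & 0xFF
    mantissa = bits16 & 0x7F
    return sign, exponent, mantissa


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries are (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical stand-in for model weights: small zero-mean Gaussian values.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(20000)]

exponents = [bf16_fields(w)[1] for w in weights]
freqs = Counter(exponents)
lengths = huffman_code_lengths(freqs)

# Average Huffman-coded exponent length, weighted by symbol frequency.
avg_exp_bits = sum(freqs[s] * lengths[s] for s in freqs) / len(exponents)
# Sign (1 bit) and mantissa (7 bits) are stored as-is.
avg_total_bits = 1 + 7 + avg_exp_bits

print(f"distinct exponent values: {len(freqs)}")
print(f"average exponent bits:    {avg_exp_bits:.2f} (was 8)")
print(f"average bits per weight:  {avg_total_bits:.2f} (was 16)")
```

The saving comes entirely from the exponent field: weights cluster in a narrow magnitude range, so only a few of the 256 possible exponent values occur often, and those get short codes. Because Huffman coding is invertible, decoding recovers the original 16 bits of every weight exactly.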