Test-Time Scaling Under Budget
M.Sc. Thesis in Computer Science
4 items tagged with "Compression"
During training, the entropy of the bfloat16 exponent bits evolves differently depending on the optimizer: Adam increases it, SGD decreases it, and AdamW consistently converges to the ~2.6 bits observed in trained LLMs.
BFloat16 allocates 8 bits to the exponent, but in trained neural networks those 8 bits carry only about 2.6 bits of actual information, regardless of initialization and training recipe.
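The claim is easy to check on any weight tensor: extract the 8-bit exponent field from each bfloat16 pattern and compute the Shannon entropy of its empirical distribution. A minimal sketch (the `exponent_entropy` helper and the Gaussian test tensor are illustrative, not from the thesis):

```python
import numpy as np

def exponent_entropy(weights: np.ndarray) -> float:
    """Shannon entropy (in bits) of the bfloat16 exponent field of `weights`."""
    # bfloat16 is the top 16 bits of float32: 1 sign, 8 exponent, 7 mantissa bits
    patterns = weights.astype(np.float32).view(np.uint32) >> 16
    exponents = ((patterns >> 7) & 0xFF).astype(np.int64)  # 8-bit exponent field
    counts = np.bincount(exponents, minlength=256)
    p = counts / counts.sum()
    p = p[p > 0]                                           # drop unused exponent values
    return float(-(p * np.log2(p)).sum())

# Even at initialization, Gaussian weights concentrate their exponents into
# a narrow band, so the entropy is far below the nominal 8 bits.
w = np.random.default_rng(0).normal(0.0, 0.02, size=1_000_000)
print(f"{exponent_entropy(w):.2f} bits of 8")
```

The exact value depends on the weight distribution; the point is that weight magnitudes span only a few octaves, so most of the 256 possible exponent codes are never used.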
DFloat11 losslessly compresses LLMs to about 70% of their original size while producing bit-for-bit identical outputs. The framework pairs this compression with efficient GPU inference, enabling Llama 3.1 405B to run on a single node.
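The low exponent entropy is exactly what makes lossless compression pay off. A toy illustration, assuming nothing about DFloat11's internals: split each bfloat16 pattern into its high byte (sign plus top exponent bits, highly redundant) and low byte (mostly random mantissa bits), and entropy-code them separately. Here `zlib` stands in for the Huffman coding a real system would use, and the GPU-side decoding is omitted entirely:

```python
import zlib
import numpy as np

# Build bfloat16 bit patterns from a Gaussian weight tensor (illustrative data).
w = np.random.default_rng(0).normal(0.0, 0.02, size=1_000_000).astype(np.float32)
bf16 = (w.view(np.uint32) >> 16).astype(np.uint16)

hi = (bf16 >> 8).astype(np.uint8)    # sign + top 7 exponent bits: low entropy
lo = (bf16 & 0xFF).astype(np.uint8)  # last exponent bit + 7 mantissa bits: near-random

packed_hi = zlib.compress(hi.tobytes(), 9)
packed_lo = zlib.compress(lo.tobytes(), 9)
ratio = (len(packed_hi) + len(packed_lo)) / bf16.nbytes
print(f"compressed to {ratio:.0%} of original size")

# Decompression is bit-for-bit exact: reassemble the original patterns.
hi2 = np.frombuffer(zlib.decompress(packed_hi), np.uint8)
lo2 = np.frombuffer(zlib.decompress(packed_lo), np.uint8)
restored = (hi2.astype(np.uint16) << 8) | lo2
assert np.array_equal(restored, bf16)
```

The high byte compresses heavily while the mantissa byte barely compresses at all, which is why the overall ratio lands well below 1 without losing a single bit.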