Entropy of bfloat16 During Training: How Optimizers Shape Weight Distributions
During training, the entropy of the bfloat16 exponent bits evolves differently depending on the optimizer. Adam increases it, SGD decreases it, while AdamW consistently converges to the ~2.6 bits observed in trained LLMs.
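The entropy in question can be measured directly from the weight bit patterns. A minimal sketch, using NumPy and the fact that bfloat16 is the top 16 bits of an IEEE-754 float32 (so the 8-bit exponent field sits at bits 23-30 of the float32 pattern); the helper name `exponent_entropy` is illustrative, not from the original:

```python
import numpy as np

def exponent_entropy(weights: np.ndarray) -> float:
    """Shannon entropy (in bits) of the 8-bit bfloat16 exponent field.

    bfloat16 shares float32's exponent layout, so the exponent can be
    read from the float32 bit pattern without a bfloat16 dtype.
    """
    bits = weights.astype(np.float32).view(np.uint32)
    exponents = (bits >> 23) & 0xFF            # extract 8 exponent bits
    counts = np.bincount(exponents, minlength=256)
    p = counts[counts > 0] / counts.sum()      # empirical distribution
    return float(-(p * np.log2(p)).sum())

# Gaussian-initialized weights occupy only a few exponent binades,
# so the entropy is well below the 8-bit maximum.
w = np.random.default_rng(0).normal(0.0, 0.02, size=1_000_000)
print(f"exponent entropy: {exponent_entropy(w):.2f} bits")
```

Tracking this quantity across checkpoints is what reveals the optimizer-dependent trends described above.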