Test-Time Scaling Under Budget
M.Sc. Thesis in Computer Science
8 items tagged with "LLMs"
A principled Bayesian framework for LLM evaluation that replaces Pass@k with posterior estimates, credible intervals, and stable rankings.
A Bayesian framework for evaluating large language models that replaces unstable Pass@k metrics with robust posterior estimates and credible intervals. This method improves sample efficiency, supports graded outcomes, and enables statistically sound model comparisons.
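To make the posterior-estimate idea concrete, here is a minimal Python sketch. It assumes a Dirichlet-multinomial model over discrete grade categories with a uniform prior, which covers binary pass/fail as the two-category case and graded outcomes as more categories. This is one natural construction, not necessarily the exact model used in the work. The function names and the Monte Carlo credible interval are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def posterior_mean_score(counts: Sequence[int], scores: Sequence[float],
                         prior: float = 1.0) -> float:
    """Closed-form posterior mean of a model's expected score.

    counts[c]: number of sampled answers graded into category c.
    scores[c]: score of category c, e.g. [0.0, 1.0] for wrong/right.
    With a symmetric Dirichlet(prior) prior (an assumption), the posterior is
    Dirichlet(counts + prior), whose mean for category c is
    (counts[c] + prior) / (N + prior * C).
    """
    total = sum(counts) + prior * len(counts)
    return sum((c + prior) / total * s for c, s in zip(counts, scores))


def posterior_samples(counts: Sequence[int], scores: Sequence[float],
                      prior: float = 1.0, draws: int = 20000,
                      seed: int = 0) -> List[float]:
    """Monte Carlo draws of the expected score from the Dirichlet posterior."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent Gamma draws, normalized.
        g = [rng.gammavariate(c + prior, 1.0) for c in counts]
        z = sum(g)
        samples.append(sum(gi / z * s for gi, s in zip(g, scores)))
    return samples


def credible_interval(samples: Sequence[float],
                      level: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed credible interval read off the sorted posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    lo = xs[int((1 - level) / 2 * n)]
    hi = xs[min(n - 1, int((1 + level) / 2 * n))]
    return lo, hi


def prob_better(samples_a: Sequence[float], samples_b: Sequence[float]) -> float:
    """Posterior probability that model A's expected score exceeds model B's.

    Assumes the two sample lists were drawn independently (different seeds).
    """
    return sum(a > b for a, b in zip(samples_a, samples_b)) / len(samples_a)


if __name__ == "__main__":
    binary = [0.0, 1.0]                             # wrong / right
    a = posterior_samples([3, 7], binary, seed=1)   # model A: 7 of 10 correct
    b = posterior_samples([6, 4], binary, seed=2)   # model B: 4 of 10 correct
    print(posterior_mean_score([3, 7], binary))     # 8/12, about 0.667
    print(credible_interval(a))
    print(prob_better(a, b))
```

With 7 correct answers out of 10, the posterior mean under this prior is 8/12 rather than the raw 0.7, and the interval width shows how little 10 samples actually pin down. `prob_better` turns a ranking into a probability with explicit uncertainty; each model's samples use a different seed so the two posteriors stay independent.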
Key-favored KV-cache quantization for LLMs: theory shows that keys have larger norms than values and should therefore receive more bits; experiments show that 4-bit keys with 2-bit values (4b-K/2b-V) preserve up to 98.3% accuracy while reducing memory.
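The bit-allocation argument can be illustrated with a toy quantizer. This sketch assumes simple per-group asymmetric min-max quantization (a common baseline, not necessarily the quantizer used in the work) and ignores the storage cost of per-group scales and offsets. Its reconstruction error is bounded by half a quantization step, range / (2 * (2^b - 1)), so a group with a larger range, as larger-norm keys typically have, loses more accuracy for every bit taken away. All function names are hypothetical.

```python
from typing import List, Tuple


def quantize(xs: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Asymmetric min-max uniform quantization of one group to `bits` bits.

    Returns integer codes in [0, 2**bits - 1] plus the offset and step size
    needed to dequantize.
    """
    levels = (1 << bits) - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in xs]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    return [lo + c * scale for c in codes]


def max_error(xs: List[float], bits: int) -> float:
    """Worst-case absolute reconstruction error for one group."""
    codes, lo, scale = quantize(xs, bits)
    return max(abs(x - y) for x, y in zip(xs, dequantize(codes, lo, scale)))


def kv_memory_fraction(key_bits: int, value_bits: int,
                       baseline_bits: int = 16) -> float:
    """KV-cache size relative to a baseline cache, ignoring scale/offset metadata."""
    return (key_bits + value_bits) / (2 * baseline_bits)


if __name__ == "__main__":
    values = [0.1 * i - 0.5 for i in range(11)]   # toy "value" group, range ~1
    keys = [8.0 * v for v in values]              # toy "key" group, 8x larger range
    for name, xs in (("keys", keys), ("values", values)):
        for bits in (2, 4):
            print(name, bits, round(max_error(xs, bits), 3))
    print(kv_memory_fraction(4, 2))               # 0.1875 of a 16-bit cache
```

Under these assumptions, 4b-K/2b-V stores 6 bits per key/value pair instead of 32, i.e. 18.75% of a 16-bit cache before metadata, while the toy run shows the wider key group benefiting most from the extra bits.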
DFloat11 compresses LLMs to 70% of their original size while keeping outputs bit-for-bit identical. It is a lossless compression framework with efficient GPU inference, making it possible to run Llama 3.1 405B on a single node.
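The lossless part can be sketched on the CPU. As we understand DFloat11's core idea, the 8-bit exponent field of BFloat16 weights has low entropy and is entropy-coded, while the sign and 7-bit mantissa are stored unchanged. This sketch assumes plain Huffman coding of exponents, obtains BF16 by truncating float32, uses Gaussian toy weights, and omits the GPU decoding kernels entirely; every name here is hypothetical.

```python
import heapq
import itertools
import random
import struct
from collections import Counter
from typing import Dict, List


def bf16_word(x: float) -> int:
    """BF16 bit pattern of x, by truncating its float32 encoding to the top 16 bits."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def huffman_codes(freqs: Dict[int, int]) -> Dict[int, str]:
    """Build a prefix-free Huffman code (symbol -> bit string) from symbol counts."""
    tie = itertools.count()  # tie-breaker so the heap never compares tree nodes
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    codes: Dict[int, str] = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):  # internal node: (left, right)
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:                        # leaf: an exponent value
            codes[node] = prefix
    return codes


def compress(words: List[int]):
    """Keep sign+mantissa raw (8 bits per weight); Huffman-code the 8-bit exponent."""
    exps = [(w >> 7) & 0xFF for w in words]
    codes = huffman_codes(Counter(exps))
    sign_mant = [((w >> 15) << 7) | (w & 0x7F) for w in words]
    exp_bits = "".join(codes[e] for e in exps)
    return sign_mant, exp_bits, codes


def decompress(sign_mant: List[int], exp_bits: str, codes: Dict[int, str]) -> List[int]:
    decode = {v: k for k, v in codes.items()}
    exps, cur = [], ""
    for bit in exp_bits:
        cur += bit
        if cur in decode:  # prefix-free code: the first match is the symbol
            exps.append(decode[cur])
            cur = ""
    return [((sm >> 7) << 15) | (e << 7) | (sm & 0x7F)
            for sm, e in zip(sign_mant, exps)]


def compressed_fraction(words: List[int]) -> float:
    """Compressed size relative to 16 bits per weight (codebook not counted)."""
    sign_mant, exp_bits, _ = compress(words)
    return (8 * len(sign_mant) + len(exp_bits)) / (16 * len(words))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for a weight tensor: small Gaussian values, cast to BF16.
    words = [bf16_word(rng.gauss(0.0, 0.02)) for _ in range(10000)]
    sign_mant, exp_bits, codes = compress(words)
    assert decompress(sign_mant, exp_bits, codes) == words  # lossless round trip
    print(f"compressed to {compressed_fraction(words):.1%} of BF16 size")
```

The round trip reproduces every BF16 word exactly, which is the property behind bit-for-bit identical outputs. If the exponent codes average around 3 bits, each weight costs about 11 bits, i.e. roughly 11/16 of its original size.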