Slide
Test-Time Scaling Under Budget
M.Sc. Thesis in Computer Science
4 items tagged with "Reasoning"
M.Sc. Thesis in Computer Science
A principled Bayesian framework that replaces Pass@k with posterior estimates, credible intervals, and stable rankings for LLM evaluation
A Bayesian framework for evaluating large language models that replaces unstable Pass@k metrics with robust posterior estimates and credible intervals. This method improves sample efficiency, supports graded outcomes, and enables statistically sound model comparisons.
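As a rough illustration of the idea, the sketch below treats each sampled answer as a pass/fail trial and summarizes a model's success rate with a Beta posterior instead of a single Pass@k number. It is a minimal example under stated assumptions, not the framework's actual implementation: the uniform Beta(1, 1) prior, the Monte Carlo interval, and the function names `beta_posterior_summary` and `prob_a_beats_b` are all hypothetical choices made here. Graded (non-binary) outcomes, which the framework is said to support, are not covered.

```python
import random


def beta_posterior_summary(successes, trials, prior_a=1.0, prior_b=1.0,
                           level=0.95, draws=20000, seed=0):
    """Posterior mean and equal-tailed credible interval for a success rate.

    Assumes independent pass/fail trials and a Beta(prior_a, prior_b) prior
    (uniform by default). Both are illustrative assumptions.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    # Conjugate update: Beta prior + Binomial likelihood -> Beta posterior.
    a = prior_a + successes
    b = prior_b + (trials - successes)
    mean = a / (a + b)
    # Approximate the interval from sorted posterior draws (stdlib only).
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lo = samples[int(tail * draws)]
    hi = samples[int((1.0 - tail) * draws) - 1]
    return mean, lo, hi


def prob_a_beats_b(succ_a, n_a, succ_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B) under uniform priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        pa = rng.betavariate(1 + succ_a, 1 + n_a - succ_a)
        pb = rng.betavariate(1 + succ_b, 1 + n_b - succ_b)
        wins += pa > pb
    return wins / draws


if __name__ == "__main__":
    mean, lo, hi = beta_posterior_summary(successes=7, trials=10)
    print(f"posterior mean {mean:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
    print(f"P(A > B) = {prob_a_beats_b(7, 10, 5, 10):.3f}")
```

Reporting the interval alongside the mean makes it visible when two models cannot yet be separated with the samples collected, which a bare point estimate would hide.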