Donβt Pass@π: A Bayesian Framework for Large Language Model Evaluation
Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary β’

and I love math β€οΈ.
M.Sc. Thesis in Computer Science
A principled Bayesian framework that replaces Pass@k with posterior estimates, credible intervals, and stable rankings for LLM evaluation