Don’t Pass@k: A Bayesian Framework for Large Language Model Evaluation
Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary
Don’t Pass@k introduces a Bayesian approach to language model evaluation, estimating Bayes@k with posterior uncertainty, credible intervals, and rubric-aware scoring.
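As a rough illustration of what a posterior estimate with a credible interval and rubric-aware scoring can look like, the sketch below places a symmetric Dirichlet prior over rubric categories for a single question and reports the posterior mean of the weighted score together with a normal-approximation interval. The function name `dirichlet_score_posterior`, the uniform prior, the normal approximation, and the example counts are all assumptions made for illustration; this is not necessarily the estimator the authors define as Bayes@k.

```python
import math


def dirichlet_score_posterior(counts, weights, prior=1.0, z=1.96):
    """Posterior mean and approximate interval for a rubric-weighted score.

    counts  : number of sampled answers that fell into each rubric category
    weights : score assigned to each category (e.g. 0.0 = fail, 1.0 = pass)
    prior   : symmetric Dirichlet pseudo-count per category (assumed, not from the paper)
    z       : normal quantile; 1.96 gives roughly a 95% interval
    """
    if len(counts) != len(weights):
        raise ValueError("counts and weights must have the same length")

    # Conjugate update: Dirichlet prior + multinomial counts -> Dirichlet posterior.
    alpha = [c + prior for c in counts]
    total = sum(alpha)
    probs = [a / total for a in alpha]

    # Posterior mean of the weighted score sum_c w_c * theta_c.
    mean = sum(w * p for w, p in zip(weights, probs))

    # Exact posterior variance of a linear function of a Dirichlet vector.
    second_moment = sum(w * w * p for w, p in zip(weights, probs))
    variance = (second_moment - mean * mean) / (total + 1.0)
    sd = math.sqrt(max(variance, 0.0))

    # Normal approximation to the interval, clipped to the attainable score range.
    lower = max(min(weights), mean - z * sd)
    upper = min(max(weights), mean + z * sd)
    return mean, (lower, upper)


if __name__ == "__main__":
    # Hypothetical data: 10 sampled answers, 2 failures and 8 passes.
    mean, (lower, upper) = dirichlet_score_posterior([2, 8], [0.0, 1.0])
    print(f"posterior mean = {mean:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

With binary weights this reduces to a Beta posterior over the pass rate; adding more categories with intermediate weights gives partial credit without changing the code.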