🪧 Poster
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary
ICLR 2026
Proposed a Bayesian framework that estimates models' success probabilities with quantified uncertainty, yielding more reliable rankings and enabling categorical evaluation of LLMs.
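To illustrate what "success probability with quantified uncertainty" can mean, here is a minimal sketch. It is not the authors' implementation. It assumes binary pass/fail outcomes and a uniform Beta(1, 1) prior, and the function name `beta_posterior` is hypothetical. It applies the standard Beta-Binomial update to get a posterior mean and standard deviation from a count of successes.

```python
from math import sqrt


def beta_posterior(successes: int, trials: int, a: float = 1.0, b: float = 1.0):
    """Posterior mean and std of a success probability.

    Assumption (not from the source): binary outcomes with a Beta(a, b)
    prior, uniform by default. This is an illustrative sketch only.
    """
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    alpha = a + successes            # posterior pseudo-count of passes
    beta = b + trials - successes    # posterior pseudo-count of failures
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, sqrt(var)


# Hypothetical example: 3 passes out of 8 samples.
mean, std = beta_posterior(3, 8)
print(f"posterior mean = {mean:.3f}, std = {std:.3f}")
```

Under these assumptions, two models can be compared by their posterior means, with the standard deviations indicating whether the observed gap is large relative to the remaining uncertainty.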