Scorio

Scorio is a Bayesian framework for evaluating and ranking Large Language Models. It implements test-time scaling metrics, including Bayes@N, with uncertainty quantification.
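
To give a feel for what a Bayesian score with uncertainty looks like, here is a minimal, generic Beta-Binomial sketch. It is not Scorio's API and may not match Scorio's exact definition of Bayes@N. The function name `beta_posterior_summary`, the uniform prior, and the sample outcomes are all assumptions made for illustration.

```python
import math


def beta_posterior_summary(outcomes, prior_a=1.0, prior_b=1.0):
    """Return the posterior mean and standard deviation of a success rate.

    Hypothetical helper, not part of Scorio.
    outcomes: iterable of 0/1 results from N sampled attempts.
    prior_a, prior_b: Beta prior pseudo-counts (1, 1 is a uniform prior).
    """
    outcomes = list(outcomes)
    successes = sum(outcomes)
    failures = len(outcomes) - successes

    # Conjugate update: Beta(a, b) prior + Binomial data -> Beta posterior.
    a = prior_a + successes
    b = prior_b + failures

    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# Made-up example data: 8 attempts on one task, 6 of them correct.
mean, std = beta_posterior_summary([1, 0, 1, 1, 0, 1, 1, 1])
print(f"posterior mean = {mean:.3f}, posterior std = {std:.3f}")
```

With more samples the posterior standard deviation shrinks, which is what makes an estimate of this kind useful for deciding whether a gap between two models is meaningful.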

Papers