Ranking Reasoning LLMs under Test-Time Scaling
ACL 2026 presentation on ranking reasoning LLMs under test-time scaling: dense repeated-trial evaluation, Bayes@N as a practical default, low-budget priors, categorical ranking, and the Scorio toolkit.