Ranking Reasoning LLMs under Test-Time Scaling
ACL 2026 Main • Mohsen Hariri, Michael Hinczewski, Jing Ma, Vipin Chaudhary
Studies how to rank reasoning LLMs under repeated sampling, comparing 72 ranking methods across four Olympiad-style math benchmarks and packaging the methods in Scorio.

