🎉 Ranking Reasoning LLMs under Test-Time Scaling Accepted to ACL 2026 Main

Hi, I'm Mohsen, and I love math ❤️. I work on these things.
I'm always excited to start new collaborations, especially when it's something new to me. Feel free to reach out by email; you'll find it in the footer.
News
🎉 Quantize What Counts: More for Keys, Less for Values Accepted to ACL 2026 Findings
🎲 Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation Accepted to ICLR 2026
📦 Julia & Python pkgs for the Bayesian framework are out!
📦 vLLM × DFloat11: run your model with 30% less memory!
✨ 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Accepted to NeurIPS 2025
Recent Papers
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary
Quantize What Counts: More for Keys, Less for Values
Mohsen Hariri, Alan Luo, Weicong Chen, Shaochen Zhong, Tianyi Zhang, Qifan Wang, Xia Hu, Xiaotian Han, Vipin Chaudhary
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Tianyi Zhang, Mohsen Hariri, Shaochen Zhong, Vipin Chaudhary, Yang Sui, Xia Hu, Anshumali Shrivastava
Recent Posters
Ranking Reasoning LLMs under Test-Time Scaling
ACL 2026 Main • Mohsen Hariri, Michael Hinczewski, Jing Ma, Vipin Chaudhary
Ranking reasoning LLMs under repeated sampling, comparing 72 ranking methods across four Olympiad-style math benchmarks and packaging them in Scorio.
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
ICLR 2026 • Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary
Proposed a Bayesian framework that estimates models' success probabilities with quantified uncertainty, yielding more reliable rankings and enabling categorical evaluation of LLMs.
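The flavor of "success probabilities with quantified uncertainty" can be sketched with a simple Beta-Binomial model: treat each model's pass rate as a latent probability with a Beta prior, update on observed attempts, and report a posterior mean with a spread instead of a bare Pass@k score. This is a minimal illustration under an assumed uniform prior, not the paper's actual implementation; all names below are hypothetical.

```python
from math import sqrt

def beta_posterior(successes, trials, a=1.0, b=1.0):
    """Posterior Beta(a + s, b + f) for a model's success probability,
    starting from a Beta(a, b) prior (a = b = 1 is the uniform prior)."""
    return a + successes, b + trials - successes

def posterior_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

# Example: a model answers 37 of 50 sampled attempts correctly.
a, b = beta_posterior(37, 50)
mean, std = posterior_mean_std(a, b)
# The estimate is mean ± std rather than a point score, so two models
# can be compared (or declared indistinguishable) with uncertainty attached.
```

The payoff of this style of estimate is that rankings come with error bars: with few samples the posterior is wide and overlapping models are not forced into a strict order.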
Recent Posts
Entropy of bfloat16 During Training: How Optimizers Shape Weight Distributions
β’ Training, Information Theory, Optimizers
Entropy of bfloat16: 8 Bits Are Doing 2.6 Bits of Work
β’ LLMs, Information Theory, Efficiency
Simulating LLM Evaluation Datasets Using Psychometric Models
β’ Simulation, LLMs, Reasoning