Hi, I'm Mohsen!

and I love math ❤️. I work on these things.
I'm always excited to start new collaborations, especially when it's something new to me. Feel free to reach out by email; you'll find it in the footer.

Google Scholar • GitHub • LinkedIn • CV/Résumé

Papers • Posts • Posters • Slides • Research Interests

News

Apr 6, 2026

🎉 Ranking Reasoning LLMs under Test-Time Scaling Accepted to ACL 2026 Main

Apr 6, 2026

🎉 Quantize What Counts: More for Keys, Less for Values Accepted to ACL 2026 Findings

Jan 25, 2026

🎲 Don’t Pass@𝑘: A Bayesian Framework for Large Language Model Evaluation Accepted to ICLR 2026

Oct 18, 2025

📦 Julia & Python pkgs for the Bayesian framework are out!

Oct 15, 2025

📦 vLLM × DFloat11: run your model with 30% less memory!

Sep 17, 2025

✨ 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Accepted to NeurIPS 2025

Recent Papers

Don’t Pass@𝑘: A Bayesian Framework for Large Language Model Evaluation

Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary • Oct 21, 2025

Quantize What Counts: More for Keys, Less for Values ☝️🔑👇🔢

Mohsen Hariri, Alan Luo, Weicong Chen, Shaochen Zhong, Tianyi Zhang, Qifan Wang, Xia Hu, Xiaotian Han, Vipin Chaudhary • Oct 20, 2025

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Tianyi Zhang, Mohsen Hariri, Shaochen Zhong, Vipin Chaudhary, Yang Sui, Xia Hu, Anshumali Shrivastava • Oct 19, 2025

Recent Posters

Ranking Reasoning LLMs under Test-Time Scaling

ACL 2026 Main • Apr 6, 2026 • Mohsen Hariri, Michael Hinczewski, Jing Ma, Vipin Chaudhary

Ranking reasoning LLMs under repeated sampling, comparing 72 ranking methods across four Olympiad-style math benchmarks and packaging them in Scorio.

Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation

ICLR 2026 • Jan 25, 2026 • Mohsen Hariri, Amirhossein Samandar, Michael Hinczewski, Vipin Chaudhary

Proposed a Bayesian framework that estimates models' success probabilities with quantified uncertainty, yielding more reliable rankings and enabling categorical evaluation of LLMs.

Recent Posts

Entropy of bfloat16 During Training: How Optimizers Shape Weight Distributions

Nov 17, 2025 • Training, Information Theory, Optimizers

Entropy of bfloat16: 8 Bits Are Doing 2.6 Bits of Work

Oct 28, 2025 • LLMs, Information Theory, Efficiency

Simulating LLM Evaluation Datasets Using Psychometric Models

Oct 23, 2025 • Simulation, LLMs, Reasoning

Recent Slides

Python Environments

May 21, 2026 • Tools, Tutorial

Python environments, how to create and reproduce them, and when to use pip, conda, micromamba, uv, pipx, lockfiles, and containers.

Virtual Agentic Lab!

Jan 18, 2026 • Agents, LLMs

SCIPE Workshop on Large Language Models • Final Presentation

LLM Research Directions

Jan 18, 2026 • LLMs, Reasoning Models, Test-Time Scaling

SCIPE Workshop on LLMs • Day 3