
Hi, I'm Mohsen!
and I love math ❤️. I work on these things.
I'm always excited to start new collaborations, especially when it's something new to me. Feel free to reach out by email; you'll find it in the footer.
News
Recent Papers
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
Quantize What Counts: More For Keys, Less For Values
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Recent Posters
Ranking Reasoning LLMs under Test-Time Scaling
Ranking reasoning LLMs under repeated sampling, comparing 72 ranking methods across four Olympiad-style math benchmarks and packaging them in Scorio.
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
Proposed a Bayesian framework that estimates models' success probabilities with quantified uncertainty, yielding more reliable rankings and enabling categorical evaluation of LLMs.
Recent Posts
Entropy of bfloat16 During Training: How Optimizers Shape Weight Distributions
Entropy of bfloat16: 8 Bits Are Doing 2.6 Bits of Work
Simulating LLM Evaluation Datasets Using Psychometric Models
Recent Slides
Python Environments
Python environments, how to create and reproduce them, and when to use pip, conda, micromamba, uv, pipx, lockfiles, and containers.
Virtual Agentic Lab!
SCIPE Workshop on Large Language Models • Final Presentation
LLM Research Directions
SCIPE Workshop on LLMs - Day 3