
Hi, I'm Mohsen!
and I love math ❤️. I work on these things.
I'm always excited to start new collaborations, especially when it's something new to me. Feel free to reach out by email; you'll find it in the footer.
News
Recent Research
Quantize What Counts: More for Keys, Less for Values
A geometry-driven mixed-precision KV-cache quantization poster showing that keys carry more information than values, so key-favored bit allocation preserves accuracy while reducing memory.
Ranking Reasoning LLMs under Test-Time Scaling
Ranking reasoning LLMs under repeated sampling, comparing 72 ranking methods across four Olympiad-style math benchmarks and packaging them in Scorio.
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
A Bayesian framework that estimates models' success probabilities with quantified uncertainty, yielding more reliable rankings and enabling categorical evaluation of LLMs.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
DFloat11 compresses LLMs to 70% of their original size while maintaining bit-for-bit identical outputs. A lossless compression framework with efficient GPU inference that enables running Llama 3.1 405B on a single node.
Recent Posts
Entropy of bfloat16 During Training: How Optimizers Shape Weight Distributions
Entropy of bfloat16: 8 Bits Are Doing 2.6 Bits of Work
Simulating LLM Evaluation Datasets Using Psychometric Models
Recent Slides
Serving Thinking LLMs Efficiently and Reliably
Serving reasoning LLMs efficiently and reliably: lossless DFloat11 compression, KV-cache quantization, and Bayes@N evaluation and ranking under test-time scaling.
Python Environments
Python environments, how to create and reproduce them, and when to use pip, conda, micromamba, uv, pipx, lockfiles, and containers.
Virtual Agentic Lab!
SCIPE Workshop on Large Language Models • Final Presentation