Mohsen Hariri

Highlights


Research interests

Machine Learning, Large Language Models, Efficient ML, Quantum Computing

Education

Ph.D. in Computer and Data Sciences • Case Western Reserve University

Under the supervision of Prof. Vipin Chaudhary

M.S. in Computer and Data Sciences (GPA: 4.0/4.0) • Case Western Reserve University

Thesis: Test-Time Scaling Under Budget: Reasoning Evaluation and Memory-Efficient LLM Deployment, Supervisor: Prof. Vipin Chaudhary

B.S. in Electrical Engineering • University of Tehran

Thesis: Evaluating and Analysis on Therapeutic Environment's Network Using Simple Network Management Protocol for Fault Detection and Management, Routing and Auto-discovery, Supervisor: Prof. Reza Aghaizadeh Zoroofi

Publications

M Hariri, M Hinczewski, J Ma, V Chaudhary

Ranking Reasoning LLMs under Test-Time Scaling • ACL 2026 Main

Studies how to rank reasoning LLMs when each question is sampled multiple times (test-time scaling). Formalizes the repeated-trial setting, compares ranking families (metrics, Bayesian, IRT, voting, spectral), and introduces Scorio, an open-source toolkit for stable LLM ranking.

M Hariri, A Samandar, M Hinczewski, V Chaudhary

Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation • ICLR 2026

Proposes a Bayesian framework that estimates models’ success probabilities with quantified uncertainty, yielding more reliable rankings and enabling categorical evaluation of LLMs.

M Hariri, A Luo, W Chen, T Zhang, Q Wang, X Han, V Chaudhary

Quantize What Counts: More for Keys, Less for Values • ACL 2026 Findings

Shows that keys carry more information than values; consequently, key tensors need a larger quantization bit-width, smaller group sizes, and outlier mitigation (e.g., a Hadamard transformation), while values can be quantized more aggressively.

M Hariri, V Chaudhary

Geom@k: Fast to Converge, Slow to Drift • COLM 2026

Shows that test-time scaling evaluation should separate discovery from repeated correctness, derives Geom@k and the broader GeoSpectrum@K family from a shared hypergeometric view of fixed-budget metrics, and finds that Geom@2 gives the best convergence and rank-correlation summaries across all six aggregate settings.

M Hariri, M Hinczewski, V Chaudhary

Scorio.jl: A Julia package for ranking stochastic responses • JuliaCon 2026

A Julia package for evaluating and ranking stochastic systems from repeated responses using a unified tensor-based framework.

A Yu, M Hariri, K Nakamura, M Yang, X Li, V Chaudhary (equal contribution)

Medical Image Spatial Grounding with Semantic Sampling • MICCAI 2026

Evaluates VLM spatial grounding in 3D medical images across modalities and coordinate systems. Introduces MIS-Ground for anatomy-specific failure analysis and MIS-SemSam for improved inference-time grounding without retraining.

B Flannery, T DeSilvio, M Hariri, A Sadri, N Heller, C Weight, S Viswanath

Empirical evaluation of variability and multi-institutional generalizability of deep learning survival models: Application to renal cancer CT scans • Computers in Biology and Medicine, 2026

Systematically evaluated how data partitioning, initialization, and augmentation choices affect the robustness and cross-institution generalization of CT-based deep learning survival models.

W Chen, V Singh, Z Rahmani, D Ganguly, M Hariri, S Maxwell, S Gajurel, E Dragowsky, H Djohari, V Chaudhary

K^4-Serve: Robust Streaming Log Anomaly Detection for HPC & AI Infrastructure • ACM PEARC

Operationalizes K^4 for production log anomaly detection with Kafka-based streaming ingestion, versioned normalization, sliding-window scoring, retraining, and observability, achieving stable deployment on real HPC logs and near-perfect event-level detection with only one false alert.

J Sleiman, G Pillai, P Chirra, N Gandhi, I O Gordon, M Hariri, M E Baker, J Ream, C G Fulmer, S El Ouali, D H Bruining, J A Kurowski, S E Viswanath, F Rieder

Development of a Computed Tomography Enterography Based Radiomics Model to Characterize Inflammation, Fibrosis and Smooth Muscle Thickening in Stricturing Crohn’s Disease • The American Journal of Gastroenterology, 2026

Develops and internally validates a CTE radiomics machine-learning model for Crohn's disease strictures, linking three-dimensional texture features to histopathologic inflammation, fibrosis, and smooth muscle hyperplasia/hypertrophy; outperforms radiologist visual scoring and improves characterization when combined with it.

M Hariri, H S Hillsdownley, W Chen, V Chaudhary

Runtime Signals as Correctness Proxies: A Categorical Evaluation Framework for Test-Time Scaling • ACL 2026 (under review)

Argues that binary pass-based metrics are too coarse for test-time scaling and introduces a categorical Bayesian evaluation framework that scores rubric-defined outcomes with uncertainty; shows that a lightweight model using 19 runtime signals reaches 0.917 AUROC without a judge model and that schema choice can materially change model rankings.

D Ganguly, W Chen, M Hariri, V Singh, S Sankar, S Wang, Y Yang, B Zhang, C Song, A Nemecek, K Ye, O Zafar, E Ayday, J Stubbs, S Iyengar, V Chaudhary

Sequent-Prover: Training Agents for Formal, Checkable SMT-based Reasoning • ACL 2026 (under review)

Introduces Sequent-Prover, a training recipe for agents that write SMT-LIB programs, query Z3, repair formalizations, and answer with executable evidence. Distills successful solver-guided traces and uses preference optimization to improve reliability, grounding, proof quality, and efficiency across formal and commonsense reasoning benchmarks.

K Ye, Y Zhang, X Li, C Song, M Hariri, V Singh, D Ganguly, V Chaudhary

Condition-Aware Conformal Prediction for LLM-as-Judge Reliability • ACL 2026 (under review)

Introduces CAJ, a condition-aware conformal calibration method that turns each LLM-judge score into a calibrated prediction set and referral signal. Uses judge, criterion, cross-judge variance, and transitivity features to repair hard-cell coverage gaps on SUMMEVAL and improve selective trust on ROSEACU.

M Hariri, V Chaudhary

Success Has a Shape: TailPass@k for Repeated-Sampling Agent Evaluation • NeurIPS 2026 (under review)

Introduces TailPass@k, a Bayesian discovery-stability profile for repeated-sampling agent evaluation that reports thresholds from at-least-one success to all-k success, recovers accuracy, Pass@k, and stability as special cases, and adds posterior uncertainty for ranking and operating-point comparisons.

C Guo, M Hariri, V Chaudhary

ModalExit: Modality-Aware Expert Exit for Efficient Unified Vision-Language MoE Model Inference • NeurIPS 2026 (under review)

Introduces ModalExit, a training-free inference framework for unified vision-language MoE models that profiles visual-dominant expert slots, detects visual-token saturation, and uses an entropy gate to skip converged image-token expert computation while keeping MMMU accuracy within the full-model range.

V Singh, W Chen, D Ganguly, Y Zhang, N Wang, S Sankar, M Hariri, A Nemecek, C Song, S Wang, B Zhang, V Yang, E Ayday, J Ma, V Chaudhary

CausalGuard: Conformal Inference under Graph Uncertainty • NeurIPS 2026 (under review)

Introduces CausalGuard, a structure-weighted conformal framework for treatment-effect intervals under unknown causal graphs. It combines LLM-derived edge priors, conditional-independence pruning, BIC graph weighting, doubly robust pseudo-outcomes, and aggregate-before-calibrate conformal scoring to maintain finite-sample coverage while reducing overly padded graph-agnostic intervals.

V Singh, D Ganguly, W Chen, S Sahoo, S Sankar, B Zhang, M Hariri, S Wang, O Zafar, C Gagne, V Chaudhary

Reliability-Gated Source Anchoring for Continual Test-Time Adaptation • NeurIPS 2026 (under review)

Introduces RMemSafe, a reliability-gated extension of ROID for continual test-time adaptation that uses frozen-source entropy to attenuate source anchoring and agreement terms when the source collapses, yielding a source-agnostic fallback and lower error across matched continual-corruption benchmarks.

S Zhong, J Zhang, H A D Le, W Xie, Y Lu, X Sun, M Hariri, H Liu, G Wang, Z Xu, Z Liu, S Xu, N Xie, L Li, R Chen, R Tang, X Hu, V Chaudhary

Sweeping Promptable Spoofs under the DirtyRAG: A Practical, Query-Blind RAG Attack Done Right • NeurIPS 2026 (under review)

Introduces DirtyRAG, a query-blind, benign passage-based RAG attack that is robust to defenses and steerable by prompt. Demonstrates practical exploitation and introduces RAG-ATag, a benchmark for evaluating RAG security.

T DeSilvio, L Bao, B N Parker, B Flannery, M Hariri, P Chirra, M Labbad, S Tang, G M O'Connor, R MacBeth, E Steinhagen, J Willis, E L Marderstein, P Fu, A Carroll, M Crittenden, M J Gough, K H Young, S Krishnamurthi, D Liska, A Gupta, A Purysko, S E Viswanath

Integrated Multi-plane, Multi-region Radiomic Features on Baseline T2-Weighted MRI are Associated with Complete Response to Neoadjuvant Therapies in Rectal Cancers • European Radiology (under review)

Integrates tumor and proximal-fat radiomic features from axial and coronal baseline T2-weighted MRI to predict complete response to neoadjuvant therapies in rectal cancer; validates the multi-plane, multi-region model across multi-institutional holdout cohorts and shows added value with CEA and clinical T-stage.

L Calandruccio, B Grijalva-Arvizu, P Rodriguez Rivera, M Hariri, E Buss, V Chaudhary

Using automatic speech recognition for scoring speech-in-noise testing • Work in progress

T Zhang, M Hariri, S Zhong, Y Sui, V Chaudhary, X Hu, A Shrivastava

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11) • NeurIPS 2025

DFloat11 losslessly compresses LLM and diffusion-model weights using dynamic-length floating-point encoding with Huffman coding, shrinking memory by about 30% with no accuracy loss.

W Chen, V Singh, Z Rahmani, D Ganguly, M Hariri, V Chaudhary

K^4: Online Log Anomaly Detection Via Unsupervised Typicality Learning • IEEE HiPC 2025

K^4 reframes log anomaly detection as unsupervised typicality learning. It maps log embeddings to four compact PRDC descriptors (Precision, Recall, Density, Coverage) using k-NN statistics, enabling parser-independent online detection with lightweight scoring and microsecond-level latency.

H Liu, S Zhong, X Sun, M Tian, M Hariri, Z Liu, R Tang, Z Jiang, J Yuan, Y Chuang, L Li, S Choi, R Chen, V Chaudhary, X Hu

LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem • EMNLP 2025

A backdoor LoRA can be trained once and then merged with multiple task LoRAs while retaining both capabilities, making it a low-tech attack that is particularly dangerous and infectious.

L Calandruccio, M Hariri, E Buss, V Chaudhary

Masked-speech recognition using human and synthetic cloned speech • Trends in Hearing, 2025

Compares voice-cloned and human speech in masked-sentence recognition, measuring intelligibility, perceived human-likeness, and voice similarity with listener judgments and ASR.

A Sridharan, T DeSilvio, B Flannery, M Hariri, R Macbeth, B Parker, A Elumalai, J Devi, A Lovato, C Maneiro, A George, A Ganapath, P Deepak, D H Ballard, S E Viswanath

Integrating self-configuring and foundational deep learning segmentation models for identifying the anal sphincter complex and perianal fistulae on pelvic MRI • SPIE Medical Imaging 2025

Introduces an automated pelvic MRI pipeline that combines nnU-Net and MedSAM to segment perianal fistulas and anal sphincter muscles in Crohn’s disease, using annotated patient scans, to support interventional guidance and surgical planning.

AS Yu, M Hariri, X Zhang, M Yang, V Chaudhary, X Li

Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2 • SPIE Medical Imaging 2025

A zero-shot, single-prompt method for 3D knee MRI segmentation was developed using the Segment Anything Model 2 (SAM2). By adapting SAM2 to treat MRI slices as video frames, accurate segmentation was achieved without additional training.

B Flannery, T DeSilvio, A Sadri, M Hariri, E Remer, J Nguyen, S Viswanath

Spatial attention wavelon network (SpAWN) for survival-based risk stratification in kidney cancers via CT • SPIE Medical Imaging 2025

The Spatial Attention Wavelon Network (SpAWN) is introduced for risk stratification of kidney cancers using CT scans. SpAWN uses pre-training spatial attention and wavelon activation functions to improve model interpretability and generalizability.

P Chirra, J Sleiman, N Gandhi, I Gordon, M Hariri, M Baker, R Ottichilo, D Bruining, J Kurowski, S Viswanath, F Rieder

Radiomics to Detect Inflammation and Fibrosis on Magnetic Resonance Enterography in Stricturing Crohn’s Disease • Journal of Crohn’s and Colitis, 2024

Developed a radiomics-based machine-learning model to characterize inflammation and fibrosis in Crohn’s disease strictures using MRE. The model improved diagnostic accuracy compared to radiologist visual scoring, with combined use enhancing performance.

M Hariri, P Chirra, M Patel, T T Einat, I Dayan, A Tonetti, Y Baror, T Barrett, N Sushentsev, J D Kaggie, S Yuan, D Wu, B Yu, Z Lyu, C Hsu, W Wang, S Krishnamurthi, S E Viswanath

Federated Image Quality Assessment of Prostate MRI Scans in a Multi-institutional Setting • AACR 2024

Addresses how image artifacts undermine the reliability of machine learning models in medical imaging, a problem exacerbated in multi-institutional settings.

B Flannery, M Hariri, T DeSilvio, A Sadri, J Nguyen, E M Remer, S Krishnamurthi, S E Viswanath

Deep Learning Based Risk Stratification of Pre-operative CT Scans is Prognostic of Overall Survival in Kidney Cancers • AACR 2024

A deep learning model was developed to enhance preoperative risk assessment and predict survival in kidney cancer patients through CT scans, aiming to improve treatment decisions and overcome limitations of traditional clinical methods.

L Bao, T DeSilvio, B N Parker, M Hariri, P Chirra, M Labbad, S Tang, G M O'Connor, E Steinhagen, J L Miller-Ocuin, A Gupta, E L Marderstein, A Carroll, M Crittenden, M J Gough, S Krishnamurthi, K H Young, S E Viswanath

Intra- and Peri-tumoral Radiomic Features are Predictive of Pathologic Response to Multiple Neoadjuvant Therapy Regimen in Rectal Cancers via Pre-treatment MRI • AACR 2024

Radiomics from pretreatment MRI were analyzed to predict which rectal cancer patients would respond to neoadjuvant treatments, addressing limitations of traditional staging and biomarker approaches.

M Rezai, L Namdari, D Farsi, N Ashayeri, M Naghshbandi, M Hariri, R Ghafoury

Virtual Reality as an Acute Pain Reliever During Laceration Repair in Emergency Departments: A Randomized Controlled Trial • Saudi Journal of Emergency Medicine

Investigated the effect of virtual reality on reducing pain in adult patients during laceration repair in emergency departments.

Professional experience

Academic

AI Scientist • Department of Computer and Data Sciences, CWRU

• Supports AI and research computing workflows in the ACCESS ecosystem; develops, deploys, and maintains tools, models, and user-facing research infrastructure for scientific and academic users.

• Member of Ohio-SCIPE (Strengthening the Cyberinfrastructure Professionals Ecosystem).

• Mentor for the annual summer AI Research Experience (AIRE '24, AIRE '25).

• Research Associate, Speech and Auditory Research Lab (SpARLab), CWRU.

• Judge for the CWRU Intersections Poster Symposium.

• Reviewer for ICLR, NeurIPS, ICML, ACL, and COLM.

• Best Pitch Award, 2024 CCIR Symposium, Center for Imaging Research.

Co-instructor and co-organizer for SCIPE Workshop on Large Language Models • CWRU

Developed and taught a workshop on reasoning LLMs, retrieval-augmented generation (RAG), and agentic AI; co-managed curriculum design, instructional materials, and workshop delivery.

Co-instructor and co-organizer for CWRU Workshop on Large Language Models • CWRU

Developed and taught a workshop on foundation models, reasoning in LLMs, and language model evaluation; co-managed curriculum design, instructional materials, and workshop delivery.

Machine Learning Researcher • Department of Biomedical Engineering, CWRU

Co-instructor for Introduction to Database Systems (CSDS 341) • CWRU

Designed the final project, including a project template and a simple build system to introduce Java package management.

Designer and instructor for Big Data and Cloud Computing workshop • Weatherhead School of Management, CWRU

Designed and developed the workshop, wrote instructional code and challenges, and co-taught it with Prof. Chaudhary.

Chief Editor of Biotech Magazine • University of Tehran

Official magazine of the Iranian Society of Biomedical Engineering, student branch, University of Tehran.

Head of Student Branch of Biomedical Engineering • University of Tehran

(UT-BME-SB)

Head of Information Committee of Biomedical Engineering • University of Tehran

(UT-BME-SB)

Industry

Software Developer and Game Designer • OBEID EMPIRE, Gamification Company

Selected Open Source Projects

Scorio: Bayesian evaluation and ranking toolkit • GitHub, PyPI

Python toolkit for Bayesian evaluation and ranking of stochastic responses

vllm-df11

DFloat11 plugin for vLLM

kvq

Norm-Aware KVQuant

MedViz

Medical visualization tools, INVent Lab, CWRU

Thumbnail-Preserving Encryption

Department of Electrical Engineering and Computer Science, Oregon State University, Prof. Rakesh Bobba

Fesenjoon: Google Drive API management client • GitHub, PyPI

Computer skills

Languages

References