Highlights
- Transitioned to academia in 2023 to pursue machine learning research in the Biomedical Engineering Department at Case Western Reserve University, collaborating with Cleveland Clinic and University Hospitals. In 2024, started a new position as an AI Scientist in the Computer and Data Sciences Department at CWRU, where I currently work.
- Completed an M.S. in Computer and Data Sciences at CWRU in 2025, focusing on test-time scaling and efficiency in large language models under the supervision of Prof. Vipin Chaudhary.
- Started a Ph.D. in Computer and Data Sciences at CWRU in 2026 under the supervision of Prof. Vipin Chaudhary.
Research interests
Machine Learning, Large Language Models, Efficient ML, Quantum Computing
Education
Ph.D. in Computer and Data Sciences • Case Western Reserve University
Under the supervision of Prof. Vipin Chaudhary
M.S. in Computer and Data Sciences (GPA: 4.0/4.0) • Case Western Reserve University
Thesis: Test-Time Scaling Under Budget: Reasoning Evaluation and Memory-Efficient LLM Deployment, Supervisor: Prof. Vipin Chaudhary
B.S. in Electrical Engineering • University of Tehran
Thesis: Evaluating and Analysis on Therapeutic Environment's Network Using Simple Network Management Protocol for Fault Detection and Management, Routing and Auto-discovery, Supervisor: Prof. Reza Aghaizadeh Zoroofi
Publications
M Hariri, M Hinczewski, J Ma, V Chaudhary
Ranking Reasoning LLMs under Test-Time Scaling
Studies how to rank reasoning LLMs when each question is sampled multiple times (test-time scaling). Formalizes the repeated-trial setting, compares ranking families (metrics, Bayesian, IRT, voting, spectral), and introduces Scorio, an open-source toolkit for stable LLM ranking.
M Hariri, A Samandar, M Hinczewski, V Chaudhary
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
Proposed a Bayesian framework that estimates models’ success probabilities with quantified uncertainty, yielding more reliable rankings and enabling categorical evaluation of LLMs.
M Hariri, A Luo, W Chen, T Zhang, Q Wang, X Han, V Chaudhary
Quantize What Counts: More for Keys, Less for Values
Keys carry more information than values; consequently, key tensors require a larger quantization bit-width, smaller group sizes, and outlier mitigation (e.g., Hadamard transformation).
M Hariri, V Chaudhary
Geom@k: Fast to Converge, Slow to Drift
Shows that test-time scaling evaluation should separate discovery from repeated correctness, derives Geom@k and the broader GeoSpectrum@K family from a shared hypergeometric view of fixed-budget metrics, and finds that Geom@2 gives the best convergence and rank-correlation summaries across all six aggregate settings.
M Hariri, M Hinczewski, V Chaudhary
Scorio.jl: A Julia package for ranking stochastic responses
A Julia package for evaluating and ranking stochastic systems from repeated responses using a unified tensor-based framework.
A Yu†, M Hariri†, K Nakamura, M Yang, X Li, V Chaudhary (†equal contribution)
Medical Image Spatial Grounding with Semantic Sampling
Evaluates VLM spatial grounding in 3D medical images across modalities and coordinate systems. Introduces MIS-Ground for anatomy-specific failure analysis and MIS-SemSam for improved inference-time grounding without retraining.
B Flannery, T DeSilvio, M Hariri, A Sadri, N Heller, C Weight, S Viswanath
Systematically evaluated how data partitioning, initialization, and augmentation choices affect the robustness and cross-institution generalization of CT-based deep learning survival models.
W Chen, V Singh, Z Rahmani, D Ganguly, M Hariri, S Maxwell, S Gajurel, E Dragowsky, H Djohari, V Chaudhary
K^4-Serve: Robust Streaming Log Anomaly Detection for HPC & AI Infrastructure
Operationalizes K^4 for production log anomaly detection with Kafka-based streaming ingestion, versioned normalization, sliding-window scoring, retraining, and observability, achieving stable deployment on real HPC logs and near-perfect event-level detection with only one false alert.
J Sleiman, G Pillai, P Chirra, N Gandhi, I O Gordon, M Hariri, M E Baker, J Ream, C G Fulmer, S El Ouali, D H Bruining, J A Kurowski, S E Viswanath, F Rieder
Develops and internally validates a CTE radiomics machine-learning model for Crohn's disease strictures, linking three-dimensional texture features to histopathologic inflammation, fibrosis, and smooth muscle hyperplasia/hypertrophy; outperforms radiologist visual scoring and improves characterization when combined with it.
M Hariri, H S Hillsdownley, W Chen, V Chaudhary
Runtime Signals as Correctness Proxies: A Categorical Evaluation Framework for Test-Time Scaling
Argues that binary pass-based metrics are too coarse for test-time scaling and introduces a categorical Bayesian evaluation framework that scores rubric-defined outcomes with uncertainty; shows that a lightweight model using 19 runtime signals reaches 0.917 AUROC without a judge model and that schema choice can materially change model rankings.
D Ganguly, W Chen, M Hariri, V Singh, S Sankar, S Wang, Y Yang, B Zhang, C Song, A Nemecek, K Ye, O Zafar, E Ayday, J Stubbs, S Iyengar, V Chaudhary
Sequent-Prover: Training Agents for Formal, Checkable SMT-based Reasoning
Introduces Sequent-Prover, a training recipe for agents that write SMT-LIB programs, query Z3, repair formalizations, and answer with executable evidence. Distills successful solver-guided traces and uses preference optimization to improve reliability, grounding, proof quality, and efficiency across formal and commonsense reasoning benchmarks.
K Ye, Y Zhang, X Li, C Song, M Hariri, V Singh, D Ganguly, V Chaudhary
Condition-Aware Conformal Prediction for LLM-as-Judge Reliability
Introduces CAJ, a condition-aware conformal calibration method that turns each LLM-judge score into a calibrated prediction set and referral signal. Uses judge, criterion, cross-judge variance, and transitivity features to repair hard-cell coverage gaps on SUMMEVAL and improve selective trust on ROSEACU.
M Hariri, V Chaudhary
Success Has a Shape: TailPass@k for Repeated-Sampling Agent Evaluation
Introduces TailPass@k, a Bayesian discovery-stability profile for repeated-sampling agent evaluation that reports thresholds from at-least-one success to all-k success, recovers accuracy, Pass@k, and stability as special cases, and adds posterior uncertainty for ranking and operating-point comparisons.
C Guo, M Hariri, V Chaudhary
ModalExit: Modality-Aware Expert Exit for Efficient Unified Vision-Language MoE Model Inference
Introduces ModalExit, a training-free inference framework for unified vision-language MoE models that profiles visual-dominant expert slots, detects visual-token saturation, and uses an entropy gate to skip converged image-token expert computation while keeping MMMU accuracy within the full-model range.
V Singh, W Chen, D Ganguly, Y Zhang, N Wang, S Sankar, M Hariri, A Nemecek, C Song, S Wang, B Zhang, V Yang, E Ayday, J Ma, V Chaudhary
CausalGuard: Conformal Inference under Graph Uncertainty
Introduces CausalGuard, a structure-weighted conformal framework for treatment-effect intervals under unknown causal graphs. It combines LLM-derived edge priors, conditional-independence pruning, BIC graph weighting, doubly robust pseudo-outcomes, and aggregate-before-calibrate conformal scoring to maintain finite-sample coverage while reducing overly padded graph-agnostic intervals.
V Singh, D Ganguly, W Chen, S Sahoo, S Sankar, B Zhang, M Hariri, S Wang, O Zafar, C Gagne, V Chaudhary
Reliability-Gated Source Anchoring for Continual Test-Time Adaptation
Introduces RMemSafe, a reliability-gated extension of ROID for continual test-time adaptation that uses frozen-source entropy to attenuate source anchoring and agreement terms when the source collapses, yielding a source-agnostic fallback and lower error across matched continual-corruption benchmarks.
S Zhong, J Zhang, H A D Le, W Xie, Y Lu, X Sun, M Hariri, H Liu, G Wang, Z Xu, Z Liu, S Xu, N Xie, L Li, R Chen, R Tang, X Hu, V Chaudhary
Sweeping Promptable Spoofs under the DirtyRAG: A Practical, Query-Blind RAG Attack Done Right
Introduces DirtyRAG, a query-blind, benign passage-based RAG attack that is robust to defenses and steerable by prompt. Demonstrates practical exploitation and introduces RAG-ATag, a benchmark for evaluating RAG security.
T DeSilvio, L Bao, B N Parker, B Flannery, M Hariri, P Chirra, M Labbad, S Tang, G M O'Connor, R MacBeth, E Steinhagen, J Willis, E L Marderstein, P Fu, A Carroll, M Crittenden, M J Gough, K H Young, S Krishnamurthi, D Liska, A Gupta, A Purysko, S E Viswanath
Integrated Multi-plane, Multi-region Radiomic Features on Baseline T2-Weighted MRI are Associated with Complete Response to Neoadjuvant Therapies in Rectal Cancers
Integrates tumor and proximal-fat radiomic features from axial and coronal baseline T2-weighted MRI to predict complete response to neoadjuvant therapies in rectal cancer; validates the multi-plane, multi-region model across multi-institutional holdout cohorts and shows added value with CEA and clinical T-stage.
L Calandruccio, B Grijalva-Arvizu, P Rodriguez Rivera, M Hariri, E Buss, V Chaudhary
Using automatic speech recognition for scoring speech-in-noise testing
T Zhang, M Hariri, S Zhong, Y Sui, V Chaudhary, X Hu, A Shrivastava
DFloat11 losslessly compresses LLM and diffusion-model weights using dynamic-length floating-point encoding with Huffman coding, shrinking memory by about 30% with no accuracy loss.
W Chen, V Singh, Z Rahmani, D Ganguly, M Hariri, V Chaudhary
K^4: Online Log Anomaly Detection Via Unsupervised Typicality Learning
K^4 reframes log anomaly detection as unsupervised typicality learning. It maps log embeddings to four compact PRDC descriptors (Precision, Recall, Density, Coverage) using k-NN statistics, enabling parser-independent online detection with lightweight scoring and microsecond-level latency.
H Liu, S Zhong, X Sun, M Tian, M Hariri, Z Liu, R Tang, Z Jiang, J Yuan, Y Chuang, L Li, S Choi, R Chen, V Chaudhary, X Hu
LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem
A backdoor LoRA can be trained once and then merged with multiple task LoRAs while retaining both capabilities, making it a low-tech attack that is particularly dangerous and infectious.
L Calandruccio, M Hariri, E Buss, V Chaudhary
Masked-speech recognition using human and synthetic cloned speech
Voice-clone vs. human speech study in masked-sentence recognition: intelligibility, perceived human-likeness, and voice-similarity measured with listener judgments and ASR.
A Sridharan, T DeSilvio, B Flannery, M Hariri, R Macbeth, B Parker, A Elumalai, J Devi, A Lovato, C Maneiro, A George, A Ganapath, P Deepak, D H Ballard, S E Viswanath
Introduces an automated pelvic MRI pipeline combining nnU-Net and MedSAM to segment perianal fistulas and anal sphincter muscles in Crohn’s disease, using annotated patient scans to support interventional guidance and surgical planning.
AS Yu, M Hariri, X Zhang, M Yang, V Chaudhary, X Li
Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2
A zero-shot, single-prompt method for 3D knee MRI segmentation was developed using the Segment Anything Model 2 (SAM2). By adapting SAM2 to treat MRI slices as video frames, accurate segmentation was achieved without additional training.
B Flannery, T DeSilvio, A Sadri, M Hariri, E Remer, J Nguyen, S Viswanath
The Spatial Attention Wavelon Network (SpAWN) is introduced for risk stratification of kidney cancers using CT scans. SpAWN uses pre-training spatial attention and wavelon activation functions to improve model interpretability and generalizability.
P Chirra, J Sleiman, N Gandhi, I Gordon, M Hariri, M Baker, R Ottichilo, D Bruining, J Kurowski, S Viswanath, F Rieder
Developed a radiomics-based machine-learning model to characterize inflammation and fibrosis in Crohn’s disease strictures using MRE. The model improved diagnostic accuracy compared to radiologist visual scoring, with combined use enhancing performance.
M Hariri, P Chirra, M Patel, T T Einat, I Dayan, A Tonetti, Y Baror, T Barrett, N Sushentsev, J D Kaggie, S Yuan, D Wu, B Yu, Z Lyu, C Hsu, W Wang, S Krishnamurthi, S E Viswanath
Federated Image Quality Assessment of Prostate MRI Scans in a Multi-institutional Setting
Addresses how image artifacts undermine the reliability of machine learning models in medical imaging, a problem that is exacerbated across multiple institutions.
B Flannery, M Hariri, T DeSilvio, A Sadri, J Nguyen, E M Remer, S Krishnamurthi, S E Viswanath
A deep learning model was developed to enhance preoperative risk assessment and predict survival in kidney cancer patients through CT scans, aiming to improve treatment decisions and overcome limitations of traditional clinical methods.
L Bao, T DeSilvio, B N Parker, M Hariri, P Chirra, M Labbad, S Tang, G M O'Connor, E Steinhagen, J L Miller-Ocuin, A Gupta, E L Marderstein, A Carroll, M Crittenden, M J Gough, S Krishnamurthi, K H Young, S E Viswanath
Radiomic features from pretreatment MRI were analyzed to predict which rectal cancer patients would respond to neoadjuvant treatments, addressing limitations of traditional staging and biomarker approaches.
M Rezai, L Namdari, D Farsi, N Ashayeri, M Naghshbandi, M Hariri, R Ghafoury
Investigated the effect of virtual reality on reducing pain in adult patients during laceration repair in emergency departments.
Professional experience
Academic
AI Scientist • Department of Computer and Data Sciences, CWRU
• Supports AI and research computing workflows in the ACCESS ecosystem; develops, deploys, and maintains tools, models, and user-facing research infrastructure for scientific and academic users.
• Member of Ohio-SCIPE (Strengthening the Cyberinfrastructure Professionals Ecosystem).
• Mentor for the annual summer AI Research Experience (AIRE '24, AIRE '25).
• Research Associate, Speech and Auditory Research Lab (SpARLab), CWRU.
• Judge for the CWRU Intersections Poster Symposium.
• Reviewer for ICLR, NeurIPS, ICML, ACL, and COLM.
• Best Pitch Award, 2024 CCIR Symposium, Center for Imaging Research.
Co-instructor and co-organizer for SCIPE Workshop on Large Language Models • CWRU
Developed and taught a workshop on reasoning LLMs, retrieval-augmented generation (RAG), and agentic AI; co-managed curriculum design, instructional materials, and workshop delivery.
Co-instructor and co-organizer for CWRU Workshop on Large Language Models • CWRU
Developed and taught a workshop on foundation models, reasoning in LLMs, and language model evaluation; co-managed curriculum design, instructional materials, and workshop delivery.
Machine Learning Researcher • Department of Biomedical Engineering, CWRU
Co-instructor for Introduction to Database Systems (CSDS 341) • CWRU
Designed the final project. Created a template for the course project and a simple build system to introduce Java package management.
Designer and instructor for Big Data and Cloud Computing workshop • Weatherhead School of Management, CWRU
Designed and developed a Big Data and Cloud Computing workshop; wrote instructional code and challenges, and co-taught it alongside Prof. Chaudhary.
Chief Editor of Biotech Magazine • University of Tehran
Official magazine of the Iranian Society of Biomedical Engineering, student branch, University of Tehran.
Head of Student Branch of Biomedical Engineering • University of Tehran
(UT-BME-SB)
Head of Information Committee of Biomedical Engineering • University of Tehran
(UT-BME-SB)
Industry
Software Developer and Game Designer • OBEID EMPIRE, Gamification Company
Selected Open Source Projects
Scorio: Bayesian evaluation and ranking toolkit • GitHub, PyPI
Python toolkit for Bayesian evaluation and ranking of stochastic responses
DFloat11 plugin for vLLM
Norm-Aware KVQuant
Medical visualization tools, INVent Lab, CWRU
Thumbnail-Preserving Encryption
Department of Electrical Engineering and Computer Science, Oregon State University, Prof. Rakesh Bobba
Computer skills
- Operating System
  - Linux: Debian-based, Red Hat-based (CentOS)
  - Bash, systemd, systemctl, journalctl, cron jobs, iptables (ufw and firewalld), awk, sed, GnuPG
  - Virtualization: KVM, QEMU
- AI and Machine Learning
  - PyTorch, scikit-learn, Ray
  - Google JAX, Numba
- Programming Language
  - JavaScript: Streaming, Worker Threads, TCP and UDP implementation with Node.js, Event Handling, WebSocket (Node.js and C++)
  - Python: Multiprocessing, Threading, AsyncIO, Compression
  - Java: Spring Boot, Spring MVC, Hibernate
  - C/C++: WebSocket and PubSub
  - CUDA: GPU Programming (LLM inference optimization)
  - Go: Web Server and Networking
  - C#, Deno, Julia: Familiar
- Database
  - NoSQL: MongoDB, InfluxDB, Redis (in-memory)
  - SQL: PostgreSQL
- Microservices and Orchestration
  - Docker Swarm and Kubernetes
  - Pub/Sub: RabbitMQ, Redis
- Software Management
  - Scrum
- Web3
  - IPFS
- System Administration
  - Routing and firewall: iptables, fail2ban
  - Email server: Postfix, Dovecot, mail relay configuration
  - VPN: OpenVPN, WireGuard
  - Server: Nginx (Reverse proxy and load balancing), Certbot
  - DNS: BIND 9
  - DHCP: Open DHCP
- Network
  - Tools: tcpdump, Wireshark, nmap, traceroute
- Documentation
  - Markdown and OpenAPI (Swagger)
- Electrical Engineering and Signal Processing
  - MATLAB and Simulation (Control and Signal toolbox), Verilog, Quartus, PSpice, Multisim
Languages
- Farsi: Native
- English: Fluent
- Turkish: Intermediate
- Arabic: Reading Knowledge
- French: Basic, A2
References
- Vipin Chaudhary, Professor, Department of Computer & Data Sciences, CWRU
- Michael Hinczewski, Associate Professor, Department of Physics, CWRU
- Satish Viswanath, Associate Professor, Department of Biomedical Engineering, Emory University
- Reza Aghaizadeh Zoroofi, Professor, School of Electrical & Computer Engineering, University of Tehran