Virtual Agentic Lab!
10-slide paper summary of Swanson et al. (doi:10.1038/s41586-025-09442-9)
SCIPE Workshop on LLMs - Day 3
SCIPE Workshop on LLMs - Day 2
SCIPE Workshop on LLMs
M.Sc. Thesis in Computer Science
A principled Bayesian framework that replaces Pass@k with posterior estimates, credible intervals, and stable rankings for LLM evaluation