Ranking API
Priors
Scorio.Prior — Type
Prior
Abstract supertype for prior penalty specifications used by MAP rankers.
Concrete subtypes define hyperparameters for an internal penalty term over the latent score vector theta.
Scorio.GaussianPrior — Type
GaussianPrior(mean=0.0, var=1.0)
Gaussian prior on latent parameters with quadratic penalty.
Arguments
mean::Real=0.0: prior mean.
var::Real=1.0: prior variance; must be positive.
Returns
GaussianPrior
Formula
\[\operatorname{penalty}(\theta) = \frac{1}{2\,\mathrm{var}}\sum_i (\theta_i-\mathrm{mean})^2\]
Scorio.LaplacePrior — Type
LaplacePrior(loc=0.0, scale=1.0)
Laplace prior on latent parameters with L1 penalty.
Arguments
loc::Real=0.0: location parameter.
scale::Real=1.0: scale parameter; must be positive.
Returns
LaplacePrior
Formula
\[\operatorname{penalty}(\theta) = \frac{1}{\mathrm{scale}}\sum_i \left|\theta_i-\mathrm{loc}\right|\]
Scorio.CauchyPrior — Type
CauchyPrior(loc=0.0, scale=1.0)
Cauchy prior on latent parameters with log-quadratic penalty.
Arguments
loc::Real=0.0: location parameter.
scale::Real=1.0: scale parameter; must be positive.
Returns
CauchyPrior
Formula
Let $z_i = (\theta_i-\mathrm{loc})/\mathrm{scale}$.
\[\operatorname{penalty}(\theta) = \sum_i \log(1 + z_i^2)\]
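The three penalty formulas above can be sketched as standalone Julia functions (illustrative helpers with hypothetical names, not the package's internal implementation):

```julia
# Illustrative sketches of the Gaussian, Laplace, and Cauchy penalties.
gaussian_penalty(θ; mean=0.0, var=1.0) = sum(abs2, θ .- mean) / (2var)
laplace_penalty(θ; loc=0.0, scale=1.0) = sum(abs, θ .- loc) / scale
cauchy_penalty(θ; loc=0.0, scale=1.0)  = sum(log1p, abs2.((θ .- loc) ./ scale))

θ = [0.5, -0.5, 1.0]
gaussian_penalty(θ)   # (0.25 + 0.25 + 1.0) / 2 = 0.75
laplace_penalty(θ)    # (0.5 + 0.5 + 1.0) / 1 = 2.0
```

All three are minimized (per coordinate) at the location parameter; they differ in how strongly they pull outliers toward it.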
Scorio.UniformPrior — Type
UniformPrior()
Improper flat prior with zero penalty.
Returns
UniformPrior
Formula
\[\operatorname{penalty}(\theta) = 0\]
Scorio.CustomPrior — Type
CustomPrior(penalty_fn)
User-defined prior from a callable penalty function.
Arguments
penalty_fn: callable with signature penalty_fn(theta) returning a scalar penalty value.
Returns
CustomPrior
Notes
penalty_fn is used directly with no transformation of theta.
Scorio.EmpiricalPrior — Type
EmpiricalPrior(R0; var=1.0, eps=1e-6)
Empirical Gaussian-style prior centered at logits inferred from baseline outcomes.
R0 is accepted as shape (L, M) or (L, M, D). A 2D input is promoted to (L, M, 1).
Arguments
R0: baseline outcomes per model. Typically binary outcomes in {0, 1}.
var::Real=1.0: variance used in the quadratic penalty; must be positive.
eps::Real=1e-6: clipping level used before the logit transform. No explicit range check is applied; choose 0 < eps < 0.5 in practice.
Returns
EmpiricalPrior: stores R0, var, eps, and the centered prior_mean.
Formula
For model $l$:
\[a_l = \frac{1}{M D}\sum_{m=1}^{M}\sum_{d=1}^{D} R^0_{lmd}\]
\[\tilde a_l = \operatorname{clip}(a_l, \varepsilon, 1-\varepsilon), \qquad \mu_l = \log\!\left(\frac{\tilde a_l}{1-\tilde a_l}\right)\]
Then mean-center $\mu$ for identifiability and use:
\[\operatorname{penalty}(\theta) = \frac{1}{2\,\mathrm{var}}\sum_{l=1}^{L}(\theta_l-\mu_l)^2\]
Examples
R0 = Int[
1 1 1 0 1
0 1 0 0 1
]
prior = EmpiricalPrior(R0; var=2.0, eps=1e-6)
Evaluation-based Ranking
Bayes
Scorio.bayes — Method
bayes(
R::AbstractArray{<:Integer, 3},
w=nothing;
R0=nothing,
quantile=nothing,
method="competition",
return_scores=false,
)
Rank models by Bayes@N scores computed independently per model.
If quantile is provided, models are ranked by $\mu + z_q\,\sigma$; otherwise by the posterior mean $\mu$.
References
Hariri, M., Samandar, A., Hinczewski, M., & Chaudhary, V. (2026). Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation. arXiv:2510.04265. https://arxiv.org/abs/2510.04265
Formula
For each model $l$, let $(\mu_l, \sigma_l)$ = Scorio.bayes(R_l, w, R0_l).
\[s_l = \begin{cases} \mu_l, & \text{if quantile is not set} \\ \mu_l + \Phi^{-1}(q)\,\sigma_l, & \text{if quantile}=q \in [0,1] \end{cases}\]
Arguments
R: integer tensor (L, M, N) with values in {0, ..., C}.
w: class weights of length C+1. If not provided and R is binary (contains only 0 and 1), defaults to [0.0, 1.0]. For non-binary R, w is required.
R0: optional shared prior (M, D) or model-specific prior (L, M, D).
quantile: optional value in [0, 1] for quantile-adjusted ranking.
method, return_scores: ranking output controls.
Avg
Scorio.avg — Method
avg(R; method="competition", return_scores=false)
Rank models by per-model mean accuracy across all questions and trials.
For each model l, compute the scalar score:
\[s_l^{\mathrm{avg}} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} R_{lmn}\]
Higher scores are better; ranking is produced by rank_scores.
Arguments
R: binary response tensor (L, M, N) or matrix (L, M) promoted to (L, M, 1).
method: tie-handling rule for rank_scores.
return_scores: if true, return (ranking, scores).
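The mean-accuracy score above is a one-liner; a minimal standalone sketch (not the package function) for a binary R of shape (L, M, N):

```julia
using Statistics

# L=2 models, M=3 questions, N=2 trials
R = cat([1 1 0; 0 1 1], [1 1 0; 0 0 1]; dims=3)
s = vec(mean(R; dims=(2, 3)))    # per-model mean accuracy over all (m, n)
ranking = sortperm(s; rev=true)  # higher scores rank first
```

Here model 1 answers 4 of 6 trials correctly and model 2 answers 3 of 6, so `s ≈ [0.667, 0.5]` and the ranking is `[1, 2]`.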
Pass@k Family
Scorio.pass_at_k — Method
pass_at_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)
Rank models by per-model Pass@k scores.
For each model $l$, define per-question success counts $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$. Then:
\[s_l^{\mathrm{Pass@}k} = \frac{1}{M}\sum_{m=1}^{M} \left(1 - \frac{\binom{N-\nu_{lm}}{k}}{\binom{N}{k}}\right)\]
References
Chen, M., Tworek, J., Jun, H., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv:2107.03374. https://arxiv.org/abs/2107.03374
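The per-question term of the Pass@k average can be sketched directly (a standalone helper, not the package function): it is the probability that at least one of k trials drawn without replacement from the N recorded trials is correct.

```julia
# Per-question Pass@k: 1 - C(N-ν, k)/C(N, k), where ν of N trials are correct.
pass_at_k_question(N, ν, k) = 1 - binomial(N - ν, k) / binomial(N, k)

pass_at_k_question(4, 2, 2)   # 1 - C(2,2)/C(4,2) = 1 - 1/6
```

`binomial(N - ν, k)` is zero whenever fewer than k trials are incorrect, so the estimate is exactly 1 in that case.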
Scorio.pass_hat_k — Method
pass_hat_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)
Rank models by per-model Pass-hat@k (G-Pass@k) scores.
With $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$:
\[s_l^{\widehat{\mathrm{Pass@}k}} = \frac{1}{M}\sum_{m=1}^{M} \frac{\binom{\nu_{lm}}{k}}{\binom{N}{k}}\]
References
Yao, S., Shinn, N., Razavi, P., & Narasimhan, K. (2024). tau-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. arXiv:2406.12045. https://arxiv.org/abs/2406.12045
Scorio.g_pass_at_k_tau — Method
g_pass_at_k_tau(
R::AbstractArray{<:Integer, 3},
k,
tau;
method="competition",
return_scores=false,)
Rank models by generalized G-Pass@k_τ per model.
Let $X_{lm} \sim \mathrm{Hypergeometric}(N, \nu_{lm}, k)$, where $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$. The score is:
\[s_l^{\mathrm{G\text{-}Pass@}k_{\tau}} = \frac{1}{M}\sum_{m=1}^{M} \Pr\!\left(X_{lm}\ge \lceil \tau k \rceil\right)\]
\[\Pr(X_{lm}\ge \lceil \tau k \rceil) = \sum_{j=\lceil \tau k \rceil}^{k} \frac{\binom{\nu_{lm}}{j}\binom{N-\nu_{lm}}{k-j}}{\binom{N}{k}}\]
References
Liu, J., Liu, H., Xiao, L., et al. (2024). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147
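The hypergeometric tail sum above is short enough to sketch standalone (an illustrative helper, not the package function):

```julia
# Per-question G-Pass@k_τ: Pr(X ≥ ⌈τk⌉) for X ~ Hypergeometric(N, ν, k).
function gpass_question(N, ν, k, τ)
    j0 = ceil(Int, τ * k)
    sum(binomial(ν, j) * binomial(N - ν, k - j) for j in j0:k) / binomial(N, k)
end

gpass_question(4, 2, 2, 1.0)   # Pr(both of 2 draws correct) = 1/6
```

With τ = 1 this reduces to Pass-hat@k, and as τ → 0 the tail probability is 1 for every question.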
Scorio.mg_pass_at_k — Method
mg_pass_at_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)
Rank models by per-model mG-Pass@k scores.
With $X_{lm} \sim \mathrm{Hypergeometric}(N, \nu_{lm}, k)$ and $m_0 = \lceil k/2 \rceil$:
\[s_l^{\mathrm{mG\text{-}Pass@}k} = \frac{1}{M}\sum_{m=1}^{M} \frac{2}{k}\,\mathbb{E}\!\left[(X_{lm}-m_0)_+\right]\]
Equivalent discrete form:
\[\frac{2}{k}\sum_{i=m_0+1}^{k}\Pr(X_{lm}\ge i)\]
References
Liu, J., Liu, H., Xiao, L., et al. (2024). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147
Pointwise Methods
Scorio.inverse_difficulty — Function
inverse_difficulty(
R;
method="competition",
return_scores=false,
clip_range=(0.01, 0.99),
)
Rank models by question accuracy weighted by inverse empirical question difficulty.
Each question weight is proportional to 1 / p_correct(question), after clipping p_correct to clip_range and normalizing weights to sum to 1.
Let $k_{lm} = \sum_{n=1}^{N} R_{lmn}$ and $\hat p_{lm} = k_{lm}/N$. Define the global per-question solve rate $\bar p_m = \frac{1}{L}\sum_l \hat p_{lm}$ and weights:
\[w_m \propto \frac{1}{\operatorname{clip}(\bar p_m, a, b)}, \qquad \sum_{m=1}^{M} w_m = 1\]
The model score is:
\[s_l^{\mathrm{inv\text{-}diff}} = \sum_{m=1}^{M} w_m \hat p_{lm}\]
Reference
Inverse probability weighting: https://en.wikipedia.org/wiki/Inverse_probability_weighting
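The weighting scheme above can be sketched end to end for a binary R of shape (L, M, N) (an illustrative standalone function, not the package implementation):

```julia
using Statistics

# Inverse-difficulty weighting: rare solves earn more credit.
function inv_difficulty_scores(R; clip=(0.01, 0.99))
    p̂ = dropdims(mean(R; dims=3); dims=3)  # (L, M) per-model question accuracy
    p̄ = vec(mean(p̂; dims=1))               # (M,) global per-question solve rate
    w = 1 ./ clamp.(p̄, clip...)             # inverse difficulty, clipped
    w ./= sum(w)                             # normalize weights to sum to 1
    p̂ * w                                   # (L,) weighted model scores
end

# Question 1 is easy (both models solve it); question 2 is solved by model 1 only.
R = reshape([1, 1, 1, 0], 2, 2, 1)
inv_difficulty_scores(R)
</imports>
```

Model 1 solves both questions, so its score is the full weight mass 1.0; model 2 gets only the (down-weighted) easy question.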
Pairwise Methods
Scorio.elo — Function
elo(
R;
K=32.0,
initial_rating=1500.0,
tie_handling="correct_draw_only",
method="competition",
return_scores=false,
)
Sequential Elo rating over pairwise outcomes induced by R.
For each (question, trial), all model pairs are compared and Elo updates are applied in fixed iteration order. Pair outcomes are:
- decisive (1 vs 0): win/loss update
- tie (1 vs 1 or 0 vs 0): handled by tie_handling
Arguments
R: binary response tensor of shape (L, M, N) or matrix (L, M) promoted to (L, M, 1).
K: positive Elo step size.
initial_rating: finite initial rating for all models.
tie_handling: one of "skip", "draw", "correct_draw_only".
method: rank tie-handling method passed to rank_scores.
return_scores: if true, return (ranking, ratings).
Returns
ranking by default; (ranking, ratings) when return_scores=true.
Formula
For each induced pairwise match (i,j) with observed score $S_{ij} \in \{0, 0.5, 1\}$:
\[E_{ij} = \frac{1}{1 + 10^{(r_j-r_i)/400}}\]
\[r_i \leftarrow r_i + K(S_{ij} - E_{ij}), \quad r_j \leftarrow r_j + K((1-S_{ij}) - (1-E_{ij}))\]
Reference
Elo, A. E. (1978). The Rating of Chessplayers, Past and Present.
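A single update step implementing the formulas above (a sketch with a hypothetical helper name, not the package routine):

```julia
# One Elo update for a match with observed score S ∈ {0, 0.5, 1} for player i.
function elo_update(r_i, r_j, S; K=32.0)
    E = 1 / (1 + 10^((r_j - r_i) / 400))   # expected score for player i
    r_i + K * (S - E), r_j + K * ((1 - S) - (1 - E))
end

elo_update(1500.0, 1500.0, 1.0)   # evenly matched win → (1516.0, 1484.0)
```

Note that the two updates are equal and opposite, so the total rating mass is conserved.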
Scorio.trueskill — Function
trueskill(
R;
mu_initial=25.0,
sigma_initial=25.0 / 3,
beta=25.0 / 6,
tau=25.0 / 300,
method="competition",
return_scores=false,
tie_handling="skip",
draw_margin=0.0,
)
Rank models with a sequential two-player TrueSkill-style update over induced pairwise comparisons.
Returns rankings from posterior means mu.
Formula
For one match between models i and j:
\[c = \sqrt{2\beta^2 + \sigma_i^2 + \sigma_j^2}, \quad t = (\mu_i-\mu_j)/c, \quad \epsilon = \text{draw\_margin}/c\]
For decisive outcomes, the update uses $v_{win}(t,\epsilon)$ and $w_{win}(t,\epsilon)$:
\[\mu_i' = \mu_i + \frac{\sigma_i^2}{c} v_{win}(t,\epsilon), \quad \sigma_i'^2 = \sigma_i^2\!\left(1 - \frac{\sigma_i^2}{c^2}w_{win}(t,\epsilon)\right)\]
Draw updates use the analogous $v_{draw}$ and $w_{draw}$ corrections.
Reference
Herbrich, R., Minka, T., & Graepel, T. (2006). TrueSkill(TM): A Bayesian Skill Rating System. NeurIPS 19.
Scorio.glicko — Function
glicko(
R;
initial_rating=1500.0,
initial_rd=350.0,
c=0.0,
rd_max=350.0,
tie_handling="correct_draw_only",
return_deviation=false,
method="competition",
return_scores=false,
)
Rank models with sequential Glicko updates over induced pairwise comparisons.
If return_deviation=true, returns (ranking, rating, rd); otherwise returns ranking or (ranking, rating) when return_scores=true.
Formula
Let $q = \ln(10)/400$ and $g(RD) = 1/\sqrt{1 + 3q^2 RD^2/\pi^2}$. For model i in one period:
\[E_{ij} = \frac{1}{1 + 10^{-g(RD_j)(r_i-r_j)/400}}\]
\[d_i^2 = \left(q^2\sum_j g(RD_j)^2 E_{ij}(1-E_{ij})\right)^{-1}\]
\[RD_i' = \left(\frac{1}{RD_i^2} + \frac{1}{d_i^2}\right)^{-1/2}, \quad r_i' = r_i + \frac{q}{\frac{1}{RD_i^2}+\frac{1}{d_i^2}} \sum_j g(RD_j)(S_{ij}-E_{ij})\]
References
Glickman, M. E. (1999). Parameter Estimation in Large Dynamic Paired Comparison Experiments. JRSS C, 48(3), 377-394. https://doi.org/10.1111/1467-9876.00159
Paired-Comparison Probabilistic Models
Scorio.bradley_terry — Function
bradley_terry(R; method="competition", return_scores=false, max_iter=500)
Rank models with Bradley-Terry maximum likelihood on decisive pairwise wins.
Let $W_{ij}$ be decisive wins of model i over j and strengths $\pi_i > 0$.
\[\Pr(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j}\]
\[\log p(W\mid \pi) = \sum_{i\ne j} W_{ij}\left[\log \pi_i - \log(\pi_i+\pi_j)\right]\]
References
Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324
Scorio.bradley_terry_map — Function
bradley_terry_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,
)
Rank models with Bradley-Terry MAP estimation using the given prior on centered log-strengths.
With $\theta_i = \log \pi_i$:
\[\hat\theta = \arg\min_{\theta} \left[-\log p(W\mid \theta) + \operatorname{penalty}(\theta)\right]\]
\[\hat\pi_i = \exp(\hat\theta_i)\]
Reference
Caron, F., & Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley-Terry models. https://doi.org/10.1080/10618600.2012.638220
Scorio.bradley_terry_davidson — Function
bradley_terry_davidson(R; method="competition", return_scores=false, max_iter=500)
Rank models with Bradley-Terry-Davidson ML, incorporating explicit tie mass.
The Davidson tie extension introduces $\nu > 0$:
\[\Pr(i\succ j) = \frac{\pi_i}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}, \quad \Pr(i\sim j) = \frac{\nu\sqrt{\pi_i\pi_j}}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}\]
Reference
Davidson, R. R. (1970). On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. https://doi.org/10.1080/01621459.1970.10481082
Scorio.bradley_terry_davidson_map — Function
bradley_terry_davidson_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,)
Rank models with Bradley-Terry-Davidson MAP estimation.
\[(\hat\theta,\hat\nu) = \arg\min_{\theta,\nu>0} \left[-\log p(W,T\mid \theta,\nu) + \operatorname{penalty}(\theta)\right]\]
Scorio.rao_kupper — Function
rao_kupper(
R;
tie_strength=1.1,
method="competition",
return_scores=false,
max_iter=500,)
Rank models with the Rao-Kupper tie model (ML).
With fixed $\kappa \ge 1$:
\[\Pr(i\succ j)=\frac{\pi_i}{\pi_i+\kappa\pi_j}, \quad \Pr(j\succ i)=\frac{\pi_j}{\kappa\pi_i+\pi_j}\]
\[\Pr(i\sim j)= \frac{(\kappa^2-1)\pi_i\pi_j} {(\pi_i+\kappa\pi_j)(\kappa\pi_i+\pi_j)}\]
Reference
Rao, P. V., & Kupper, L. L. (1967). Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. https://doi.org/10.1080/01621459.1967.10482901
Scorio.rao_kupper_map — Function
rao_kupper_map(
R;
tie_strength=1.1,
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,)
Rank models with the Rao-Kupper tie model under MAP estimation.
\[\hat\theta = \arg\min_{\theta} \left[-\log p(W,T\mid \theta,\kappa) + \operatorname{penalty}(\theta)\right]\]
Bayesian Ranking
Scorio.thompson — Function
thompson(
R;
n_samples=10000,
prior_alpha=1.0,
prior_beta=1.0,
seed=42,
method="competition",
return_scores=false,
)
Rank models by Thompson sampling over Beta posteriors of model success rates.
The returned score for each model is the negative average sampled rank (higher is better).
Let $S_l = \sum_{m,n} R_{lmn}$ and $T = MN$. Posterior per model:
\[p_l \mid R \sim \mathrm{Beta}(\alpha + S_l,\ \beta + T - S_l)\]
With posterior draws $t=1,\dots,T_s$ and sampled rank $r_l^{(t)}$:
\[s_l^{\mathrm{TS}} = -\frac{1}{T_s}\sum_{t=1}^{T_s} r_l^{(t)}\]
References
Thompson, W. R. (1933). On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika. https://doi.org/10.1093/biomet/25.3-4.285
Russo, D. J., et al. (2018). A Tutorial on Thompson Sampling. https://doi.org/10.1561/2200000070
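The sampling loop above can be sketched without external packages. Since the posterior parameters are integers here, Beta draws use the order-statistic identity (the α-th smallest of α+β−1 uniforms is Beta(α, β)); the package may sample differently, and all names below are hypothetical.

```julia
using Random

# Beta(α, β) draw for integer α, β via order statistics of uniforms.
sample_beta(rng, α::Int, β::Int) = sort!(rand(rng, α + β - 1))[α]

function thompson_scores(R; n_samples=2000, rng=Random.MersenneTwister(42))
    L = size(R, 1)
    T = length(R) ÷ L                          # M * N trials per model
    S = [sum(R[l, :, :]) for l in 1:L]         # successes per model
    rank_sum = zeros(L)
    for _ in 1:n_samples
        p = [sample_beta(rng, 1 + S[l], 1 + T - S[l]) for l in 1:L]
        rank_sum .+= invperm(sortperm(p; rev=true))   # rank 1 = best draw
    end
    -rank_sum ./ n_samples                     # negative average sampled rank
end

R = zeros(Int, 2, 2, 2); R[1, :, :] .= 1       # model 1 always correct
thompson_scores(R)
```

The dominant model's score approaches −1 (it is sampled first in nearly every draw), while the weaker model's approaches −2.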
Scorio.bayesian_mcmc — Function
bayesian_mcmc(
R;
n_samples=5000,
burnin=1000,
prior_var=1.0,
seed=42,
method="competition",
return_scores=false,
)
Rank models via random-walk Metropolis MCMC under a Bradley-Terry-style pairwise likelihood with a Gaussian prior on latent abilities.
Scores are posterior means of sampled latent abilities.
Let $W_{ij}$ be decisive wins of model i over j, and latent log-strengths $\theta_i$ with Gaussian prior variance $\sigma^2$ = prior_var.
\[\Pr(i \succ j \mid \theta) = \frac{\exp(\theta_i)}{\exp(\theta_i)+\exp(\theta_j)}, \qquad \theta_i \sim \mathcal{N}(0,\sigma^2)\]
The returned score is the posterior mean:
\[s_i^{\mathrm{MCMC}} = \mathbb{E}[\theta_i \mid W]\]
References
Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324
Metropolis, N., et al. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics. https://doi.org/10.1063/1.1699114
Item Response Theory
Scorio.rasch — Function
rasch(
R;
method="competition",
return_scores=false,
max_iter=500,
return_item_params=false,
)
Rank models with Rasch (1PL) maximum-likelihood estimation.
Returns rankings from estimated abilities theta. When return_item_params=true, also returns item difficulties.
For counts $k_{lm}=\sum_n R_{lmn}$:
\[k_{lm} \sim \mathrm{Binomial}\!\left(N,\sigma(\theta_l-b_m)\right)\]
Item difficulties are mean-centered for identifiability:
\[b \leftarrow b - \frac{1}{M}\sum_m b_m\]
Reference
Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests.
Scorio.rasch_map — Function
rasch_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,
return_item_params=false,
)
Rank models with Rasch (1PL) MAP estimation using an ability prior.
\[(\hat\theta,\hat b) = \arg\min_{\theta,b} \left[ -\sum_{l,m}\log p(k_{lm}\mid \theta_l,b_m) + \operatorname{penalty}(\theta) \right]\]
Reference
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika.
Scorio.rasch_2pl — Function
rasch_2pl(
R;
method="competition",
return_scores=false,
max_iter=500,
return_item_params=false,
reg_discrimination=0.01,
)
Rank models with 2PL IRT maximum likelihood (ability + item discrimination).
\[k_{lm} \sim \mathrm{Binomial}\!\left( N,\sigma\!\left(a_m(\theta_l-b_m)\right)\right)\]
Scorio.rasch_2pl_map — Function
rasch_2pl_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,
return_item_params=false,
reg_discrimination=0.01,
)
Rank models with 2PL IRT MAP estimation.
Same 2PL likelihood as rasch_2pl, plus prior regularization on abilities:
\[\hat\theta \in \arg\min_{\theta,\cdots} \left[-\log p(k\mid \theta,\cdots)+\operatorname{penalty}(\theta)\right]\]
Scorio.rasch_3pl — Function
rasch_3pl(
R;
method="competition",
return_scores=false,
max_iter=500,
fix_guessing=nothing,
return_item_params=false,
reg_discrimination=0.01,
reg_guessing=0.1,
guessing_upper=0.5,
)
Rank models with 3PL IRT maximum likelihood (ability, discrimination, guessing).
\[p_{lm} = c_m + (1-c_m)\sigma\!\left(a_m(\theta_l-b_m)\right)\]
with $c_m \in [0, \mathrm{guessing\_upper}]$.
Scorio.rasch_3pl_map — Function
rasch_3pl_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,
fix_guessing=nothing,
return_item_params=false,
reg_discrimination=0.01,
reg_guessing=0.1,
guessing_upper=0.5,
)
Rank models with 3PL IRT MAP estimation.
Same 3PL likelihood as rasch_3pl, with prior penalty on abilities:
\[\hat\theta \in \arg\min_{\theta,\cdots} \left[-\log p(k\mid \theta,\cdots)+\operatorname{penalty}(\theta)\right]\]
Scorio.dynamic_irt — Function
dynamic_irt(
R;
variant="linear",
method="competition",
return_scores=false,
max_iter=500,
return_item_params=false,
time_points=nothing,
score_target="final",
slope_reg=0.01,
state_reg=1.0,
assume_time_axis=false,
)
Rank models with dynamic IRT variants:
- "linear": static Rasch baseline
- "growth": linear growth path
- "state_space": smoothed latent trajectory
Growth variant:
\[\theta_{ln} = \theta_{0,l} + \theta_{1,l} t_n,\qquad P(R_{lmn}=1)=\sigma(\theta_{ln}-b_m)\]
State-space variant:
\[P(R_{lmn}=1)=\sigma(\theta_{ln}-b_m)\]
with smoothness penalty
\[\lambda \sum_{l,n>1} \frac{(\theta_{ln}-\theta_{l,n-1})^2}{t_n-t_{n-1}}\]
References
Verhelst, N. D., & Glas, C. A. (1993). A dynamic generalization of the Rasch model. Psychometrika.
Scorio.rasch_mml — Function
rasch_mml(
R;
method="competition",
return_scores=false,
max_iter=100,
em_iter=20,
n_quadrature=21,
return_item_params=false,
)
Rank models with Rasch marginal maximum likelihood using EM + quadrature.
Using quadrature nodes $\theta_q$ and weights $w_q$, the posterior mass for model $l$ is:
\[w_{lq} \propto p(k_l\mid \theta_q,b)\,w_q\]
EAP ability score:
\[\hat\theta_l^{\mathrm{EAP}} = \sum_q w_{lq}\theta_q\]
References
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika.
Scorio.rasch_mml_credible — Function
rasch_mml_credible(
R;
quantile=0.05,
method="competition",
return_scores=false,
max_iter=100,
em_iter=20,
n_quadrature=21,
)
Rank models by posterior ability quantiles from the Rasch MML posterior mass.
\[s_l = Q_q(\theta_l \mid R)\]
Lower q (for example 0.05) yields a more conservative ranking.
Voting Methods
Scorio.borda — Function
borda(R; method="competition", return_scores=false)
Rank models with Borda count from per-question model orderings.
Let $k_{lm} = \sum_{n=1}^{N} R_{lmn}$ and $r_{lm}$ be the descending tie-averaged rank of model l on question m (rank 1 is best):
\[s_l^{\mathrm{Borda}} = \sum_{m=1}^{M} (L - r_{lm})\]
Reference
de Borda, J.-C. (1781/1784). Mémoire sur les élections au scrutin.
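The tie-averaged descending ranks and the resulting Borda sums can be sketched from the per-question count matrix k of shape (L, M) (standalone illustrative helpers, not the package functions):

```julia
# Descending tie-averaged ranks: rank 1 is best; ties share the average rank.
tie_avg_rank_desc(x) = [count(>(xi), x) + (count(==(xi), x) + 1) / 2 for xi in x]

function borda_scores(k)
    L, M = size(k)
    r = reduce(hcat, tie_avg_rank_desc(k[:, m]) for m in 1:M)  # (L, M) ranks
    vec(sum(L .- r; dims=2))
end

borda_scores([3 2; 1 2])   # question 2 is a tie → both get rank 1.5 there
```

On question 1 the ranks are 1 and 2; on question 2 both models tie at rank 1.5, so the scores are [1.5, 0.5].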
Scorio.copeland — Function
copeland(R; method="competition", return_scores=false)
Rank models by Copeland score over pairwise question-level majorities.
Let $W^{(q)}_{ij}$ be the number of questions where $k_{im} > k_{jm}$:
\[s_i^{\mathrm{Copeland}} = \sum_{j\ne i}\operatorname{sign}\!\left(W^{(q)}_{ij} - W^{(q)}_{ji}\right)\]
Scorio.win_rate — Function
win_rate(R; method="competition", return_scores=false)
Rank models by aggregate pairwise win rate.
With the same $W^{(q)}_{ij}$ counts:
\[s_i^{\mathrm{winrate}} = \frac{\sum_{j\ne i} W^{(q)}_{ij}} {\sum_{j\ne i}\left(W^{(q)}_{ij}+W^{(q)}_{ji}\right)}\]
Models with no decisive pairwise outcomes receive score 0.5.
Scorio.minimax — Function
minimax(
R;
variant="margin",
tie_policy="half",
method="competition",
return_scores=false,
)
Rank models with Minimax (Simpson-Kramer), using worst defeat strength.
Let $P_{ij}$ be pairwise preference counts and $\Delta_{ij}=P_{ij}-P_{ji}$.
Margin variant:
\[s_i^{\mathrm{minimax}} = -\max_{j\ne i}\max(0,\Delta_{ji})\]
Winning-votes variant:
\[s_i^{\mathrm{wv}} = -\max_{j:\,P_{ji}>P_{ij}} P_{ji}\]
Scorio.schulze — Function
schulze(R; tie_policy="half", method="competition", return_scores=false)
Rank models using the Schulze strongest-path method.
Initialize:
\[p_{ij} = \begin{cases} P_{ij}, & P_{ij}>P_{ji} \\ 0, & \text{otherwise} \end{cases}\]
Then apply strongest-path closure:
\[p_{jk} \leftarrow \max\!\left(p_{jk}, \min(p_{ji}, p_{ik})\right)\]
Model i beats j if $p_{ij} > p_{ji}$.
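The strongest-path closure above is a widest-path Floyd-Warshall with the intermediate candidate in the outer loop. A standalone sketch (illustrative, not the package function), where P holds pairwise preference counts:

```julia
function schulze_paths(P)
    L = size(P, 1)
    p = [i != j && P[i, j] > P[j, i] ? float(P[i, j]) : 0.0 for i in 1:L, j in 1:L]
    for i in 1:L, j in 1:L, k in 1:L   # i is the intermediate candidate
        if i != j && i != k && j != k
            p[j, k] = max(p[j, k], min(p[j, i], p[i, k]))
        end
    end
    p
end

P = [0 5 2; 1 0 4; 3 2 0]
schulze_paths(P)   # p[1,3] strengthens from 0 to 4 via candidate 2
```

Here candidate 1 loses the direct contest with candidate 3 (2 vs 3) but wins on strongest paths, since the path 1 → 2 → 3 has strength min(5, 4) = 4 > 3.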
Scorio.ranked_pairs — Function
ranked_pairs(
R;
strength="margin",
tie_policy="half",
method="competition",
return_scores=false,
)
Rank models with Ranked Pairs (Tideman) by locking pairwise victories without creating directed cycles.
Victories are sorted by strength (margin or winning-votes), then each edge winner -> loser is locked only if it does not create a directed cycle in the current locked graph.
Scorio.kemeny_young — Function
kemeny_young(
R;
tie_policy="half",
method="competition",
return_scores=false,
time_limit=nothing,
tie_aware=true,
)
Rank models with Kemeny-Young via MILP optimization. With tie_aware=true, the routine analyzes forced pairwise orders among optimal solutions.
Binary variables $y_{ij}$ indicate whether model i is above j:
\[\max_y \sum_{i\ne j} P_{ij} y_{ij}\]
subject to:
\[y_{ij}+y_{ji}=1,\qquad y_{ij}+y_{jk}+y_{ki}\le 2 \quad (\forall i,j,k\ \text{distinct})\]
tie_aware=true checks which pairwise orders are forced across all optimal solutions and ranks by that forced-order DAG.
Scorio.nanson — Function
nanson(R; rank_ties="average", method="competition", return_scores=false)
Rank models with Nanson's elimination rule (iterative Borda with below-mean elimination).
At round t, with active set A_t and Borda scores $s_i^{(t)}$:
\[E_t = \{ i\in A_t : s_i^{(t)} < \overline{s}^{(t)} \}, \qquad A_{t+1} = A_t \setminus E_t\]
Scorio.baldwin — Function
baldwin(R; rank_ties="average", method="competition", return_scores=false)
Rank models with Baldwin's elimination rule (iterative elimination of the minimum Borda score).
At round t:
\[E_t = \arg\min_{i\in A_t} s_i^{(t)}, \qquad A_{t+1} = A_t \setminus E_t\]
This implementation removes all models tied at the minimum in a round.
Scorio.majority_judgment — Function
majority_judgment(R; method="competition", return_scores=false)
Rank models using Majority Judgment with recursive median-grade tie-breaking.
Each question assigns grade $k_{lm}\in\{0,\dots,N\}$. Models are compared by lower median grade; ties are broken by recursively removing one occurrence of the current median grade from tied models and repeating the comparison.
Reference
Balinski, M., & Laraki, R. (2011). Majority Judgment.
Graph-based Methods
Scorio.pagerank — Function
pagerank(
R;
damping=0.85,
max_iter=100,
tol=1e-6,
method="competition",
return_scores=false,
teleport=nothing,
)
Rank models with PageRank on the pairwise win-probability graph.
Let $\hat P_{i\succ j}$ be empirical tied-split win probabilities. Column-normalized transition matrix:
\[P_{ij} = \frac{\hat P_{i\succ j}}{\sum_{k\ne j}\hat P_{k\succ j}}\]
PageRank fixed point:
\[r = d P r + (1-d)e\]
where e is a teleportation distribution (uniform by default).
Reference
Page, L., et al. (1999). The PageRank Citation Ranking.
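The fixed point above can be found by power iteration. A minimal sketch assuming a column-stochastic P and uniform teleport distribution (illustrative, not the package routine):

```julia
using LinearAlgebra

# Power iteration for r = d·P·r + (1-d)·e.
function pagerank_power(P; d=0.85, tol=1e-12, max_iter=1000)
    n = size(P, 1)
    e = fill(1.0 / n, n)   # uniform teleport distribution
    r = copy(e)
    for _ in 1:max_iter
        r_new = d .* (P * r) .+ (1 - d) .* e
        norm(r_new - r, 1) < tol && return r_new
        r = r_new
    end
    r
end

pagerank_power([0.0 1.0; 1.0 0.0])   # symmetric 2-cycle → [0.5, 0.5]
```

Because e sums to 1 and P is column-stochastic, r remains a probability distribution at every step.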
Scorio.spectral — Function
spectral(
R;
max_iter=10000,
tol=1e-12,
method="competition",
return_scores=false,
)
Rank models by the dominant eigenvector of a spectral centrality matrix built from pairwise win probabilities.
Construct:
\[W_{ij}=\hat P_{i\succ j}\ (i\ne j), \qquad W_{ii}=\sum_{j\ne i}W_{ij}\]
Score vector is the normalized dominant right eigenvector:
\[v \propto Wv,\qquad \sum_i v_i=1\]
Scorio.alpharank — Function
alpharank(
R;
alpha=1.0,
population_size=50,
max_iter=100000,
tol=1e-12,
method="competition",
return_scores=false,
)
Rank models with single-population alpha-Rank stationary distribution scores.
For resident s, mutant r, population size m:
\[u = \alpha\frac{m}{m-1}\left(\hat P_{r\succ s}-\frac12\right),\qquad \rho_{r,s}= \begin{cases} \frac{1-e^{-u}}{1-e^{-mu}}, & u\ne 0\\ \frac{1}{m}, & u=0 \end{cases}\]
Transition matrix:
\[C_{sr}=\frac{1}{L-1}\rho_{r,s},\qquad C_{ss}=1-\sum_{r\ne s}C_{sr}\]
Ranking uses the stationary distribution of C.
Reference
Omidshafiei, S., et al. (2019). α-Rank: Multi-Agent Evaluation by Evolution. Scientific Reports.
Scorio.nash — Function
nash(
R;
n_iter=100,
temperature=0.1,
solver="lp",
score_type="vs_equilibrium",
return_equilibrium=false,
method="competition",
return_scores=false,
)
Rank models from a Nash-equilibrium mixture of the zero-sum meta-game induced by pairwise win probabilities.
Payoff matrix:
\[A_{ij}=2\hat P_{i\succ j}-1,\qquad A_{ii}=0\]
Equilibrium mixture x is found by LP:
\[\max_{x\in\Delta^{L-1}} v \quad\text{s.t.}\quad \sum_i A_{ij}x_i \ge v,\ \forall j\]
Default score type ("vs_equilibrium") is:
\[s_i = \sum_j \hat P_{i\succ j} x_j\]
Centrality and Spectral Variants
Scorio.rank_centrality — Function
rank_centrality(
R;
method="competition",
return_scores=false,
tie_handling="half",
smoothing=0.0,
teleport=0.0,
max_iter=10000,
tol=1e-12,
)
Rank models with Rank Centrality using the stationary distribution of a pairwise-comparison Markov chain.
Let d_max be the maximum degree of the undirected comparison graph and $\hat P_{j\succ i}$ the empirical probability that j beats i. For $i \ne j$:
\[P_{ij} = \frac{1}{d_{\max}}\,\hat P_{j\succ i}, \qquad P_{ii} = 1 - \sum_{j\ne i} P_{ij}\]
Scores are stationary probabilities $\pi$ with:
\[\pi^\top P = \pi^\top,\qquad \sum_i \pi_i = 1\]
Reference
Negahban, S., Oh, S., & Shah, D. (2017). Rank Centrality: Ranking from Pairwise Comparisons. Operations Research.
Scorio.serial_rank — Function
serial_rank(R; comparison="prob_diff", method="competition", return_scores=false)
Rank models with SerialRank spectral seriation using a Fiedler-vector ordering from comparison-induced similarity.
With pairwise comparison matrix C (skew-symmetric), SerialRank builds:
\[S = \frac{1}{2}\left(L\mathbf{1}\mathbf{1}^{\top} + C C^{\top}\right), \qquad L_S = \operatorname{diag}(S\mathbf{1}) - S\]
Scores are the oriented Fiedler vector of L_S (eigenvector of the second-smallest eigenvalue), with sign chosen to best match observed pairwise directions.
Reference
Fogel, F., d'Aspremont, A., & Vojnovic, M. (2016). Spectral Ranking Using Seriation. JMLR.
Scorio.hodge_rank — Function
hodge_rank(
R;
pairwise_stat="binary",
weight_method="total",
epsilon=0.5,
method="competition",
return_scores=false,
return_diagnostics=false,
)
Rank models with l2 HodgeRank on a weighted pairwise-comparison graph.
Let $Y_{ij}$ be a skew-symmetric observed pairwise flow and $w_{ij}\ge 0$ edge weights. HodgeRank solves:
\[s^\star \in \arg\min_s \sum_{i<j} w_{ij}\left((s_j-s_i)-Y_{ij}\right)^2\]
Equivalent normal equations:
\[\Delta_0 s^\star = -\operatorname{div}(Y), \qquad s^\star = -\Delta_0^\dagger \operatorname{div}(Y)\]
where $\Delta_0^\dagger$ is the Laplacian pseudoinverse.
Reference
Jiang, X., Lim, L.-H., Yao, Y., & Ye, Y. (2009). Statistical Ranking and Combinatorial Hodge Theory. https://arxiv.org/abs/0811.1067
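The normal equations above translate directly into a small solver via the Laplacian pseudoinverse. A sketch assuming Y is a skew-symmetric flow matrix, w a symmetric nonnegative weight matrix, and divergence $(\operatorname{div} Y)_i = \sum_j w_{ij} Y_{ij}$ (illustrative names, not the package API):

```julia
using LinearAlgebra

function hodge_scores(Y, w)
    Δ₀ = Diagonal(vec(sum(w; dims=2))) - w   # weighted graph Laplacian
    divY = vec(sum(w .* Y; dims=2))          # divergence of the flow
    s = -pinv(Δ₀) * divY
    s .- sum(s) / length(s)                  # center scores for identifiability
end

Y = [0.0 1.0; -1.0 0.0]   # observed flow says s₂ - s₁ ≈ 1
w = [0.0 1.0; 1.0 0.0]
hodge_scores(Y, w)         # ≈ [-0.5, 0.5]
```

On this two-node example the least-squares solution reproduces the observed flow exactly: the score gap s₂ − s₁ equals Y₁₂ = 1.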
Listwise and Choice Models
Scorio.plackett_luce — Function
plackett_luce(
R;
method="competition",
return_scores=false,
max_iter=500,
tol=1e-8,
)
Rank models with Plackett-Luce ML on decisive pairwise-reduced outcomes.
This implementation uses pairwise decisive counts and Hunter's MM update:
\[\pi_i^{(k+1)} = \frac{\sum_j W_{ij}} {\sum_{j\ne i}(W_{ij}+W_{ji})/(\pi_i^{(k)}+\pi_j^{(k)})}\]
followed by normalization of $\pi$.
References
Plackett, R. L. (1975). The Analysis of Permutations. Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models.
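Hunter's MM update above can be iterated to convergence in a few lines (an illustrative standalone fit, not the package implementation), given a decisive-win count matrix W:

```julia
function mm_fit(W; max_iter=500, tol=1e-12)
    L = size(W, 1)
    π = fill(1.0 / L, L)                     # uniform initialization
    for _ in 1:max_iter
        π_new = [sum(W[i, :]) /
                 sum((W[i, j] + W[j, i]) / (π[i] + π[j]) for j in 1:L if j != i)
                 for i in 1:L]
        π_new ./= sum(π_new)                 # normalize strengths
        converged = maximum(abs.(π_new .- π)) < tol
        π = π_new
        converged && break
    end
    π
end

W = [0 8; 2 0]   # model 1 beat model 2 in 8 of 10 decisive comparisons
mm_fit(W)        # ≈ [0.8, 0.2], matching the 8:2 win ratio
```

Each MM step is guaranteed not to decrease the likelihood, which is why this simple fixed-point loop converges without step-size tuning.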
Scorio.plackett_luce_map — Function
plackett_luce_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,
)
Rank models with Plackett-Luce MAP using a prior on centered log-strengths.
With $\theta_i = \log \pi_i$:
\[\hat\theta \in \arg\min_\theta \left[ -\sum_{i\ne j}W_{ij}\left(\theta_i-\log(e^{\theta_i}+e^{\theta_j})\right) + \operatorname{penalty}(\theta-\bar\theta) \right]\]
Scorio.davidson_luce — Function
davidson_luce(
R;
method="competition",
return_scores=false,
max_iter=500,
max_tie_order=nothing,)
Rank models with Davidson-Luce setwise tie likelihood (ML).
For an event with comparison set $S=W\cup L$, winner-set size $t=|W|$, $g_t(W)=\left(\prod_{i\in W}\alpha_i\right)^{1/t}$, and tie-order parameters $\delta_t$:
\[\Pr(W\mid S)= \frac{\delta_t g_t(W)} {\sum_{t'=1}^{\min(D,|S|)}\delta_{t'} \sum_{|T|=t'} g_{t'}(T)}\]
Reference
Firth, D., Kosmidis, I., & Turner, H. L. (2019). Davidson-Luce model for multi-item choice with ties.
Scorio.davidson_luce_map — Function
davidson_luce_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,
max_tie_order=nothing,)
Rank models with Davidson-Luce MAP estimation.
\[\hat\theta \in \arg\min_\theta \left[-\log p(\text{events}\mid\theta,\delta) + \operatorname{penalty}(\theta)\right]\]
Scorio.bradley_terry_luce — Function
bradley_terry_luce(
R;
method="competition",
return_scores=false,
max_iter=500,
)
Rank models with Bradley-Terry-Luce composite-likelihood ML from setwise winner/loser events.
For each event (W, L), each winner $i\in W$ is treated as a Luce choice from $\{i\}\cup L$, yielding the composite log-likelihood:
\[\ell_{\mathrm{comp}}(\pi) = \sum_{(W,L)}\sum_{i\in W} \left[ \log\pi_i - \log\!\left(\pi_i+\sum_{j\in L}\pi_j\right) \right]\]
Scorio.bradley_terry_luce_map — Function
bradley_terry_luce_map(
R;
prior=1.0,
method="competition",
return_scores=false,
max_iter=500,)
)
Rank models with Bradley-Terry-Luce composite-likelihood MAP estimation.
\[\hat\theta \in \arg\min_\theta \left[-\ell_{\mathrm{comp}}(\theta)+\operatorname{penalty}(\theta)\right]\]