Ranking API

Priors

Scorio.Prior (Type)
Prior

Abstract supertype for prior penalty specifications used by MAP rankers.

Concrete subtypes define hyperparameters for an internal penalty term over the latent score vector theta.

source
Scorio.GaussianPrior (Type)
GaussianPrior(mean=0.0, var=1.0)

Gaussian prior on latent parameters with quadratic penalty.

Arguments

  • mean::Real=0.0: prior mean.
  • var::Real=1.0: prior variance; must be positive.

Returns

  • GaussianPrior

Formula

\[\operatorname{penalty}(\theta) = \frac{1}{2\,\mathrm{var}}\sum_i (\theta_i-\mathrm{mean})^2\]
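
The penalty is easy to reproduce outside the package. A minimal standalone sketch in plain Julia (the helper name gaussian_penalty is illustrative, not part of the API):

# Quadratic penalty of the Gaussian prior, computed directly.
gaussian_penalty(theta; mean=0.0, var=1.0) = sum(abs2, theta .- mean) / (2 * var)

gaussian_penalty([0.5, -1.2, 0.3]; var=2.0)   # (0.25 + 1.44 + 0.09) / 4 = 0.445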

source
Scorio.LaplacePrior (Type)
LaplacePrior(loc=0.0, scale=1.0)

Laplace prior on latent parameters with L1 penalty.

Arguments

  • loc::Real=0.0: location parameter.
  • scale::Real=1.0: scale parameter; must be positive.

Returns

  • LaplacePrior

Formula

\[\operatorname{penalty}(\theta) = \frac{1}{\mathrm{scale}}\sum_i \left|\theta_i-\mathrm{loc}\right|\]

source
Scorio.CauchyPrior (Type)
CauchyPrior(loc=0.0, scale=1.0)

Cauchy prior on latent parameters with log-quadratic penalty.

Arguments

  • loc::Real=0.0: location parameter.
  • scale::Real=1.0: scale parameter; must be positive.

Returns

  • CauchyPrior

Formula

Let $z_i = (\theta_i-\mathrm{loc})/\mathrm{scale}$.

\[\operatorname{penalty}(\theta) = \sum_i \log(1 + z_i^2)\]

source
Scorio.UniformPrior (Type)
UniformPrior()

Improper flat prior with zero penalty.

Returns

  • UniformPrior

Formula

\[\operatorname{penalty}(\theta) = 0\]

source
Scorio.CustomPrior (Type)
CustomPrior(penalty_fn)

User-defined prior from a callable penalty function.

Arguments

  • penalty_fn: callable with signature penalty_fn(theta) returning a scalar penalty value.

Returns

  • CustomPrior

Notes

penalty_fn is used directly with no transformation of theta.
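
For example, an elastic-net-style penalty can be supplied as a plain callable (the specific penalty below is illustrative):

# Quadratic plus L1 penalty on theta.
penalty_fn(theta) = 0.5 * sum(abs2, theta) + sum(abs, theta)
prior = CustomPrior(penalty_fn)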

source
Scorio.EmpiricalPrior (Type)
EmpiricalPrior(R0; var=1.0, eps=1e-6)

Empirical Gaussian-style prior centered at logits inferred from baseline outcomes.

R0 is accepted as shape (L, M) or (L, M, D). A 2D input is promoted to (L, M, 1).

Arguments

  • R0: baseline outcomes per model. Typically binary outcomes in {0,1}.
  • var::Real=1.0: variance used in the quadratic penalty; must be positive.
  • eps::Real=1e-6: clipping level used before logit transform. No explicit range check is applied; choose 0 < eps < 0.5 in practice.

Returns

  • EmpiricalPrior: stores R0, var, eps, and centered prior_mean.

Formula

For model $l$:

\[a_l = \frac{1}{M D}\sum_{m=1}^{M}\sum_{d=1}^{D} R^0_{lmd}\]

\[\tilde a_l = \operatorname{clip}(a_l, \varepsilon, 1-\varepsilon), \qquad \mu_l = \log\!\left(\frac{\tilde a_l}{1-\tilde a_l}\right)\]

Then mean-center $\mu$ for identifiability and use:

\[\operatorname{penalty}(\theta) = \frac{1}{2\,\mathrm{var}}\sum_{l=1}^{L}(\theta_l-\mu_l)^2\]

Examples

R0 = Int[
    1 1 1 0 1
    0 1 0 0 1
]
prior = EmpiricalPrior(R0; var=2.0, eps=1e-6)
source

Evaluation-based Ranking

Bayes

Scorio.bayes (Method)
bayes(
    R::AbstractArray{<:Integer, 3},
    w=nothing;
    R0=nothing,
    quantile=nothing,
    method="competition",
    return_scores=false,
)

Rank models by Bayes@N scores computed independently per model.

If quantile is provided, models are ranked by mu + z_q * sigma; otherwise by posterior mean mu.

References

Hariri, M., Samandar, A., Hinczewski, M., & Chaudhary, V. (2026). Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation. arXiv:2510.04265. https://arxiv.org/abs/2510.04265

Formula

For each model l, let (mu_l, sigma_l) = Scorio.bayes(R_l, w, R0_l).

\[s_l = \begin{cases} \mu_l, & \text{if quantile is not set} \\ \mu_l + \Phi^{-1}(q)\,\sigma_l, & \text{if quantile}=q \in [0,1] \end{cases}\]

Arguments

  • R: integer tensor (L, M, N) with values in {0, ..., C}.
  • w: class weights of length C+1. If not provided and R is binary (contains only 0 and 1), defaults to [0.0, 1.0]. For non-binary R, w is required.
  • R0: optional shared prior (M, D) or model-specific prior (L, M, D).
  • quantile: optional value in [0, 1] for quantile-adjusted ranking.
  • method, return_scores: ranking output controls.
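
The quantile adjustment itself is a one-liner. A sketch assuming per-model posterior summaries mu and sigma are already in hand (all names illustrative):

using Distributions

mu    = [0.72, 0.65, 0.80]          # posterior means per model
sigma = [0.04, 0.02, 0.06]          # posterior standard deviations
q     = 0.05
z_q   = quantile(Normal(), q)       # Phi^{-1}(q); negative for q < 0.5
s     = mu .+ z_q .* sigma          # conservative lower-quantile scores
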
source

Avg

Scorio.avg (Method)
avg(R; method="competition", return_scores=false)

Rank models by per-model mean accuracy across all questions and trials.

For each model l, compute the scalar score:

\[s_l^{\mathrm{avg}} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} R_{lmn}\]

Higher scores are better; ranking is produced by rank_scores.

Arguments

  • R: binary response tensor (L, M, N) or matrix (L, M) promoted to (L, M, 1).
  • method: tie-handling rule for rank_scores.
  • return_scores: if true, return (ranking, scores).
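
The score is a plain mean over the question and trial axes. A standalone sketch:

using Statistics

R = rand(0:1, 3, 5, 4)              # (L, M, N) binary outcomes
s = vec(mean(R; dims=(2, 3)))       # s[l] = mean accuracy of model l
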
source

Pass@k Family

Scorio.pass_at_k (Method)
pass_at_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)

Rank models by per-model Pass@k scores.

For each model l, define per-question success counts $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$. Then:

\[s_l^{\mathrm{Pass@}k} = \frac{1}{M}\sum_{m=1}^{M} \left(1 - \frac{\binom{N-\nu_{lm}}{k}}{\binom{N}{k}}\right)\]
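
A standalone sketch of the per-question estimator, using the numerically stable product form of the binomial ratio (helper names illustrative):

using Statistics

# Pass@k for one question with nu successes out of N trials.
function pass_at_k_single(N::Int, nu::Int, k::Int)
    N - nu < k && return 1.0        # every size-k subset contains a success
    return 1.0 - prod(1.0 - k / j for j in (N - nu + 1):N; init=1.0)
end

# Model score: average over the per-question success counts in nus.
pass_at_k_score(nus, N, k) = mean(pass_at_k_single(N, nu, k) for nu in nus)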

References

Chen, M., Tworek, J., Jun, H., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv:2107.03374. https://arxiv.org/abs/2107.03374

source
Scorio.pass_hat_k (Method)
pass_hat_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)

Rank models by per-model Pass-hat@k (G-Pass@k) scores.

With $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$:

\[s_l^{\widehat{\mathrm{Pass@}k}} = \frac{1}{M}\sum_{m=1}^{M} \frac{\binom{\nu_{lm}}{k}}{\binom{N}{k}}\]

References

Yao, S., Shinn, N., Razavi, P., & Narasimhan, K. (2024). tau-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. arXiv:2406.12045. https://arxiv.org/abs/2406.12045

source
Scorio.g_pass_at_k_tau (Method)
g_pass_at_k_tau(
    R::AbstractArray{<:Integer, 3},
    k,
    tau;
    method="competition",
    return_scores=false,
)

Rank models by generalized G-Pass@k_τ per model.

Let $X_{lm} \sim \mathrm{Hypergeometric}(N, \nu_{lm}, k)$ where $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$. The score is:

\[s_l^{\mathrm{G\text{-}Pass@}k_{\tau}} = \frac{1}{M}\sum_{m=1}^{M} \Pr\!\left(X_{lm}\ge \lceil \tau k \rceil\right)\]

\[\Pr(X_{lm}\ge \lceil \tau k \rceil) = \sum_{j=\lceil \tau k \rceil}^{k} \frac{\binom{\nu_{lm}}{j}\binom{N-\nu_{lm}}{k-j}}{\binom{N}{k}}\]
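
The tail probability is available directly from Distributions.jl. A minimal sketch (helper name illustrative):

using Distributions

# P(X >= ceil(tau * k)) for X ~ Hypergeometric(nu successes, N - nu failures, k draws).
function g_pass_tail(N::Int, nu::Int, k::Int, tau::Real)
    m = ceil(Int, tau * k)
    d = Hypergeometric(nu, N - nu, k)
    return ccdf(d, m - 1)           # ccdf(d, x) = P(X > x) on integer support
end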

References

Liu, J., Liu, H., Xiao, L., et al. (2024). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147

source
Scorio.mg_pass_at_k (Method)
mg_pass_at_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)

Rank models by per-model mG-Pass@k scores.

With $X_{lm} \sim \mathrm{Hypergeometric}(N, \nu_{lm}, k)$ and $m_0 = \lceil k/2 \rceil$:

\[s_l^{\mathrm{mG\text{-}Pass@}k} = \frac{1}{M}\sum_{m=1}^{M} \frac{2}{k}\,\mathbb{E}\!\left[(X_{lm}-m_0)_+\right]\]

Equivalent discrete form:

\[\frac{2}{k}\sum_{i=m_0+1}^{k}\Pr(X_{lm}\ge i)\]

References

Liu, J., Liu, H., Xiao, L., et al. (2024). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147

source

Pointwise Methods

Scorio.inverse_difficulty (Function)
inverse_difficulty(
    R;
    method="competition",
    return_scores=false,
    clip_range=(0.01, 0.99),
)

Rank models by question accuracy weighted by inverse empirical question difficulty.

Each question weight is proportional to 1 / p_correct(question), after clipping p_correct to clip_range and normalizing weights to sum to 1.

Let $k_{lm} = \sum_{n=1}^{N} R_{lmn}$ and $\hat p_{lm} = k_{lm}/N$. Define the global per-question solve rate $\bar p_m = \frac{1}{L}\sum_l \hat p_{lm}$ and weights:

\[w_m \propto \frac{1}{\operatorname{clip}(\bar p_m, a, b)}, \qquad \sum_{m=1}^{M} w_m = 1\]

The model score is:

\[s_l^{\mathrm{inv\text{-}diff}} = \sum_{m=1}^{M} w_m \hat p_{lm}\]
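
A standalone sketch of the weighting, starting from the L×M matrix of per-model solve rates (helper name illustrative):

using Statistics

function inv_difficulty_scores(phat::AbstractMatrix; clip_range=(0.01, 0.99))
    pbar = vec(mean(phat; dims=1))          # global solve rate per question
    w = 1.0 ./ clamp.(pbar, clip_range...)  # inverse clipped difficulty
    w ./= sum(w)                            # normalize weights to sum to 1
    return phat * w                         # s_l = sum_m w_m * phat[l, m]
end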

Reference

Inverse probability weighting: https://en.wikipedia.org/wiki/Inverse_probability_weighting

source

Pairwise Methods

Scorio.elo (Function)
elo(
    R;
    K=32.0,
    initial_rating=1500.0,
    tie_handling="correct_draw_only",
    method="competition",
    return_scores=false,
)

Sequential Elo rating over pairwise outcomes induced by R.

For each (question, trial), all model pairs are compared and Elo updates are applied in fixed iteration order. Pair outcomes are:

  • decisive (1 vs 0): win/loss update
  • tie (1 vs 1 or 0 vs 0): handled by tie_handling

Arguments

  • R: binary response tensor of shape (L, M, N) or matrix (L, M) promoted to (L, M, 1).
  • K: positive Elo step size.
  • initial_rating: finite initial rating for all models.
  • tie_handling: one of "skip", "draw", "correct_draw_only".
  • method: rank tie-handling method passed to rank_scores.
  • return_scores: if true, return (ranking, ratings).

Returns

  • ranking by default.
  • (ranking, ratings) when return_scores=true.

Formula

For each induced pairwise match (i,j) with observed score $S_{ij} \in \{0, 0.5, 1\}$:

\[E_{ij} = \frac{1}{1 + 10^{(r_j-r_i)/400}}\]

\[r_i \leftarrow r_i + K(S_{ij} - E_{ij}), \quad r_j \leftarrow r_j + K((1-S_{ij}) - (1-E_{ij}))\]
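
A single induced match reduces to the textbook update. A minimal sketch (helper name illustrative):

# One Elo update for an observed score S in {0, 0.5, 1}, from i's perspective.
function elo_update(r_i, r_j, S; K=32.0)
    E = 1 / (1 + 10^((r_j - r_i) / 400))    # expected score of i
    return r_i + K * (S - E), r_j + K * ((1 - S) - (1 - E))
end

elo_update(1500.0, 1500.0, 1.0)             # winner gains K/2 = 16 points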

Reference

Elo, A. E. (1978). The Rating of Chessplayers, Past and Present.

source
Scorio.trueskill (Function)
trueskill(
    R;
    mu_initial=25.0,
    sigma_initial=25.0 / 3,
    beta=25.0 / 6,
    tau=25.0 / 300,
    method="competition",
    return_scores=false,
    tie_handling="skip",
    draw_margin=0.0,
)

Rank models with a sequential two-player TrueSkill-style update over induced pairwise comparisons.

Returns rankings from posterior means mu.

Formula

For one match between models i and j:

\[c = \sqrt{2\beta^2 + \sigma_i^2 + \sigma_j^2}, \quad t = (\mu_i-\mu_j)/c, \quad \epsilon = \text{draw\_margin}/c\]

For decisive outcomes, the update uses $v_{win}(t,\epsilon)$ and $w_{win}(t,\epsilon)$:

\[\mu_i' = \mu_i + \frac{\sigma_i^2}{c} v_{win}(t,\epsilon), \quad \sigma_i'^2 = \sigma_i^2\!\left(1 - \frac{\sigma_i^2}{c^2}w_{win}(t,\epsilon)\right)\]

Draw updates use the analogous $v_{draw}$ and $w_{draw}$ corrections.
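
The win corrections have standard closed forms in terms of the normal pdf and cdf. A sketch of the two helpers (draw corrections are analogous; names illustrative):

using Distributions

v_win(t, ε) = pdf(Normal(), t - ε) / cdf(Normal(), t - ε)
w_win(t, ε) = v_win(t, ε) * (v_win(t, ε) + t - ε)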

Reference

Herbrich, R., Minka, T., & Graepel, T. (2006). TrueSkill(TM): A Bayesian Skill Rating System. NeurIPS 19.

source
Scorio.glicko (Function)
glicko(
    R;
    initial_rating=1500.0,
    initial_rd=350.0,
    c=0.0,
    rd_max=350.0,
    tie_handling="correct_draw_only",
    return_deviation=false,
    method="competition",
    return_scores=false,
)

Rank models with sequential Glicko updates over induced pairwise comparisons.

If return_deviation=true, returns (ranking, rating, rd); otherwise returns ranking or (ranking, rating) when return_scores=true.

Formula

Let $q = \ln(10)/400$ and $g(RD) = 1/\sqrt{1 + 3q^2 RD^2/\pi^2}$. For model i in one period:

\[E_{ij} = \frac{1}{1 + 10^{-g(RD_j)(r_i-r_j)/400}}\]

\[d_i^2 = \left(q^2\sum_j g(RD_j)^2 E_{ij}(1-E_{ij})\right)^{-1}\]

\[RD_i' = \left(\frac{1}{RD_i^2} + \frac{1}{d_i^2}\right)^{-1/2}, \quad r_i' = r_i + \frac{q}{\frac{1}{RD_i^2}+\frac{1}{d_i^2}} \sum_j g(RD_j)(S_{ij}-E_{ij})\]

References

Glickman, M. E. (1999). Parameter Estimation in Large Dynamic Paired Comparison Experiments. JRSS C, 48(3), 377-394. https://doi.org/10.1111/1467-9876.00159

source

Paired-Comparison Probabilistic Models

Scorio.bradley_terry (Function)
bradley_terry(R; method="competition", return_scores=false, max_iter=500)

Rank models with Bradley-Terry maximum likelihood on decisive pairwise wins.

Let $W_{ij}$ be decisive wins of model i over j and strengths $\pi_i > 0$.

\[\Pr(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j}\]

\[\log p(W\mid \pi) = \sum_{i\ne j} W_{ij}\left[\log \pi_i - \log(\pi_i+\pi_j)\right]\]
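
The ML solution can be reached with Hunter's MM fixed point. A minimal sketch assuming each model has at least one decisive comparison (helper name illustrative, not the package's solver):

# W[i, j] = decisive wins of model i over model j, diagonal zero.
function bt_mm(W::AbstractMatrix; max_iter=500, tol=1e-10)
    L = size(W, 1)
    p = fill(1.0 / L, L)
    for _ in 1:max_iter
        q = [sum(W[i, :]) / sum((W[i, j] + W[j, i]) / (p[i] + p[j])
                                for j in 1:L if j != i) for i in 1:L]
        q ./= sum(q)                        # normalize for identifiability
        maximum(abs.(q .- p)) < tol && return q
        p = q
    end
    return p
end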

References

Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324

source
Scorio.bradley_terry_map (Function)
bradley_terry_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry MAP estimation using the given prior on centered log-strengths.

With $\theta_i = \log \pi_i$:

\[\hat\theta = \arg\min_{\theta} \left[-\log p(W\mid \theta) + \operatorname{penalty}(\theta)\right]\]

\[\hat\pi_i = \exp(\hat\theta_i)\]

Reference

Caron, F., & Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley-Terry models. https://doi.org/10.1080/10618600.2012.638220

source
Scorio.bradley_terry_davidson (Function)
bradley_terry_davidson(R; method="competition", return_scores=false, max_iter=500)

Rank models with Bradley-Terry-Davidson ML, incorporating explicit tie mass.

The Davidson tie extension introduces a tie parameter $\nu > 0$:

\[\Pr(i\succ j) = \frac{\pi_i}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}, \quad \Pr(i\sim j) = \frac{\nu\sqrt{\pi_i\pi_j}}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}\]

Reference

Davidson, R. R. (1970). On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. https://doi.org/10.1080/01621459.1970.10481082

source
Scorio.bradley_terry_davidson_map (Function)
bradley_terry_davidson_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry-Davidson MAP estimation.

\[(\hat\theta,\hat\nu) = \arg\min_{\theta,\nu>0} \left[-\log p(W,T\mid \theta,\nu) + \operatorname{penalty}(\theta)\right]\]

source
Scorio.rao_kupper (Function)
rao_kupper(
    R;
    tie_strength=1.1,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with the Rao-Kupper tie model (ML).

With fixed $\kappa \ge 1$:

\[\Pr(i\succ j)=\frac{\pi_i}{\pi_i+\kappa\pi_j}, \quad \Pr(j\succ i)=\frac{\pi_j}{\kappa\pi_i+\pi_j}\]

\[\Pr(i\sim j)= \frac{(\kappa^2-1)\pi_i\pi_j} {(\pi_i+\kappa\pi_j)(\kappa\pi_i+\pi_j)}\]

Reference

Rao, P. V., & Kupper, L. L. (1967). Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. https://doi.org/10.1080/01621459.1967.10482901

source
Scorio.rao_kupper_map (Function)
rao_kupper_map(
    R;
    tie_strength=1.1,
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with the Rao-Kupper tie model under MAP estimation.

\[\hat\theta = \arg\min_{\theta} \left[-\log p(W,T\mid \theta,\kappa) + \operatorname{penalty}(\theta)\right]\]

source

Bayesian Ranking

Scorio.thompson (Function)
thompson(
    R;
    n_samples=10000,
    prior_alpha=1.0,
    prior_beta=1.0,
    seed=42,
    method="competition",
    return_scores=false,
)

Rank models by Thompson sampling over Beta posteriors of model success rates.

The returned score for each model is negative average sampled rank (higher is better).

Let $S_l = \sum_{m,n} R_{lmn}$ and $T = MN$. Posterior per model:

\[p_l \mid R \sim \mathrm{Beta}(\alpha + S_l,\ \beta + T - S_l)\]

With posterior draws $t=1,\dots,T_s$ and sampled rank $r_l^{(t)}$:

\[s_l^{\mathrm{TS}} = -\frac{1}{T_s}\sum_{t=1}^{T_s} r_l^{(t)}\]
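
A compact sketch of the sampling loop (helper name illustrative):

using Distributions, Random

# S[l] = total successes of model l out of T = M * N trials.
function thompson_scores(S, T; n_samples=10_000, alpha=1.0, beta=1.0,
                         rng=MersenneTwister(42))
    L = length(S)
    ranks = zeros(L)
    for _ in 1:n_samples
        draws = [rand(rng, Beta(alpha + S[l], beta + T - S[l])) for l in 1:L]
        for (r, l) in enumerate(sortperm(draws; rev=true))  # rank 1 = best draw
            ranks[l] += r
        end
    end
    return -ranks ./ n_samples              # negative mean sampled rank
end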

References

Thompson, W. R. (1933). On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika. https://doi.org/10.1093/biomet/25.3-4.285

Russo, D. J., et al. (2018). A Tutorial on Thompson Sampling. https://doi.org/10.1561/2200000070

source
Scorio.bayesian_mcmc (Function)
bayesian_mcmc(
    R;
    n_samples=5000,
    burnin=1000,
    prior_var=1.0,
    seed=42,
    method="competition",
    return_scores=false,
)

Rank models via random-walk Metropolis MCMC under a Bradley-Terry-style pairwise likelihood with Gaussian prior on latent abilities.

Scores are posterior means of sampled latent abilities.

Let $W_{ij}$ be decisive wins of model i over j, and let $\theta_i$ be latent log-strengths with Gaussian prior variance $\sigma^2$ equal to prior_var.

\[\Pr(i \succ j \mid \theta) = \frac{\exp(\theta_i)}{\exp(\theta_i)+\exp(\theta_j)}, \qquad \theta_i \sim \mathcal{N}(0,\sigma^2)\]

The returned score is the posterior mean:

\[s_i^{\mathrm{MCMC}} = \mathbb{E}[\theta_i \mid W]\]

References

Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324

Metropolis, N., et al. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics. https://doi.org/10.1063/1.1699114

source

Item Response Theory

Scorio.rasch (Function)
rasch(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
)

Rank models with Rasch (1PL) maximum-likelihood estimation.

Returns rankings from estimated abilities theta. When return_item_params=true, also returns item difficulties.

For counts $k_{lm}=\sum_n R_{lmn}$:

\[k_{lm} \sim \mathrm{Binomial}\!\left(N,\sigma(\theta_l-b_m)\right)\]

Item difficulties are mean-centered for identifiability:

\[b \leftarrow b - \frac{1}{M}\sum_m b_m\]
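
The objective being maximized is a sketchable binomial likelihood (helper names illustrative; the package's optimizer is not shown):

using Distributions

sigmoid(x) = 1 / (1 + exp(-x))

# Negative log-likelihood for counts k (L×M) with N trials per cell.
function rasch_nll(theta, b, k::AbstractMatrix, N::Int)
    L, M = size(k)
    return -sum(logpdf(Binomial(N, sigmoid(theta[l] - b[m])), k[l, m])
                for l in 1:L, m in 1:M)
end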

Reference

Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests.

source
Scorio.rasch_map (Function)
rasch_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
)

Rank models with Rasch (1PL) MAP estimation using an ability prior.

\[(\hat\theta,\hat b) = \arg\min_{\theta,b} \left[ -\sum_{l,m}\log p(k_{lm}\mid \theta_l,b_m) + \operatorname{penalty}(\theta) \right]\]

Reference

Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika.

source
Scorio.rasch_2pl (Function)
rasch_2pl(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
    reg_discrimination=0.01,
)

Rank models with 2PL IRT maximum likelihood (ability + item discrimination).

\[k_{lm} \sim \mathrm{Binomial}\!\left( N,\sigma\!\left(a_m(\theta_l-b_m)\right)\right)\]

source
Scorio.rasch_2pl_map (Function)
rasch_2pl_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
    reg_discrimination=0.01,
)

Rank models with 2PL IRT MAP estimation.

Same 2PL likelihood as rasch_2pl, plus prior regularization on abilities:

\[\hat\theta \in \arg\min_{\theta,\cdots} \left[-\log p(k\mid \theta,\cdots)+\operatorname{penalty}(\theta)\right]\]

source
Scorio.rasch_3pl (Function)
rasch_3pl(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    fix_guessing=nothing,
    return_item_params=false,
    reg_discrimination=0.01,
    reg_guessing=0.1,
    guessing_upper=0.5,
)

Rank models with 3PL IRT maximum likelihood (ability, discrimination, guessing).

\[p_{lm} = c_m + (1-c_m)\sigma\!\left(a_m(\theta_l-b_m)\right)\]

with $c_m \in [0, \text{guessing\_upper}]$.

source
Scorio.rasch_3pl_map (Function)
rasch_3pl_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    fix_guessing=nothing,
    return_item_params=false,
    reg_discrimination=0.01,
    reg_guessing=0.1,
    guessing_upper=0.5,
)

Rank models with 3PL IRT MAP estimation.

Same 3PL likelihood as rasch_3pl, with prior penalty on abilities:

\[\hat\theta \in \arg\min_{\theta,\cdots} \left[-\log p(k\mid \theta,\cdots)+\operatorname{penalty}(\theta)\right]\]

source
Scorio.dynamic_irt (Function)
dynamic_irt(
    R;
    variant="linear",
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
    time_points=nothing,
    score_target="final",
    slope_reg=0.01,
    state_reg=1.0,
    assume_time_axis=false,
)

Rank models with dynamic IRT variants:

  • "linear": static Rasch baseline
  • "growth": linear growth path
  • "state_space": smoothed latent trajectory

Growth variant:

\[\theta_{ln} = \theta_{0,l} + \theta_{1,l} t_n,\qquad P(R_{lmn}=1)=\sigma(\theta_{ln}-b_m)\]

State-space variant:

\[P(R_{lmn}=1)=\sigma(\theta_{ln}-b_m)\]

with smoothness penalty

\[\lambda \sum_{l,n>1} \frac{(\theta_{ln}-\theta_{l,n-1})^2}{t_n-t_{n-1}}\]

References

Verhelst, N. D., & Glas, C. A. (1993). A dynamic generalization of the Rasch model. Psychometrika.

source
Scorio.rasch_mml (Function)
rasch_mml(
    R;
    method="competition",
    return_scores=false,
    max_iter=100,
    em_iter=20,
    n_quadrature=21,
    return_item_params=false,
)

Rank models with Rasch marginal maximum likelihood using EM + quadrature.

Using quadrature nodes $\theta_q$ and weights $w_q$, the posterior mass for model l is:

\[w_{lq} \propto p(k_l\mid \theta_q,b)\,w_q\]

EAP ability score:

\[\hat\theta_l^{\mathrm{EAP}} = \sum_q w_{lq}\theta_q\]

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika.

source
Scorio.rasch_mml_credible (Function)
rasch_mml_credible(
    R;
    quantile=0.05,
    method="competition",
    return_scores=false,
    max_iter=100,
    em_iter=20,
    n_quadrature=21,
)

Rank models by posterior ability quantiles from Rasch MML posterior mass.

\[s_l = Q_q(\theta_l \mid R)\]

Lower q (for example 0.05) yields a more conservative ranking.

source

Voting Methods

Scorio.borda (Function)
borda(R; method="competition", return_scores=false)

Rank models with Borda count from per-question model orderings.

Let $k_{lm} = \sum_{n=1}^{N} R_{lmn}$ and $r_{lm}$ be the descending tie-averaged rank of model l on question m (rank 1 is best):

\[s_l^{\mathrm{Borda}} = \sum_{m=1}^{M} (L - r_{lm})\]
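
A standalone sketch from the count matrix, using tie-averaged ranks (helper name illustrative):

using StatsBase

# k[l, m] = successes of model l on question m.
function borda_scores(k::AbstractMatrix)
    L, M = size(k)
    s = zeros(L)
    for m in 1:M
        r = tiedrank(-k[:, m])              # rank 1 = most correct on question m
        s .+= L .- r
    end
    return s
end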

Reference

de Borda, J.-C. (1781/1784). Mémoire sur les élections au scrutin.

source
Scorio.copeland (Function)
copeland(R; method="competition", return_scores=false)

Rank models by Copeland score over pairwise question-level majorities.

Let $W^{(q)}_{ij}$ be the number of questions $m$ where $k_{im} > k_{jm}$:

\[s_i^{\mathrm{Copeland}} = \sum_{j\ne i}\operatorname{sign}\!\left(W^{(q)}_{ij} - W^{(q)}_{ji}\right)\]

source
Scorio.win_rate (Function)
win_rate(R; method="competition", return_scores=false)

Rank models by aggregate pairwise win rate.

With the same $W^{(q)}_{ij}$ counts:

\[s_i^{\mathrm{winrate}} = \frac{\sum_{j\ne i} W^{(q)}_{ij}} {\sum_{j\ne i}\left(W^{(q)}_{ij}+W^{(q)}_{ji}\right)}\]

Models with no decisive pairwise outcomes receive score 0.5.

source
Scorio.minimax (Function)
minimax(
    R;
    variant="margin",
    tie_policy="half",
    method="competition",
    return_scores=false,
)

Rank models with Minimax (Simpson-Kramer), using worst defeat strength.

Let $P_{ij}$ be pairwise preference counts and $\Delta_{ij}=P_{ij}-P_{ji}$.

Margin variant:

\[s_i^{\mathrm{minimax}} = -\max_{j\ne i}\max(0,\Delta_{ji})\]

Winning-votes variant:

\[s_i^{\mathrm{wv}} = -\max_{j:\,P_{ji}>P_{ij}} P_{ji}\]

source
Scorio.schulze (Function)
schulze(R; tie_policy="half", method="competition", return_scores=false)

Rank models using the Schulze strongest-path method.

Initialize:

\[p_{ij} = \begin{cases} P_{ij}, & P_{ij}>P_{ji} \\ 0, & \text{otherwise} \end{cases}\]

Then apply strongest-path closure:

\[p_{jk} \leftarrow \max\!\left(p_{jk}, \min(p_{ji}, p_{ik})\right)\]

Model i beats j if $p_{ij} > p_{ji}$.
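
The closure is a Floyd-Warshall-style triple loop. A minimal sketch over preference counts P (helper name illustrative):

function schulze_paths(P::AbstractMatrix)
    L = size(P, 1)
    p = [i != j && P[i, j] > P[j, i] ? float(P[i, j]) : 0.0 for i in 1:L, j in 1:L]
    for i in 1:L, j in 1:L, k in 1:L
        (j == i || k == i || j == k) && continue
        p[j, k] = max(p[j, k], min(p[j, i], p[i, k]))
    end
    return p                                # i beats j iff p[i, j] > p[j, i]
end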

source
Scorio.ranked_pairs (Function)
ranked_pairs(
    R;
    strength="margin",
    tie_policy="half",
    method="competition",
    return_scores=false,
)

Rank models with Ranked Pairs (Tideman) by locking pairwise victories without creating directed cycles.

Victories are sorted by strength (margin or winning-votes), then each edge winner -> loser is locked only if it does not create a directed cycle in the current locked graph.

source
Scorio.kemeny_young (Function)
kemeny_young(
    R;
    tie_policy="half",
    method="competition",
    return_scores=false,
    time_limit=nothing,
    tie_aware=true,
)

Rank models with Kemeny-Young via MILP optimization. With tie_aware=true, the routine analyzes forced pairwise orders among optimal solutions.

Binary variables $y_{ij}$ indicate whether model i is above j:

\[\max_y \sum_{i\ne j} P_{ij} y_{ij}\]

subject to:

\[y_{ij}+y_{ji}=1,\qquad y_{ij}+y_{jk}+y_{ki}\le 2 \quad (\forall i,j,k\ \text{distinct})\]

tie_aware=true checks which pairwise orders are forced across all optimal solutions and ranks by that forced-order DAG.

source
Scorio.nanson (Function)
nanson(R; rank_ties="average", method="competition", return_scores=false)

Rank models with Nanson's elimination rule (iterative Borda with below-mean elimination).

At round t, with active set A_t and Borda scores $s_i^{(t)}$:

\[E_t = \{ i\in A_t : s_i^{(t)} < \overline{s}^{(t)} \}, \qquad A_{t+1} = A_t \setminus E_t\]

source
Scorio.baldwin (Function)
baldwin(R; rank_ties="average", method="competition", return_scores=false)

Rank models with Baldwin's elimination rule (iterative elimination of minimum Borda score).

At round t:

\[E_t = \arg\min_{i\in A_t} s_i^{(t)}, \qquad A_{t+1} = A_t \setminus E_t\]

This implementation removes all models tied at the minimum in a round.

source
Scorio.majority_judgment (Function)
majority_judgment(R; method="competition", return_scores=false)

Rank models using Majority Judgment with recursive median-grade tie-breaking.

Each question assigns grade $k_{lm}\in\{0,\dots,N\}$. Models are compared by lower median grade; ties are broken by recursively removing one occurrence of the current median grade from tied models and repeating the comparison.

Reference

Balinski, M., & Laraki, R. (2011). Majority Judgment.

source

Graph-based Methods

Scorio.pagerank (Function)
pagerank(
    R;
    damping=0.85,
    max_iter=100,
    tol=1e-6,
    method="competition",
    return_scores=false,
    teleport=nothing,
)

Rank models with PageRank on the pairwise win-probability graph.

Let $\hat P_{i\succ j}$ be empirical tied-split win probabilities. Column-normalized transition matrix:

\[P_{ij} = \frac{\hat P_{i\succ j}}{\sum_{k\ne j}\hat P_{k\succ j}}\]

PageRank fixed point:

\[r = d P r + (1-d)e\]

where e is a teleportation distribution (uniform by default).
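
The fixed point can be found by plain power iteration on a column-stochastic P. A minimal sketch (helper name illustrative):

using LinearAlgebra

function pagerank_power(P::AbstractMatrix; d=0.85, tol=1e-6, max_iter=100)
    L = size(P, 1)
    e = fill(1.0 / L, L)                    # uniform teleportation
    r = copy(e)
    for _ in 1:max_iter
        r_new = d .* (P * r) .+ (1 - d) .* e
        norm(r_new - r, 1) < tol && return r_new
        r = r_new
    end
    return r
end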

Reference

Page, L., et al. (1999). The PageRank Citation Ranking.

source
Scorio.spectral (Function)
spectral(
    R;
    max_iter=10000,
    tol=1e-12,
    method="competition",
    return_scores=false,
)

Rank models by the dominant eigenvector of a spectral centrality matrix built from pairwise win probabilities.

Construct:

\[W_{ij}=\hat P_{i\succ j}\ (i\ne j), \qquad W_{ii}=\sum_{j\ne i}W_{ij}\]

Score vector is the normalized dominant right eigenvector:

\[v \propto Wv,\qquad \sum_i v_i=1\]

source
Scorio.alpharank (Function)
alpharank(
    R;
    alpha=1.0,
    population_size=50,
    max_iter=100000,
    tol=1e-12,
    method="competition",
    return_scores=false,
)

Rank models with single-population alpha-Rank stationary distribution scores.

For resident s, mutant r, population size m:

\[u = \alpha\frac{m}{m-1}\left(\hat P_{r\succ s}-\frac12\right),\qquad \rho_{r,s}= \begin{cases} \frac{1-e^{-u}}{1-e^{-mu}}, & u\ne 0\\ \frac{1}{m}, & u=0 \end{cases}\]

Transition matrix:

\[C_{sr}=\frac{1}{L-1}\rho_{r,s},\qquad C_{ss}=1-\sum_{r\ne s}C_{sr}\]

Ranking uses the stationary distribution of C.

Reference

Omidshafiei, S., et al. (2019). α-Rank: Multi-Agent Evaluation by Evolution. Scientific Reports.

source
Scorio.nash (Function)
nash(
    R;
    n_iter=100,
    temperature=0.1,
    solver="lp",
    score_type="vs_equilibrium",
    return_equilibrium=false,
    method="competition",
    return_scores=false,
)

Rank models from a Nash-equilibrium mixture of the zero-sum meta-game induced by pairwise win probabilities.

Payoff matrix:

\[A_{ij}=2\hat P_{i\succ j}-1,\qquad A_{ii}=0\]

Equilibrium mixture x is found by LP:

\[\max_{x\in\Delta^{L-1}} v \quad\text{s.t.}\quad \sum_i A_{ij}x_i \ge v,\ \forall j\]

Default score type ("vs_equilibrium") is:

\[s_i = \sum_j \hat P_{i\succ j} x_j\]
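
The LP is small and standard. A sketch assuming JuMP with the HiGHS optimizer (neither is implied to be the package's backend):

using JuMP, HiGHS

# Equilibrium mixture of the zero-sum game with payoff matrix A.
function nash_mixture(A::AbstractMatrix)
    L = size(A, 1)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:L] >= 0)
    @variable(model, v)
    @constraint(model, sum(x) == 1)
    @constraint(model, [j in 1:L], sum(A[i, j] * x[i] for i in 1:L) >= v)
    @objective(model, Max, v)
    optimize!(model)
    return value.(x), value(v)
end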

source

Centrality and Spectral Variants

Scorio.rank_centrality (Function)
rank_centrality(
    R;
    method="competition",
    return_scores=false,
    tie_handling="half",
    smoothing=0.0,
    teleport=0.0,
    max_iter=10000,
    tol=1e-12,
)

Rank models with Rank Centrality using stationary distribution of a pairwise-comparison Markov chain.

Let d_max be the maximum degree of the undirected comparison graph and $\hat P_{j\succ i}$ the empirical probability that j beats i. For $i \ne j$:

\[P_{ij} = \frac{1}{d_{\max}}\,\hat P_{j\succ i}, \qquad P_{ii} = 1 - \sum_{j\ne i} P_{ij}\]

Scores are stationary probabilities $\pi$ with:

\[\pi^\top P = \pi^\top,\qquad \sum_i \pi_i = 1\]

Reference

Negahban, S., Oh, S., & Shah, D. (2017). Rank Centrality: Ranking from Pairwise Comparisons. Operations Research.

source
Scorio.serial_rank (Function)
serial_rank(R; comparison="prob_diff", method="competition", return_scores=false)

Rank models with SerialRank spectral seriation using a Fiedler-vector ordering from comparison-induced similarity.

With pairwise comparison matrix C (skew-symmetric), SerialRank builds:

\[S = \frac{1}{2}\left(L\mathbf{1}\mathbf{1}^{\top} + C C^{\top}\right), \qquad L_S = \operatorname{diag}(S\mathbf{1}) - S\]

Scores are the oriented Fiedler vector of L_S (the eigenvector associated with the second-smallest eigenvalue), with sign chosen to best match observed pairwise directions.

Reference

Fogel, F., d'Aspremont, A., & Vojnovic, M. (2016). Spectral Ranking Using Seriation. JMLR.

source
Scorio.hodge_rank (Function)
hodge_rank(
    R;
    pairwise_stat="binary",
    weight_method="total",
    epsilon=0.5,
    method="competition",
    return_scores=false,
    return_diagnostics=false,
)

Rank models with l2 HodgeRank on a weighted pairwise-comparison graph.

Let $Y_{ij}$ be a skew-symmetric observed pairwise flow and $w_{ij}\ge 0$ edge weights. HodgeRank solves:

\[s^\star \in \arg\min_s \sum_{i<j} w_{ij}\left((s_j-s_i)-Y_{ij}\right)^2\]

Equivalent normal equations:

\[\Delta_0 s^\star = -\operatorname{div}(Y), \qquad s^\star = -\Delta_0^\dagger \operatorname{div}(Y)\]

where $\Delta_0^\dagger$ is the Laplacian pseudoinverse.
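
The normal equations solve directly with a Laplacian pseudoinverse. A minimal dense sketch (helper name illustrative):

using LinearAlgebra

# Y: skew-symmetric flows; w: symmetric nonnegative edge weights (both L×L).
function hodge_scores(Y::AbstractMatrix, w::AbstractMatrix)
    L = size(Y, 1)
    W = w .* (1 .- Matrix(I, L, L))         # zero the diagonal
    Lap = Diagonal(vec(sum(W; dims=2))) - W # weighted graph Laplacian
    divY = vec(sum(W .* Y; dims=2))         # div(Y)_i = sum_j w_ij * Y_ij
    return -pinv(Lap) * divY                # minimum-norm solution, sum(s) = 0
end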

Reference

Jiang, X., Lim, L.-H., Yao, Y., & Ye, Y. (2009). Statistical Ranking and Combinatorial Hodge Theory. https://arxiv.org/abs/0811.1067

source

Listwise and Choice Models

Scorio.plackett_luce (Function)
plackett_luce(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    tol=1e-8,
)

Rank models with Plackett-Luce ML on decisive pairwise-reduced outcomes.

This implementation uses pairwise decisive counts and Hunter's MM update:

\[\pi_i^{(k+1)} = \frac{\sum_j W_{ij}} {\sum_{j\ne i}(W_{ij}+W_{ji})/(\pi_i^{(k)}+\pi_j^{(k)})}\]

followed by normalization of $\pi$.

References

Plackett, R. L. (1975). The Analysis of Permutations.

Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models.

source
Scorio.plackett_luce_map (Function)
plackett_luce_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Plackett-Luce MAP using a prior on centered log-strengths.

With $\theta_i = \log \pi_i$:

\[\hat\theta \in \arg\min_\theta \left[ -\sum_{i\ne j}W_{ij}\left(\theta_i-\log(e^{\theta_i}+e^{\theta_j})\right) + \operatorname{penalty}(\theta-\bar\theta) \right]\]

source
Scorio.davidson_luce (Function)
davidson_luce(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    max_tie_order=nothing,
)

Rank models with Davidson-Luce setwise tie likelihood (ML).

For event comparison set $S = W \cup L$, winner set size $t = |W|$, $g_t(W)=\left(\prod_{i\in W}\alpha_i\right)^{1/t}$, and tie-order parameters $\delta_t$:

\[\Pr(W\mid S)= \frac{\delta_t g_t(W)} {\sum_{t'=1}^{\min(D,|S|)}\delta_{t'} \sum_{|T|=t'} g_{t'}(T)}\]

Reference

Firth, D., Kosmidis, I., & Turner, H. L. (2019). Davidson-Luce model for multi-item choice with ties.

source
Scorio.davidson_luce_map (Function)
davidson_luce_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    max_tie_order=nothing,
)

Rank models with Davidson-Luce MAP estimation.

\[\hat\theta \in \arg\min_\theta \left[-\log p(\text{events}\mid\theta,\delta) + \operatorname{penalty}(\theta)\right]\]

source
Scorio.bradley_terry_luce (Function)
bradley_terry_luce(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry-Luce composite-likelihood ML from setwise winner/loser events.

For each event (W, L), each winner $i\in W$ is treated as a Luce choice from $\{i\}\cup L$, yielding the composite log-likelihood:

\[\ell_{\mathrm{comp}}(\pi) = \sum_{(W,L)}\sum_{i\in W} \left[ \log\pi_i - \log\!\left(\pi_i+\sum_{j\in L}\pi_j\right) \right]\]

source
Scorio.bradley_terry_luce_map (Function)
bradley_terry_luce_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry-Luce composite-likelihood MAP estimation.

\[\hat\theta \in \arg\min_\theta \left[-\ell_{\mathrm{comp}}(\theta)+\operatorname{penalty}(\theta)\right]\]

source