Ranking API

Priors

Scorio.Prior (Type)
Prior

Abstract supertype for prior penalty specifications used by MAP rankers.

Concrete subtypes define hyperparameters for an internal penalty term over the latent score vector theta.

source
Scorio.GaussianPrior (Type)
GaussianPrior(mean=0.0, var=1.0)

Gaussian prior on latent parameters with quadratic penalty.

Arguments

  • mean::Real=0.0: prior mean.
  • var::Real=1.0: prior variance; must be positive.

Returns

  • GaussianPrior

Formula

\[\operatorname{penalty}(\theta) = \frac{1}{2\,\mathrm{var}}\sum_i (\theta_i-\mathrm{mean})^2\]
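
The penalty is easy to reproduce outside the package. A minimal standalone sketch in plain Julia (the helper name gaussian_penalty is illustrative, not part of the API):

# Quadratic penalty of the Gaussian prior, computed directly.
gaussian_penalty(theta; mean=0.0, var=1.0) = sum(abs2, theta .- mean) / (2 * var)

gaussian_penalty([0.5, -1.2, 0.3]; var=2.0)   # (0.25 + 1.44 + 0.09) / 4 = 0.445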

source
Scorio.LaplacePrior (Type)
LaplacePrior(loc=0.0, scale=1.0)

Laplace prior on latent parameters with L1 penalty.

Arguments

  • loc::Real=0.0: location parameter.
  • scale::Real=1.0: scale parameter; must be positive.

Returns

  • LaplacePrior

Formula

\[\operatorname{penalty}(\theta) = \frac{1}{\mathrm{scale}}\sum_i \left|\theta_i-\mathrm{loc}\right|\]

source
Scorio.CauchyPrior (Type)
CauchyPrior(loc=0.0, scale=1.0)

Cauchy prior on latent parameters with log-quadratic penalty.

Arguments

  • loc::Real=0.0: location parameter.
  • scale::Real=1.0: scale parameter; must be positive.

Returns

  • CauchyPrior

Formula

Let $z_i = (\theta_i-\mathrm{loc})/\mathrm{scale}$.

\[\operatorname{penalty}(\theta) = \sum_i \log(1 + z_i^2)\]

source
Scorio.UniformPrior (Type)
UniformPrior()

Improper flat prior with zero penalty.

Returns

  • UniformPrior

Formula

\[\operatorname{penalty}(\theta) = 0\]

source
Scorio.CustomPrior (Type)
CustomPrior(penalty_fn)

User-defined prior from a callable penalty function.

Arguments

  • penalty_fn: callable with signature penalty_fn(theta) returning a scalar penalty value.

Returns

  • CustomPrior

Notes

penalty_fn is used directly with no transformation of theta.
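
For example, an elastic-net-style penalty can be supplied as a plain callable (the specific penalty below is illustrative):

# Quadratic plus L1 penalty on theta.
penalty_fn(theta) = 0.5 * sum(abs2, theta) + sum(abs, theta)
prior = CustomPrior(penalty_fn)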

source
Scorio.EmpiricalPrior (Type)
EmpiricalPrior(R0; var=1.0, eps=1e-6)

Empirical Gaussian-style prior centered at logits inferred from baseline outcomes.

R0 is accepted as shape (L, M) or (L, M, D). A 2D input is promoted to (L, M, 1).

Arguments

  • R0: baseline outcomes per model. Typically binary outcomes in {0,1}.
  • var::Real=1.0: variance used in the quadratic penalty; must be positive.
  • eps::Real=1e-6: clipping level used before logit transform. No explicit range check is applied; choose 0 < eps < 0.5 in practice.

Returns

  • EmpiricalPrior: stores R0, var, eps, and centered prior_mean.

Formula

For model $l$:

\[a_l = \frac{1}{M D}\sum_{m=1}^{M}\sum_{d=1}^{D} R^0_{lmd}\]

\[\tilde a_l = \operatorname{clip}(a_l, \varepsilon, 1-\varepsilon), \qquad \mu_l = \log\!\left(\frac{\tilde a_l}{1-\tilde a_l}\right)\]

Then mean-center $\mu$ for identifiability and use:

\[\operatorname{penalty}(\theta) = \frac{1}{2\,\mathrm{var}}\sum_{l=1}^{L}(\theta_l-\mu_l)^2\]

Examples

R0 = Int[
    1 1 1 0 1
    0 1 0 0 1
]
prior = EmpiricalPrior(R0; var=2.0, eps=1e-6)
source

Evaluation-based Ranking

Bayes

Scorio.bayes (Method)
bayes(
    R::AbstractArray{<:Integer, 3},
    w=nothing;
    R0=nothing,
    quantile=nothing,
    method="competition",
    return_scores=false,
)

Rank models by Bayes@N scores computed independently per model.

If quantile is provided, models are ranked by mu + z_q * sigma; otherwise by posterior mean mu.

References

Hariri, M., Samandar, A., Hinczewski, M., & Chaudhary, V. (2026). Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation. arXiv:2510.04265. https://arxiv.org/abs/2510.04265

Formula

For each model l, let (mu_l, sigma_l) = Scorio.bayes(R_l, w, R0_l).

\[s_l = \begin{cases} \mu_l, & \text{if quantile is not set} \\ \mu_l + \Phi^{-1}(q)\,\sigma_l, & \text{if quantile}=q \in [0,1] \end{cases}\]

Arguments

  • R: integer tensor (L, M, N) with values in {0, ..., C}.
  • w: class weights of length C+1. If not provided and R is binary (contains only 0 and 1), defaults to [0.0, 1.0]. For non-binary R, w is required.
  • R0: optional shared prior (M, D) or model-specific prior (L, M, D).
  • quantile: optional value in [0, 1] for quantile-adjusted ranking.
  • method, return_scores: ranking output controls.
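
The quantile adjustment itself is a one-liner. A sketch assuming per-model posterior summaries mu and sigma are already in hand (all names illustrative):

using Distributions

mu    = [0.72, 0.65, 0.80]          # posterior means per model
sigma = [0.04, 0.02, 0.06]          # posterior standard deviations
q     = 0.05
z_q   = quantile(Normal(), q)       # Phi^{-1}(q); negative for q < 0.5
s     = mu .+ z_q .* sigma          # conservative lower-quantile scores
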
source

Avg

Scorio.avg (Method)
avg(R; method="competition", return_scores=false)

Rank models by per-model mean accuracy across all questions and trials.

For each model l, compute the scalar score:

\[s_l^{\mathrm{avg}} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N} R_{lmn}\]

Higher scores are better; ranking is produced by rank_scores.

Arguments

  • R: binary response tensor (L, M, N) or matrix (L, M) promoted to (L, M, 1).
  • method: tie-handling rule for rank_scores.
  • return_scores: if true, return (ranking, scores).
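
The score is a plain mean over the question and trial axes. A standalone sketch:

using Statistics

R = rand(0:1, 3, 5, 4)              # (L, M, N) binary outcomes
s = vec(mean(R; dims=(2, 3)))       # s[l] = mean accuracy of model l
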
source

Pass@k Family

Scorio.pass_at_k (Method)
pass_at_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)

Rank models by per-model Pass@k scores.

For each model l, define per-question success counts $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$. Then:

\[s_l^{\mathrm{Pass@}k} = \frac{1}{M}\sum_{m=1}^{M} \left(1 - \frac{\binom{N-\nu_{lm}}{k}}{\binom{N}{k}}\right)\]
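
A standalone sketch of the per-question estimator, using the numerically stable product form of the binomial ratio (helper names illustrative):

using Statistics

# Pass@k for one question with nu successes out of N trials.
function pass_at_k_single(N::Int, nu::Int, k::Int)
    N - nu < k && return 1.0        # every size-k subset contains a success
    return 1.0 - prod(1.0 - k / j for j in (N - nu + 1):N; init=1.0)
end

# Model score: average over the per-question success counts in nus.
pass_at_k_score(nus, N, k) = mean(pass_at_k_single(N, nu, k) for nu in nus)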

References

Chen, M., Tworek, J., Jun, H., et al. (2021). Evaluating Large Language Models Trained on Code. arXiv:2107.03374. https://arxiv.org/abs/2107.03374

source
Scorio.pass_hat_k (Method)
pass_hat_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)

Rank models by per-model Pass-hat@k (G-Pass@k) scores.

With $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$:

\[s_l^{\widehat{\mathrm{Pass@}k}} = \frac{1}{M}\sum_{m=1}^{M} \frac{\binom{\nu_{lm}}{k}}{\binom{N}{k}}\]

References

Yao, S., Shinn, N., Razavi, P., & Narasimhan, K. (2024). tau-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains. arXiv:2406.12045. https://arxiv.org/abs/2406.12045

source
Scorio.g_pass_at_k_tau (Method)
g_pass_at_k_tau(
    R::AbstractArray{<:Integer, 3},
    k,
    tau;
    method="competition",
    return_scores=false,
)

Rank models by generalized G-Pass@k_τ per model.

Let $X_{lm} \sim \mathrm{Hypergeometric}(N, \nu_{lm}, k)$ where $\nu_{lm} = \sum_{n=1}^{N} R_{lmn}$. The score is:

\[s_l^{\mathrm{G\text{-}Pass@}k_{\tau}} = \frac{1}{M}\sum_{m=1}^{M} \Pr\!\left(X_{lm}\ge \lceil \tau k \rceil\right)\]

\[\Pr(X_{lm}\ge \lceil \tau k \rceil) = \sum_{j=\lceil \tau k \rceil}^{k} \frac{\binom{\nu_{lm}}{j}\binom{N-\nu_{lm}}{k-j}}{\binom{N}{k}}\]
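
The tail probability is available directly from Distributions.jl. A minimal sketch (helper name illustrative):

using Distributions

# P(X >= ceil(tau * k)) for X ~ Hypergeometric(nu successes, N - nu failures, k draws).
function g_pass_tail(N::Int, nu::Int, k::Int, tau::Real)
    m = ceil(Int, tau * k)
    d = Hypergeometric(nu, N - nu, k)
    return ccdf(d, m - 1)           # ccdf(d, x) = P(X > x) on integer support
end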

References

Liu, J., Liu, H., Xiao, L., et al. (2024). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147

source
Scorio.mg_pass_at_k (Method)
mg_pass_at_k(R::AbstractArray{<:Integer, 3}, k; method="competition", return_scores=false)

Rank models by per-model mG-Pass@k scores.

With $X_{lm} \sim \mathrm{Hypergeometric}(N, \nu_{lm}, k)$ and $m_0 = \lceil k/2 \rceil$:

\[s_l^{\mathrm{mG\text{-}Pass@}k} = \frac{1}{M}\sum_{m=1}^{M} \frac{2}{k}\,\mathbb{E}\!\left[(X_{lm}-m_0)_+\right]\]

Equivalent discrete form:

\[\frac{2}{k}\sum_{i=m_0+1}^{k}\Pr(X_{lm}\ge i)\]

References

Liu, J., Liu, H., Xiao, L., et al. (2024). Are Your LLMs Capable of Stable Reasoning? arXiv:2412.13147. https://arxiv.org/abs/2412.13147

source

Pointwise Methods

Scorio.inverse_difficulty (Function)
inverse_difficulty(
    R;
    method="competition",
    return_scores=false,
    clip_range=(0.01, 0.99),
)

Rank models by question accuracy weighted by inverse empirical question difficulty.

Each question weight is proportional to 1 / p_correct(question), after clipping p_correct to clip_range and normalizing weights to sum to 1.

Let $k_{lm} = \sum_{n=1}^{N} R_{lmn}$ and $\hat p_{lm} = k_{lm}/N$. Define the global per-question solve rate $\bar p_m = \frac{1}{L}\sum_l \hat p_{lm}$ and weights:

\[w_m \propto \frac{1}{\operatorname{clip}(\bar p_m, a, b)}, \qquad \sum_{m=1}^{M} w_m = 1\]

The model score is:

\[s_l^{\mathrm{inv\text{-}diff}} = \sum_{m=1}^{M} w_m \hat p_{lm}\]
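
A standalone sketch of the weighting, starting from the L×M matrix of per-model solve rates (helper name illustrative):

using Statistics

function inv_difficulty_scores(phat::AbstractMatrix; clip_range=(0.01, 0.99))
    pbar = vec(mean(phat; dims=1))          # global solve rate per question
    w = 1.0 ./ clamp.(pbar, clip_range...)  # inverse clipped difficulty
    w ./= sum(w)                            # normalize weights to sum to 1
    return phat * w                         # s_l = sum_m w_m * phat[l, m]
end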

Reference

Inverse probability weighting: https://en.wikipedia.org/wiki/Inverse_probability_weighting

source

Pairwise Methods

Scorio.elo (Function)
elo(
    R;
    K=32.0,
    initial_rating=1500.0,
    tie_handling="correct_draw_only",
    method="competition",
    return_scores=false,
)

Sequential Elo rating over pairwise outcomes induced by R.

For each (question, trial), all model pairs are compared and Elo updates are applied in fixed iteration order. Pair outcomes are:

  • decisive (1 vs 0): win/loss update
  • tie (1 vs 1 or 0 vs 0): handled by tie_handling

Arguments

  • R: binary response tensor of shape (L, M, N) or matrix (L, M) promoted to (L, M, 1).
  • K: positive Elo step size.
  • initial_rating: finite initial rating for all models.
  • tie_handling: one of "skip", "draw", "correct_draw_only".
  • method: rank tie-handling method passed to rank_scores.
  • return_scores: if true, return (ranking, ratings).

Returns

  • ranking by default.
  • (ranking, ratings) when return_scores=true.

Formula

For each induced pairwise match (i,j) with observed score $S_{ij} \in \{0, 0.5, 1\}$:

\[E_{ij} = \frac{1}{1 + 10^{(r_j-r_i)/400}}\]

\[r_i \leftarrow r_i + K(S_{ij} - E_{ij}), \quad r_j \leftarrow r_j + K((1-S_{ij}) - (1-E_{ij}))\]
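
A single induced match reduces to the textbook update. A minimal sketch (helper name illustrative):

# One Elo update for an observed score S in {0, 0.5, 1}, from i's perspective.
function elo_update(r_i, r_j, S; K=32.0)
    E = 1 / (1 + 10^((r_j - r_i) / 400))    # expected score of i
    return r_i + K * (S - E), r_j + K * ((1 - S) - (1 - E))
end

elo_update(1500.0, 1500.0, 1.0)             # winner gains K/2 = 16 points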

Reference

Elo, A. E. (1978). The Rating of Chessplayers, Past and Present.

source
Scorio.trueskill (Function)
trueskill(
    R;
    mu_initial=25.0,
    sigma_initial=25.0 / 3,
    beta=25.0 / 6,
    tau=25.0 / 300,
    method="competition",
    return_scores=false,
    tie_handling="skip",
    draw_margin=0.0,
)

Rank models with a sequential two-player TrueSkill-style update over induced pairwise comparisons.

Returns rankings from posterior means mu.

Formula

For one match between models i and j:

\[c = \sqrt{2\beta^2 + \sigma_i^2 + \sigma_j^2}, \quad t = (\mu_i-\mu_j)/c, \quad \epsilon = \text{draw\_margin}/c\]

For decisive outcomes, the update uses $v_{win}(t,\epsilon)$ and $w_{win}(t,\epsilon)$:

\[\mu_i' = \mu_i + \frac{\sigma_i^2}{c} v_{win}(t,\epsilon), \quad \sigma_i'^2 = \sigma_i^2\!\left(1 - \frac{\sigma_i^2}{c^2}w_{win}(t,\epsilon)\right)\]

Draw updates use the analogous $v_{draw}$ and $w_{draw}$ corrections.
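
The win corrections have standard closed forms in terms of the normal pdf and cdf. A sketch of the two helpers (draw corrections are analogous; names illustrative):

using Distributions

v_win(t, ε) = pdf(Normal(), t - ε) / cdf(Normal(), t - ε)
w_win(t, ε) = v_win(t, ε) * (v_win(t, ε) + t - ε)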

Reference

Herbrich, R., Minka, T., & Graepel, T. (2006). TrueSkill(TM): A Bayesian Skill Rating System. NeurIPS 19.

source
Scorio.glicko (Function)
glicko(
    R;
    initial_rating=1500.0,
    initial_rd=350.0,
    c=0.0,
    rd_max=350.0,
    tie_handling="correct_draw_only",
    return_deviation=false,
    method="competition",
    return_scores=false,
)

Rank models with sequential Glicko updates over induced pairwise comparisons.

If return_deviation=true, returns (ranking, rating, rd); otherwise returns ranking or (ranking, rating) when return_scores=true.

Formula

Let $q = \ln(10)/400$ and $g(RD) = 1/\sqrt{1 + 3q^2 RD^2/\pi^2}$. For model i in one period:

\[E_{ij} = \frac{1}{1 + 10^{-g(RD_j)(r_i-r_j)/400}}\]

\[d_i^2 = \left(q^2\sum_j g(RD_j)^2 E_{ij}(1-E_{ij})\right)^{-1}\]

\[RD_i' = \left(\frac{1}{RD_i^2} + \frac{1}{d_i^2}\right)^{-1/2}, \quad r_i' = r_i + \frac{q}{\frac{1}{RD_i^2}+\frac{1}{d_i^2}} \sum_j g(RD_j)(S_{ij}-E_{ij})\]

References

Glickman, M. E. (1999). Parameter Estimation in Large Dynamic Paired Comparison Experiments. JRSS C, 48(3), 377-394. https://doi.org/10.1111/1467-9876.00159

source

Paired-Comparison Probabilistic Models

Scorio.bradley_terry (Function)
bradley_terry(R; method="competition", return_scores=false, max_iter=500)

Rank models with Bradley-Terry maximum likelihood on decisive pairwise wins.

Let $W_{ij}$ be decisive wins of model i over j and strengths $\pi_i > 0$.

\[\Pr(i \succ j) = \frac{\pi_i}{\pi_i + \pi_j}\]

\[\log p(W\mid \pi) = \sum_{i\ne j} W_{ij}\left[\log \pi_i - \log(\pi_i+\pi_j)\right]\]
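
The ML solution can be reached with Hunter's MM fixed point. A minimal sketch assuming each model has at least one decisive comparison (helper name illustrative, not the package's solver):

# W[i, j] = decisive wins of model i over model j, diagonal zero.
function bt_mm(W::AbstractMatrix; max_iter=500, tol=1e-10)
    L = size(W, 1)
    p = fill(1.0 / L, L)
    for _ in 1:max_iter
        q = [sum(W[i, :]) / sum((W[i, j] + W[j, i]) / (p[i] + p[j])
                                for j in 1:L if j != i) for i in 1:L]
        q ./= sum(q)                        # normalize for identifiability
        maximum(abs.(q .- p)) < tol && return q
        p = q
    end
    return p
end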

References

Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324

source
Scorio.bradley_terry_map (Function)
bradley_terry_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry MAP estimation using the given prior on centered log-strengths.

With $\theta_i = \log \pi_i$:

\[\hat\theta = \arg\min_{\theta} \left[-\log p(W\mid \theta) + \operatorname{penalty}(\theta)\right]\]

\[\hat\pi_i = \exp(\hat\theta_i)\]

Reference

Caron, F., & Doucet, A. (2012). Efficient Bayesian inference for generalized Bradley-Terry models. https://doi.org/10.1080/10618600.2012.638220

source
Scorio.bradley_terry_davidson (Function)
bradley_terry_davidson(R; method="competition", return_scores=false, max_iter=500)

Rank models with Bradley-Terry-Davidson ML, incorporating explicit tie mass.

The Davidson tie extension introduces a tie parameter $\nu > 0$:

\[\Pr(i\succ j) = \frac{\pi_i}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}, \quad \Pr(i\sim j) = \frac{\nu\sqrt{\pi_i\pi_j}}{\pi_i+\pi_j+\nu\sqrt{\pi_i\pi_j}}\]

Reference

Davidson, R. R. (1970). On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. https://doi.org/10.1080/01621459.1970.10481082

source
Scorio.bradley_terry_davidson_map (Function)
bradley_terry_davidson_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry-Davidson MAP estimation.

\[(\hat\theta,\hat\nu) = \arg\min_{\theta,\nu>0} \left[-\log p(W,T\mid \theta,\nu) + \operatorname{penalty}(\theta)\right]\]

source
Scorio.rao_kupper (Function)
rao_kupper(
    R;
    tie_strength=1.1,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with the Rao-Kupper tie model (ML).

With fixed $\kappa \ge 1$:

\[\Pr(i\succ j)=\frac{\pi_i}{\pi_i+\kappa\pi_j}, \quad \Pr(j\succ i)=\frac{\pi_j}{\kappa\pi_i+\pi_j}\]

\[\Pr(i\sim j)= \frac{(\kappa^2-1)\pi_i\pi_j} {(\pi_i+\kappa\pi_j)(\kappa\pi_i+\pi_j)}\]

Reference

Rao, P. V., & Kupper, L. L. (1967). Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. https://doi.org/10.1080/01621459.1967.10482901

source
Scorio.rao_kupper_map (Function)
rao_kupper_map(
    R;
    tie_strength=1.1,
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with the Rao-Kupper tie model under MAP estimation.

\[\hat\theta = \arg\min_{\theta} \left[-\log p(W,T\mid \theta,\kappa) + \operatorname{penalty}(\theta)\right]\]

source

Bayesian Ranking

Scorio.thompson (Function)
thompson(
    R;
    n_samples=10000,
    prior_alpha=1.0,
    prior_beta=1.0,
    seed=42,
    method="competition",
    return_scores=false,
)

Rank models by Thompson sampling over Beta posteriors of model success rates.

The returned score for each model is negative average sampled rank (higher is better).

Let $S_l = \sum_{m,n} R_{lmn}$ and $T = MN$. Posterior per model:

\[p_l \mid R \sim \mathrm{Beta}(\alpha + S_l,\ \beta + T - S_l)\]

With posterior draws $t=1,\dots,T_s$ and sampled rank $r_l^{(t)}$:

\[s_l^{\mathrm{TS}} = -\frac{1}{T_s}\sum_{t=1}^{T_s} r_l^{(t)}\]
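
A compact sketch of the sampling loop (helper name illustrative):

using Distributions, Random

# S[l] = total successes of model l out of T = M * N trials.
function thompson_scores(S, T; n_samples=10_000, alpha=1.0, beta=1.0,
                         rng=MersenneTwister(42))
    L = length(S)
    ranks = zeros(L)
    for _ in 1:n_samples
        draws = [rand(rng, Beta(alpha + S[l], beta + T - S[l])) for l in 1:L]
        for (r, l) in enumerate(sortperm(draws; rev=true))  # rank 1 = best draw
            ranks[l] += r
        end
    end
    return -ranks ./ n_samples              # negative mean sampled rank
end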

References

Thompson, W. R. (1933). On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika. https://doi.org/10.1093/biomet/25.3-4.285

Russo, D. J., et al. (2018). A Tutorial on Thompson Sampling. https://doi.org/10.1561/2200000070

source
Scorio.bayesian_mcmc (Function)
bayesian_mcmc(
    R;
    n_samples=5000,
    burnin=1000,
    prior_var=1.0,
    seed=42,
    method="competition",
    return_scores=false,
)

Rank models via random-walk Metropolis MCMC under a Bradley-Terry-style pairwise likelihood with Gaussian prior on latent abilities.

Scores are posterior means of sampled latent abilities.

Let $W_{ij}$ be decisive wins of model i over j, and let $\theta_i$ be latent log-strengths with Gaussian prior variance $\sigma^2$ equal to prior_var.

\[\Pr(i \succ j \mid \theta) = \frac{\exp(\theta_i)}{\exp(\theta_i)+\exp(\theta_j)}, \qquad \theta_i \sim \mathcal{N}(0,\sigma^2)\]

The returned score is the posterior mean:

\[s_i^{\mathrm{MCMC}} = \mathbb{E}[\theta_i \mid W]\]

References

Bradley, R. A., & Terry, M. E. (1952). Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons. Biometrika. https://doi.org/10.1093/biomet/39.3-4.324

Metropolis, N., et al. (1953). Equation of State Calculations by Fast Computing Machines. The Journal of Chemical Physics. https://doi.org/10.1063/1.1699114

source

Item Response Theory

Scorio.rasch (Function)
rasch(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
)

Rank models with Rasch (1PL) maximum-likelihood estimation.

Returns rankings from estimated abilities theta. When return_item_params=true, also returns item difficulties.

For counts $k_{lm}=\sum_n R_{lmn}$:

\[k_{lm} \sim \mathrm{Binomial}\!\left(N,\sigma(\theta_l-b_m)\right)\]

Item difficulties are mean-centered for identifiability:

\[b \leftarrow b - \frac{1}{M}\sum_m b_m\]
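
The objective being maximized is a sketchable binomial likelihood (helper names illustrative; the package's optimizer is not shown):

using Distributions

sigmoid(x) = 1 / (1 + exp(-x))

# Negative log-likelihood for counts k (L×M) with N trials per cell.
function rasch_nll(theta, b, k::AbstractMatrix, N::Int)
    L, M = size(k)
    return -sum(logpdf(Binomial(N, sigmoid(theta[l] - b[m])), k[l, m])
                for l in 1:L, m in 1:M)
end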

Reference

Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests.

source
Scorio.rasch_map (Function)
rasch_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
)

Rank models with Rasch (1PL) MAP estimation using an ability prior.

\[(\hat\theta,\hat b) = \arg\min_{\theta,b} \left[ -\sum_{l,m}\log p(k_{lm}\mid \theta_l,b_m) + \operatorname{penalty}(\theta) \right]\]

Reference

Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika.

source
Scorio.rasch_2pl (Function)
rasch_2pl(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
    reg_discrimination=0.01,
)

Rank models with 2PL IRT maximum likelihood (ability + item discrimination).

\[k_{lm} \sim \mathrm{Binomial}\!\left( N,\sigma\!\left(a_m(\theta_l-b_m)\right)\right)\]

source
Scorio.rasch_2pl_map (Function)
rasch_2pl_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
    reg_discrimination=0.01,
)

Rank models with 2PL IRT MAP estimation.

Same 2PL likelihood as rasch_2pl, plus prior regularization on abilities:

\[\hat\theta \in \arg\min_{\theta,\cdots} \left[-\log p(k\mid \theta,\cdots)+\operatorname{penalty}(\theta)\right]\]

source
Scorio.rasch_3pl (Function)
rasch_3pl(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    fix_guessing=nothing,
    return_item_params=false,
    reg_discrimination=0.01,
    reg_guessing=0.1,
    guessing_upper=0.5,
)

Rank models with 3PL IRT maximum likelihood (ability, discrimination, guessing).

\[p_{lm} = c_m + (1-c_m)\sigma\!\left(a_m(\theta_l-b_m)\right)\]

with $c_m \in [0, \text{guessing\_upper}]$.

source
Scorio.rasch_3pl_map (Function)
rasch_3pl_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    fix_guessing=nothing,
    return_item_params=false,
    reg_discrimination=0.01,
    reg_guessing=0.1,
    guessing_upper=0.5,
)

Rank models with 3PL IRT MAP estimation.

Same 3PL likelihood as rasch_3pl, with prior penalty on abilities:

\[\hat\theta \in \arg\min_{\theta,\cdots} \left[-\log p(k\mid \theta,\cdots)+\operatorname{penalty}(\theta)\right]\]

source
Scorio.dynamic_irt (Function)
dynamic_irt(
    R;
    variant="linear",
    method="competition",
    return_scores=false,
    max_iter=500,
    return_item_params=false,
    time_points=nothing,
    score_target="final",
    slope_reg=0.01,
    state_reg=1.0,
    assume_time_axis=false,
)

Rank models with dynamic IRT variants:

  • "linear": static Rasch baseline
  • "growth": linear growth path
  • "state_space": smoothed latent trajectory

Growth variant:

\[\theta_{ln} = \theta_{0,l} + \theta_{1,l} t_n,\qquad P(R_{lmn}=1)=\sigma(\theta_{ln}-b_m)\]

State-space variant:

\[P(R_{lmn}=1)=\sigma(\theta_{ln}-b_m)\]

with smoothness penalty

\[\lambda \sum_{l,n>1} \frac{(\theta_{ln}-\theta_{l,n-1})^2}{t_n-t_{n-1}}\]

References

Verhelst, N. D., & Glas, C. A. (1993). A dynamic generalization of the Rasch model. Psychometrika.

source
Scorio.rasch_mml (Function)
rasch_mml(
    R;
    method="competition",
    return_scores=false,
    max_iter=100,
    em_iter=20,
    n_quadrature=21,
    return_item_params=false,
)

Rank models with Rasch marginal maximum likelihood using EM + quadrature.

Using quadrature nodes $\theta_q$ and weights $w_q$, the posterior mass for model l is:

\[w_{lq} \propto p(k_l\mid \theta_q,b)\,w_q\]

EAP ability score:

\[\hat\theta_l^{\mathrm{EAP}} = \sum_q w_{lq}\theta_q\]

References

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika.

source
Scorio.rasch_mml_credible (Function)
rasch_mml_credible(
    R;
    quantile=0.05,
    method="competition",
    return_scores=false,
    max_iter=100,
    em_iter=20,
    n_quadrature=21,
)

Rank models by posterior ability quantiles from Rasch MML posterior mass.

\[s_l = Q_q(\theta_l \mid R)\]

Lower q (for example 0.05) yields a more conservative ranking.

source

Voting Methods

Scorio.borda (Function)
borda(R; method="competition", return_scores=false)

Rank models with Borda count from per-question model orderings.

Let $k_{lm} = \sum_{n=1}^{N} R_{lmn}$ and $r_{lm}$ be the descending tie-averaged rank of model l on question m (rank 1 is best):

\[s_l^{\mathrm{Borda}} = \sum_{m=1}^{M} (L - r_{lm})\]
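
A standalone sketch from the count matrix, using tie-averaged ranks (helper name illustrative):

using StatsBase

# k[l, m] = successes of model l on question m.
function borda_scores(k::AbstractMatrix)
    L, M = size(k)
    s = zeros(L)
    for m in 1:M
        r = tiedrank(-k[:, m])              # rank 1 = most correct on question m
        s .+= L .- r
    end
    return s
end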

Reference

de Borda, J.-C. (1781/1784). Mémoire sur les élections au scrutin.

source
Scorio.copeland (Function)
copeland(R; method="competition", return_scores=false)

Rank models by Copeland score over pairwise question-level majorities.

Let $W^{(q)}_{ij}$ be the number of questions $m$ where $k_{im} > k_{jm}$:

\[s_i^{\mathrm{Copeland}} = \sum_{j\ne i}\operatorname{sign}\!\left(W^{(q)}_{ij} - W^{(q)}_{ji}\right)\]

source
Scorio.win_rate (Function)
win_rate(R; method="competition", return_scores=false)

Rank models by aggregate pairwise win rate.

With the same $W^{(q)}_{ij}$ counts:

\[s_i^{\mathrm{winrate}} = \frac{\sum_{j\ne i} W^{(q)}_{ij}} {\sum_{j\ne i}\left(W^{(q)}_{ij}+W^{(q)}_{ji}\right)}\]

Models with no decisive pairwise outcomes receive score 0.5.

source
Scorio.minimax (Function)
minimax(
    R;
    variant="margin",
    tie_policy="half",
    method="competition",
    return_scores=false,
)

Rank models with Minimax (Simpson-Kramer), using worst defeat strength.

Let $P_{ij}$ be pairwise preference counts and $\Delta_{ij}=P_{ij}-P_{ji}$.

Margin variant:

\[s_i^{\mathrm{minimax}} = -\max_{j\ne i}\max(0,\Delta_{ji})\]

Winning-votes variant:

\[s_i^{\mathrm{wv}} = -\max_{j:\,P_{ji}>P_{ij}} P_{ji}\]

source
Scorio.schulze (Function)
schulze(R; tie_policy="half", method="competition", return_scores=false)

Rank models using the Schulze strongest-path method.

Initialize:

\[p_{ij} = \begin{cases} P_{ij}, & P_{ij}>P_{ji} \\ 0, & \text{otherwise} \end{cases}\]

Then apply strongest-path closure:

\[p_{jk} \leftarrow \max\!\left(p_{jk}, \min(p_{ji}, p_{ik})\right)\]

Model i beats j if $p_{ij} > p_{ji}$.
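
The closure is a Floyd-Warshall-style triple loop. A minimal sketch over preference counts P (helper name illustrative):

function schulze_paths(P::AbstractMatrix)
    L = size(P, 1)
    p = [i != j && P[i, j] > P[j, i] ? float(P[i, j]) : 0.0 for i in 1:L, j in 1:L]
    for i in 1:L, j in 1:L, k in 1:L
        (j == i || k == i || j == k) && continue
        p[j, k] = max(p[j, k], min(p[j, i], p[i, k]))
    end
    return p                                # i beats j iff p[i, j] > p[j, i]
end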

source
Scorio.ranked_pairs (Function)
ranked_pairs(
    R;
    strength="margin",
    tie_policy="half",
    method="competition",
    return_scores=false,
)

Rank models with Ranked Pairs (Tideman) by locking pairwise victories without creating directed cycles.

Victories are sorted by strength (margin or winning-votes), then each edge winner -> loser is locked only if it does not create a directed cycle in the current locked graph.

source
Scorio.kemeny_young (Function)
kemeny_young(
    R;
    tie_policy="half",
    method="competition",
    return_scores=false,
    time_limit=nothing,
    tie_aware=true,
)

Rank models with Kemeny-Young via MILP optimization. With tie_aware=true, the routine analyzes forced pairwise orders among optimal solutions.

Binary variables $y_{ij}$ indicate whether model i is above j:

\[\max_y \sum_{i\ne j} P_{ij} y_{ij}\]

subject to:

\[y_{ij}+y_{ji}=1,\qquad y_{ij}+y_{jk}+y_{ki}\le 2 \quad (\forall i,j,k\ \text{distinct})\]

tie_aware=true checks which pairwise orders are forced across all optimal solutions and ranks by that forced-order DAG.

source
Scorio.nanson (Function)
nanson(R; rank_ties="average", method="competition", return_scores=false)

Rank models with Nanson's elimination rule (iterative Borda with below-mean elimination).

At round t, with active set A_t and Borda scores $s_i^{(t)}$:

\[E_t = \{ i\in A_t : s_i^{(t)} < \overline{s}^{(t)} \}, \qquad A_{t+1} = A_t \setminus E_t\]

source
Scorio.baldwin (Function)
baldwin(R; rank_ties="average", method="competition", return_scores=false)

Rank models with Baldwin's elimination rule (iterative elimination of minimum Borda score).

At round t:

\[E_t = \arg\min_{i\in A_t} s_i^{(t)}, \qquad A_{t+1} = A_t \setminus E_t\]

This implementation removes all models tied at the minimum in a round.

source
Scorio.majority_judgment (Function)
majority_judgment(R; method="competition", return_scores=false)

Rank models using Majority Judgment with recursive median-grade tie-breaking.

Each question assigns grade $k_{lm}\in\{0,\dots,N\}$. Models are compared by lower median grade; ties are broken by recursively removing one occurrence of the current median grade from tied models and repeating the comparison.

Reference

Balinski, M., & Laraki, R. (2011). Majority Judgment.

source

Graph-based Methods

Scorio.pagerank (Function)
pagerank(
    R;
    damping=0.85,
    max_iter=100,
    tol=1e-6,
    method="competition",
    return_scores=false,
    teleport=nothing,
)

Rank models with PageRank on the pairwise win-probability graph.

Let $\hat P_{i\succ j}$ be empirical tied-split win probabilities. Column-normalized transition matrix:

\[P_{ij} = \frac{\hat P_{i\succ j}}{\sum_{k\ne j}\hat P_{k\succ j}}\]

PageRank fixed point:

\[r = d P r + (1-d)e\]

where e is a teleportation distribution (uniform by default).
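
The fixed point can be found by plain power iteration on a column-stochastic P. A minimal sketch (helper name illustrative):

using LinearAlgebra

function pagerank_power(P::AbstractMatrix; d=0.85, tol=1e-6, max_iter=100)
    L = size(P, 1)
    e = fill(1.0 / L, L)                    # uniform teleportation
    r = copy(e)
    for _ in 1:max_iter
        r_new = d .* (P * r) .+ (1 - d) .* e
        norm(r_new - r, 1) < tol && return r_new
        r = r_new
    end
    return r
end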

Reference

Page, L., et al. (1999). The PageRank Citation Ranking.

source
Scorio.spectral (Function)
spectral(
    R;
    max_iter=10000,
    tol=1e-12,
    method="competition",
    return_scores=false,
)

Rank models by the dominant eigenvector of a spectral centrality matrix built from pairwise win probabilities.

Construct:

\[W_{ij}=\hat P_{i\succ j}\ (i\ne j), \qquad W_{ii}=\sum_{j\ne i}W_{ij}\]

Score vector is the normalized dominant right eigenvector:

\[v \propto Wv,\qquad \sum_i v_i=1\]

source
Scorio.alpharank (Function)
alpharank(
    R;
    alpha=1.0,
    population_size=50,
    max_iter=100000,
    tol=1e-12,
    method="competition",
    return_scores=false,
)

Rank models with single-population alpha-Rank stationary distribution scores.

For resident s, mutant r, population size m:

\[u = \alpha\frac{m}{m-1}\left(\hat P_{r\succ s}-\frac12\right),\qquad \rho_{r,s}= \begin{cases} \frac{1-e^{-u}}{1-e^{-mu}}, & u\ne 0\\ \frac{1}{m}, & u=0 \end{cases}\]

Transition matrix:

\[C_{sr}=\frac{1}{L-1}\rho_{r,s},\qquad C_{ss}=1-\sum_{r\ne s}C_{sr}\]

Ranking uses the stationary distribution of C.

Reference

Omidshafiei, S., et al. (2019). α-Rank: Multi-Agent Evaluation by Evolution. Scientific Reports.

source
Scorio.nash (Function)
nash(
    R;
    n_iter=100,
    temperature=0.1,
    solver="lp",
    score_type="vs_equilibrium",
    return_equilibrium=false,
    method="competition",
    return_scores=false,
)

Rank models from a Nash-equilibrium mixture of the zero-sum meta-game induced by pairwise win probabilities.

Payoff matrix:

\[A_{ij}=2\hat P_{i\succ j}-1,\qquad A_{ii}=0\]

Equilibrium mixture x is found by LP:

\[\max_{x\in\Delta^{L-1}} v \quad\text{s.t.}\quad \sum_i A_{ij}x_i \ge v,\ \forall j\]

Default score type ("vs_equilibrium") is:

\[s_i = \sum_j \hat P_{i\succ j} x_j\]
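
The LP is small and standard. A sketch assuming JuMP with the HiGHS optimizer (neither is implied to be the package's backend):

using JuMP, HiGHS

# Equilibrium mixture of the zero-sum game with payoff matrix A.
function nash_mixture(A::AbstractMatrix)
    L = size(A, 1)
    model = Model(HiGHS.Optimizer)
    set_silent(model)
    @variable(model, x[1:L] >= 0)
    @variable(model, v)
    @constraint(model, sum(x) == 1)
    @constraint(model, [j in 1:L], sum(A[i, j] * x[i] for i in 1:L) >= v)
    @objective(model, Max, v)
    optimize!(model)
    return value.(x), value(v)
end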

source

Centrality and Spectral Variants

Scorio.rank_centrality (Function)
rank_centrality(
    R;
    method="competition",
    return_scores=false,
    tie_handling="half",
    smoothing=0.0,
    teleport=0.0,
    max_iter=10000,
    tol=1e-12,
)

Rank models with Rank Centrality using stationary distribution of a pairwise-comparison Markov chain.

Let d_max be the maximum degree of the undirected comparison graph and $\hat P_{j\succ i}$ the empirical probability that j beats i. For $i \ne j$:

\[P_{ij} = \frac{1}{d_{\max}}\,\hat P_{j\succ i}, \qquad P_{ii} = 1 - \sum_{j\ne i} P_{ij}\]

Scores are stationary probabilities $\pi$ with:

\[\pi^\top P = \pi^\top,\qquad \sum_i \pi_i = 1\]

Reference

Negahban, S., Oh, S., & Shah, D. (2017). Rank Centrality: Ranking from Pairwise Comparisons. Operations Research.

source
Scorio.serial_rank (Function)
serial_rank(R; comparison="prob_diff", method="competition", return_scores=false)

Rank models with SerialRank spectral seriation using a Fiedler-vector ordering from comparison-induced similarity.

With pairwise comparison matrix C (skew-symmetric), SerialRank builds:

\[S = \frac{1}{2}\left(L\mathbf{1}\mathbf{1}^{\top} + C C^{\top}\right), \qquad L_S = \operatorname{diag}(S\mathbf{1}) - S\]

Scores are the oriented Fiedler vector of L_S (the eigenvector associated with the second-smallest eigenvalue), with sign chosen to best match observed pairwise directions.

Reference

Fogel, F., d'Aspremont, A., & Vojnovic, M. (2016). Spectral Ranking Using Seriation. JMLR.

source
Scorio.hodge_rank (Function)
hodge_rank(
    R;
    pairwise_stat="binary",
    weight_method="total",
    epsilon=0.5,
    method="competition",
    return_scores=false,
    return_diagnostics=false,
)

Rank models with l2 HodgeRank on a weighted pairwise-comparison graph.

Let $Y_{ij}$ be a skew-symmetric observed pairwise flow and $w_{ij}\ge 0$ edge weights. HodgeRank solves:

\[s^\star \in \arg\min_s \sum_{i<j} w_{ij}\left((s_j-s_i)-Y_{ij}\right)^2\]

Equivalent normal equations:

\[\Delta_0 s^\star = -\operatorname{div}(Y), \qquad s^\star = -\Delta_0^\dagger \operatorname{div}(Y)\]

where $\Delta_0^\dagger$ is the Laplacian pseudoinverse.
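
The normal equations solve directly with a Laplacian pseudoinverse. A minimal dense sketch (helper name illustrative):

using LinearAlgebra

# Y: skew-symmetric flows; w: symmetric nonnegative edge weights (both L×L).
function hodge_scores(Y::AbstractMatrix, w::AbstractMatrix)
    L = size(Y, 1)
    W = w .* (1 .- Matrix(I, L, L))         # zero the diagonal
    Lap = Diagonal(vec(sum(W; dims=2))) - W # weighted graph Laplacian
    divY = vec(sum(W .* Y; dims=2))         # div(Y)_i = sum_j w_ij * Y_ij
    return -pinv(Lap) * divY                # minimum-norm solution, sum(s) = 0
end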

Reference

Jiang, X., Lim, L.-H., Yao, Y., & Ye, Y. (2009). Statistical Ranking and Combinatorial Hodge Theory. https://arxiv.org/abs/0811.1067

source

Listwise and Choice Models

Scorio.plackett_luce (Function)
plackett_luce(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    tol=1e-8,
)

Rank models with Plackett-Luce ML on decisive pairwise-reduced outcomes.

This implementation uses pairwise decisive counts and Hunter's MM update:

\[\pi_i^{(k+1)} = \frac{\sum_j W_{ij}} {\sum_{j\ne i}(W_{ij}+W_{ji})/(\pi_i^{(k)}+\pi_j^{(k)})}\]

followed by normalization of $\pi$.

References

Plackett, R. L. (1975). The Analysis of Permutations.

Hunter, D. R. (2004). MM algorithms for generalized Bradley-Terry models.

source
Scorio.plackett_luce_map (Function)
plackett_luce_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Plackett-Luce MAP using a prior on centered log-strengths.

With $\theta_i = \log \pi_i$:

\[\hat\theta \in \arg\min_\theta \left[ -\sum_{i\ne j}W_{ij}\left(\theta_i-\log(e^{\theta_i}+e^{\theta_j})\right) + \operatorname{penalty}(\theta-\bar\theta) \right]\]

source
Scorio.davidson_luce (Function)
davidson_luce(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
    max_tie_order=nothing,
)

Rank models with Davidson-Luce setwise tie likelihood (ML).

For event comparison set $S = W \cup L$, winner set size $t = |W|$, $g_t(W)=\left(\prod_{i\in W}\alpha_i\right)^{1/t}$, and tie-order parameters $\delta_t$:

\[\Pr(W\mid S)= \frac{\delta_t g_t(W)} {\sum_{t'=1}^{\min(D,|S|)}\delta_{t'} \sum_{|T|=t'} g_{t'}(T)}\]

Reference

Firth, D., Kosmidis, I., & Turner, H. L. (2019). Davidson-Luce model for multi-item choice with ties.

source
Scorio.davidson_luce_map (Function)
davidson_luce_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
    max_tie_order=nothing,
)

Rank models with Davidson-Luce MAP estimation.

\[\hat\theta \in \arg\min_\theta \left[-\log p(\text{events}\mid\theta,\delta) + \operatorname{penalty}(\theta)\right]\]

source
Scorio.bradley_terry_luce (Function)
bradley_terry_luce(
    R;
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry-Luce composite-likelihood ML from setwise winner/loser events.

For each event (W, L), each winner $i\in W$ is treated as a Luce choice from $\{i\}\cup L$, yielding the composite log-likelihood:

\[\ell_{\mathrm{comp}}(\pi) = \sum_{(W,L)}\sum_{i\in W} \left[ \log\pi_i - \log\!\left(\pi_i+\sum_{j\in L}\pi_j\right) \right]\]

source
Scorio.bradley_terry_luce_map (Function)
bradley_terry_luce_map(
    R;
    prior=1.0,
    method="competition",
    return_scores=false,
    max_iter=500,
)

Rank models with Bradley-Terry-Luce composite-likelihood MAP estimation.

\[\hat\theta \in \arg\min_\theta \left[-\ell_{\mathrm{comp}}(\theta)+\operatorname{penalty}(\theta)\right]\]

source