Examples
Bayesian Evaluation
The bayes function implements the Bayes@N framework. It takes a matrix of outcomes R, a weight vector w, and optionally a matrix of prior outcomes R0.
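For intuition, the computation can be sketched in plain Julia as a Dirichlet-multinomial posterior. This is a minimal sketch, not Scorio's code: it assumes an independent uniform Dirichlet prior over the C+1 categories of each question, with prior outcomes R0 added as extra pseudo-counts. The helper name bayes_sketch is hypothetical, and the library's implementation may differ in its details.

```julia
# Hypothetical helper, not part of Scorio: posterior mean and standard deviation
# of the average rubric score, assuming a Dirichlet(1, ..., 1) prior per question.
function bayes_sketch(R::AbstractMatrix{<:Integer}, w::AbstractVector{<:Real},
                      R0::Union{Nothing,AbstractMatrix{<:Integer}}=nothing)
    M, N = size(R)
    C = length(w) - 1                      # categories are 0, ..., C
    D = R0 === nothing ? 0 : size(R0, 2)   # number of prior outcomes per question
    T = 1 + C + D + N                      # total Dirichlet concentration per question
    mean_sum = 0.0
    var_sum = 0.0
    for m in 1:M
        nu = ones(Float64, C + 1)          # uniform prior: one pseudo-count per category
        for n in 1:N
            nu[R[m, n] + 1] += 1           # observed outcomes (shifted to 1-based index)
        end
        for d in 1:D
            nu[R0[m, d] + 1] += 1          # prior outcomes
        end
        p = nu ./ T                        # posterior mean category probabilities
        e1 = sum(p .* w)                   # expected score for this question
        e2 = sum(p .* w .^ 2)              # expected squared score under p
        mean_sum += e1
        var_sum += (e2 - e1^2) / (T + 1)   # Dirichlet variance of a linear functional
    end
    return mean_sum / M, sqrt(var_sum) / M
end

R = [0 1 2 2 1;
     1 1 0 2 2]
w = [0.0, 0.5, 1.0]
R0 = [0 2;
      1 2]
mu, sigma = bayes_sketch(R, w, R0)
println("μ ≈ $mu, σ ≈ $sigma")
```

Under these assumptions each question contributes its posterior expected score to the mean, and the per-question variances add because questions are treated as independent.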
using Scorio
# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = [0 1 2 2 1;
     1 1 0 2 2]
# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = [0.0, 0.5, 1.0]
# Optional prior outcomes R0: shape (M, D)
R0 = [0 2;
      1 2]
# Bayesian evaluation with prior
mu, sigma = bayes(R, w, R0)
println("μ = $mu, σ = $sigma")
Pass@k Evaluation
For binary outcomes (correct/incorrect), you can use pass_at_k and pass_hat_k.
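To make the two quantities concrete, here is a plain-Julia sketch of the commonly used combinatorial estimators, averaged over questions. The helper names are hypothetical, and whether Scorio uses exactly these estimators internally is an assumption.

```julia
# Hypothetical helpers, not Scorio's implementation.
# Each row of R holds the n binary trial outcomes for one question.

# Pass@k: chance that a random size-k subset of the n trials contains a correct one.
function pass_at_k_sketch(R::AbstractMatrix{<:Integer}, k::Integer)
    n = size(R, 2)
    1 <= k <= n || throw(ArgumentError("k must satisfy 1 <= k <= n"))
    total = 0.0
    for row in eachrow(R)
        c = sum(row)                                   # correct trials for this question
        total += 1 - binomial(n - c, k) / binomial(n, k)
    end
    return total / size(R, 1)
end

# Pass^k: chance that a random size-k subset of the n trials is entirely correct.
function pass_hat_k_sketch(R::AbstractMatrix{<:Integer}, k::Integer)
    n = size(R, 2)
    1 <= k <= n || throw(ArgumentError("k must satisfy 1 <= k <= n"))
    total = 0.0
    for row in eachrow(R)
        c = sum(row)
        total += binomial(c, k) / binomial(n, k)
    end
    return total / size(R, 1)
end

R = [0 1 1 0 1;
     1 1 0 1 1]
println(pass_at_k_sketch(R, 1))   # ≈ 0.7
println(pass_hat_k_sketch(R, 2))  # ≈ 0.45
```

With k = 1 both estimators reduce to mean accuracy. As k grows, Pass@k can only increase and Pass^k can only decrease.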
using Scorio
# Binary outcomes R: shape (M, N)
R = [0 1 1 0 1;
     1 1 0 1 1]
# Pass@k: Probability that at least one of k samples is correct
pk = pass_at_k(R, 1)
println("Pass@1: $pk")
# Pass^k (Pass-hat@k): Probability that all k samples are correct
phk = pass_hat_k(R, 2)
println("Pass^2: $phk")
Ranking
You can convert scores to competition ranks with competition_ranks_from_scores. Higher scores rank first, tied scores share a rank, and the rank after a tie is skipped.
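The rule behind standard competition ("1224") ranking fits in one line: an item's rank is one plus the number of strictly higher scores. The helper below is a hypothetical sketch that compares scores exactly. Scorio's function may handle near-ties or ordering options differently.

```julia
# Hypothetical helper, not Scorio's implementation.
# Rank = 1 + number of strictly higher scores, so tied scores share a rank.
competition_ranks_sketch(scores) = [1 + count(>(s), scores) for s in scores]

scores = [0.95, 0.87, 0.87, 0.72, 0.65]
println(competition_ranks_sketch(scores))  # [1, 2, 2, 4, 5]
```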
using Scorio
scores = [0.95, 0.87, 0.87, 0.72, 0.65]
ranks = competition_ranks_from_scores(scores)
# Returns: [1, 2, 2, 4, 5]
println(ranks)
Ranking APIs (Response Tensor)
Ranking methods operate on a response tensor R with shape (L, M, N):
L: number of models
M: number of questions
N: number of trials per question
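As an illustration of this layout, the sketch below builds a multi-trial tensor in plain Julia by stacking one (L, M) matrix per trial along the third dimension. The data and the variable names (trial1, trial2, R_multi) are invented for the example.

```julia
using Statistics

# Two trials (N = 2) for L = 3 models on M = 4 questions; the data is made up.
trial1 = [1 1 0 1;
          1 0 0 1;
          0 0 1 0]
trial2 = [1 0 0 1;
          1 1 0 1;
          0 0 0 0]

# Stack the per-trial (L, M) matrices along a third dimension.
R_multi = cat(trial1, trial2; dims=3)
println(size(R_multi))                     # (3, 4, 2)

# Per-model accuracy, averaged over questions and trials.
acc = vec(mean(R_multi; dims=(2, 3)))
println(acc)
```

Indexing follows the shape directly: R_multi[l, m, n] is the outcome of model l on question m in trial n.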
using Scorio
# Single-trial data as an (L, M) matrix:
# R[l, m] = 1 if model l solves question m, else 0
R = [
1 1 0 1 0;
1 0 0 1 0;
0 1 0 1 1;
0 0 0 1 0
]
# A 2D (L, M) matrix is promoted to (L, M, N=1) automatically when needed.
# For multi-trial ranking, pass a 3D tensor explicitly; here N = 1:
R3 = reshape(R, 4, 5, 1)
Eval-Metric Ranking
using Scorio
# Mean-accuracy ranking
ranks_mean, scores_mean = Scorio.mean(R3; return_scores=true)
println("mean ranks = ", ranks_mean)
println("mean scores = ", scores_mean)
# Stability-style ranking with generalized pass metric
ranks_gpass, scores_gpass = g_pass_at_k_tau(R3, 1, 1.0; return_scores=true)
println("g_pass ranks = ", ranks_gpass)
println("g_pass scores = ", scores_gpass)
Pairwise Rating Methods
using Scorio
ranks_elo, ratings_elo = elo(R3; K=16.0, return_scores=true)
println("Elo ranks = ", ranks_elo)
println("Elo ratings = ", ratings_elo)
ranks_ts, scores_ts = trueskill(R3; return_scores=true)
println("TrueSkill ranks = ", ranks_ts)
println("TrueSkill mu = ", scores_ts)
ranks_glicko, ratings_glicko, rd_glicko = glicko(R3; return_deviation=true)
println("Glicko ranks = ", ranks_glicko)
println("Glicko ratings = ", ratings_glicko)
println("Glicko RD = ", rd_glicko)
Bradley-Terry Family
using Scorio
ranks_bt, scores_bt = bradley_terry(R3; return_scores=true)
println("BT ranks = ", ranks_bt)
println("BT scores = ", scores_bt)
ranks_bt_map, scores_bt_map = bradley_terry_map(R3; prior=1.0, return_scores=true)
println("BT-MAP ranks = ", ranks_bt_map)
println("BT-MAP scores = ", scores_bt_map)
Voting-Based Ranking
using Scorio
ranks_borda, scores_borda = borda(R3; return_scores=true)
println("Borda ranks = ", ranks_borda)
println("Borda scores = ", scores_borda)
ranks_copeland, scores_copeland = copeland(R3; return_scores=true)
println("Copeland ranks = ", ranks_copeland)
println("Copeland scores = ", scores_copeland)
IRT-Based Ranking
using Scorio
ranks_rasch, scores_rasch = rasch(R3; return_scores=true)
println("Rasch ranks = ", ranks_rasch)
println("Rasch theta = ", scores_rasch)
ranks_2pl, scores_2pl = rasch_2pl(R3; return_scores=true)
println("2PL ranks = ", ranks_2pl)
println("2PL theta = ", scores_2pl)
Graph and Listwise Ranking
using Scorio
ranks_pr, scores_pr = pagerank(R3; return_scores=true)
println("PageRank ranks = ", ranks_pr)
println("PageRank scores = ", scores_pr)
ranks_rc, scores_rc = rank_centrality(R3; return_scores=true)
println("RankCentrality ranks = ", ranks_rc)
println("RankCentrality scores = ", scores_rc)
ranks_pl, scores_pl = plackett_luce(R3; return_scores=true)
println("Plackett-Luce ranks = ", ranks_pl)
println("Plackett-Luce scores = ", scores_pl)
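Pairwise and graph-based rankers generally start from head-to-head comparisons between models. As a rough illustration, a response tensor can be reduced to a win-count matrix by counting, for every ordered pair of models, the (question, trial) cells where the first is correct and the second is not. The helper below is hypothetical. How Scorio builds its comparisons internally, including how it treats ties, is not covered here and may differ.

```julia
# Hypothetical helper, not part of Scorio.
# W[i, j] counts the (question, trial) cells where model i is correct and model j is not.
function pairwise_wins_sketch(R::AbstractArray{<:Integer,3})
    L, M, N = size(R)
    W = zeros(Int, L, L)
    for i in 1:L, j in 1:L
        i == j && continue                 # a model is never compared with itself
        for m in 1:M, n in 1:N
            W[i, j] += (R[i, m, n] > R[j, m, n])
        end
    end
    return W
end

R = [1 1 0 1 0;
     1 0 0 1 0;
     0 1 0 1 1;
     0 0 0 1 0]
W = pairwise_wins_sketch(reshape(R, 4, 5, 1))
display(W)
```

Cells where both models succeed or both fail add nothing to either count in this sketch, so W[i, j] + W[j, i] can be smaller than M * N.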