bayes-kit

Bayes-Kit

arXiv License: MIT Python 3.9+ Julia 1.6+


📦 Packages

This repository contains two packages:

  1. scorio (Python) - Python implementation
  2. Scorio.jl (Julia) - Julia implementation

🚀 Quick Start

Python (scorio)

Installation

# Install from PyPI
pip install scorio

# Install from repository
pip install -e .

Basic Usage

import numpy as np
from scorio import eval

# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = np.array([[0, 1, 2, 2, 1],
              [1, 1, 0, 2, 2]])

# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = np.array([0.0, 0.5, 1.0])

# Optional prior outcomes R0: shape (M, D)
R0 = np.array([[0, 2],
               [1, 2]])

# Bayesian evaluation with prior
mu, sigma = eval.bayes(R, w, R0)
print(f"μ = {mu:.6f}, σ = {sigma:.6f}")
# Expected: μ ≈ 0.575, σ ≈ 0.084275

# Bayesian evaluation without prior
mu2, sigma2 = eval.bayes(R, w)
print(f"μ = {mu2:.6f}, σ = {sigma2:.6f}")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998

# Simple average
accuracy = eval.avg(R)
print(f"Average: {accuracy:.6f}")

Julia (Scorio.jl)

Installation

using Pkg

# From local development
Pkg.develop(path="./julia/Scorio.jl")

# Or from Julia General Registry (comming soon)
# Pkg.add("Scorio")

Basic Usage

using Scorio

# Outcomes R: shape (M, N) with integer categories in {0, ..., C}
R = [0 1 2 2 1;
     1 1 0 2 2]

# Rubric weights w: length C+1
# Here: 0=incorrect(0.0), 1=partial(0.5), 2=correct(1.0)
w = [0.0, 0.5, 1.0]

# Optional prior outcomes R0: shape (M, D)
R0 = [0 2;
      1 2]

# Bayesian evaluation with prior
mu, sigma = bayes(R, w, R0)
println("μ = $mu, σ = $sigma")
# Expected: μ ≈ 0.575, σ ≈ 0.084275

# Bayesian evaluation without prior
mu2, sigma2 = bayes(R, w)
println("μ = $mu2, σ = $sigma2")
# Expected: μ ≈ 0.5625, σ ≈ 0.091998

# Simple average
accuracy = avg(R)
println("Average: $accuracy")

Evaluation Functions

bayes(R, w, R0=None)

Bayesian performance evaluation with uncertainty quantification using the Bayes@N framework.

Data and Shape Conventions


📝 Requirements

Python

Julia


📚 Documentation

Full documentation is available at: https://mohsenhariri.github.io/bayes-kit/


📄 Citation

If you use Bayes-Kit in your research, please cite:

@article{bayeskit2025,
  title={Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation},
  author={Hariri, Mohsen and Samandar, Amirhossein},
  journal={arXiv preprint arXiv:2504.11651},
  year={2025}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.