Goal: quantify agreement between two raters beyond chance.
Example (eight items):
Rater 1: yes no yes yes no yes no yes
Rater 2: yes no no yes no yes no yes
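For reference, unweighted Cohen's kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the two raters' marginal label frequencies. A minimal Python sketch applied to the example above (the function name cohen_kappa is illustrative, not necessarily this page's API):

    from collections import Counter

    def cohen_kappa(ratings1, ratings2):
        """Unweighted Cohen's kappa for two equal-length label sequences."""
        if len(ratings1) != len(ratings2) or not ratings1:
            raise ValueError("need two non-empty sequences of equal length")
        n = len(ratings1)
        # Observed agreement: fraction of items labelled identically.
        p_o = sum(a == b for a, b in zip(ratings1, ratings2)) / n
        # Chance agreement: sum over categories of the product of the two
        # raters' marginal proportions.
        m1, m2 = Counter(ratings1), Counter(ratings2)
        p_e = sum(m1[c] * m2[c] for c in set(m1) | set(m2)) / (n * n)
        if p_e == 1.0:
            # Both raters used a single, identical category; treat as perfect agreement.
            return 1.0
        return (p_o - p_e) / (1 - p_e)

    rater1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
    rater2 = ["yes", "no", "no", "yes", "no", "yes", "no", "yes"]
    print(cohen_kappa(rater1, rater2))  # 0.75

Worked by hand for the eight items above: p_o = 7/8 = 0.875, p_e = (5*4 + 3*4)/64 = 0.5, so kappa = (0.875 - 0.5) / (1 - 0.5) = 0.75.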
Use Cohen's kappa for categorical labels assigned by two raters (or two classification methods) to the same set of items.
Kappa is sensitive to category prevalence and to the raters' marginal distributions: two datasets with the same observed agreement can yield very different kappa values. Consider reporting the confusion table alongside the statistic.
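A sketch of printing the 2x2 confusion table for the example ratings (the layout and variable names are illustrative):

    from collections import Counter

    rater1 = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
    rater2 = ["yes", "no", "no", "yes", "no", "yes", "no", "yes"]

    # Cross-tabulate: rows are Rater 1's labels, columns are Rater 2's.
    counts = Counter(zip(rater1, rater2))
    labels = ["yes", "no"]
    print("R1\\R2  " + "  ".join(f"{c:>3}" for c in labels))
    for r in labels:
        print(f"{r:>5}  " + "  ".join(f"{counts.get((r, c), 0):>3}" for c in labels))

The row and column totals are the marginals; they show each rater's label prevalence, which is what the chance-agreement term p_e depends on.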
This page implements unweighted Cohen's kappa. Weighted kappa, which gives partial credit for near-misses between ordered categories, can be added later.