MTH-002.3 Semantic Association Metrics
Researching
v0.1 January 15, 2026

Surprisal Scoring

Measuring unexpected semantic transitions

Abstract

Methodology for measuring surprisal: how unexpected a semantic transition is relative to reference association networks. Surprisal is computed as the negative log-probability of a transition under the Small World of Words (SWOW) and LLM World of Words (LWOW) association norms. The measure distinguishes typical (predictable) associations from atypical (potentially creative) ones.

Overview

This study is currently in calibration. The finished content will document:

  1. Reference networks — Small World of Words (SWOW) and LLM World of Words (LWOW) association databases
  2. Transition probability — computing association strength from frequency data
  3. Surprisal calculation — negative log-probability of observed transitions
  4. Interpretation — distinguishing creative (high surprisal, high relevance) from random (high surprisal, low relevance)

Definition

$$\text{surprisal}(w_1 \to w_2) = -\log P(w_2 \mid w_1)$$

where $P(w_2 \mid w_1)$ is the conditional probability of associating $w_2$ given $w_1$, estimated from reference network frequencies.
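
As a minimal sketch of this definition, assume the reference network is available as raw cue-response counts. The conditional probability can then be estimated as a relative frequency and converted to surprisal. The toy counts, the function names, the base-2 logarithm (bits), and the treatment of unseen transitions as infinite surprisal are all illustrative assumptions; the methodology does not yet specify any of them.

```python
import math
from collections import Counter

# Hypothetical toy counts: cue -> Counter of responses (not real SWOW data).
counts = {
    "dog": Counter({"cat": 4, "bone": 2, "bark": 2}),
}

def transition_probability(counts, cue, response):
    """Relative-frequency estimate of P(response | cue)."""
    responses = counts.get(cue)
    if not responses:
        raise KeyError(f"cue not in reference network: {cue!r}")
    total = sum(responses.values())
    # Counter returns 0 for responses never observed for this cue.
    return responses[response] / total

def surprisal(counts, cue, response):
    """Negative log2-probability of the transition, in bits (assumed base)."""
    p = transition_probability(counts, cue, response)
    if p == 0.0:
        # Assumption: unseen transitions get infinite surprisal (no smoothing).
        return math.inf
    return -math.log2(p)

print(surprisal(counts, "dog", "cat"))   # P = 4/8
print(surprisal(counts, "dog", "bone"))  # P = 2/8
```

In practice some smoothing or a floor probability would probably be needed so that unseen but plausible transitions do not all collapse to infinity. That choice is left open here.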

Reference Networks

Network   Source                    Coverage
SWOW-EN   De Deyne et al. (2019)    Human association norms, ~12K cues
LWOW      Abramski et al. (2025)    LLM association norms for comparison
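
One way the two networks might be used together is to score the same transition against each and compare the results. The sketch below assumes each network has already been loaded into a dictionary mapping (cue, response) pairs to association strengths. The values are made up for illustration and are not taken from SWOW-EN or LWOW. The loading step is omitted because the file layout is not specified here.

```python
import math

# Hypothetical association strengths P(response | cue); illustrative values only.
human_norms = {("ocean", "water"): 0.5, ("ocean", "memory"): 0.0625}
llm_norms = {("ocean", "water"): 0.25, ("ocean", "memory"): 0.125}

def surprisal_bits(strengths, cue, response):
    """Surprisal in bits under one network; infinite if the pair is absent."""
    p = strengths.get((cue, response), 0.0)
    return math.inf if p <= 0.0 else -math.log2(p)

def compare_networks(cue, response):
    """Score one transition against both (assumed) reference networks."""
    return {
        "human": surprisal_bits(human_norms, cue, response),
        "llm": surprisal_bits(llm_norms, cue, response),
    }

print(compare_networks("ocean", "memory"))
```

A large gap between the two scores would suggest that a transition is typical for one population of associators and atypical for the other.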

Theoretical Foundation

Paired with a relevance judgment, surprisal separates three kinds of association:

  • Low surprisal + high relevance = conventional, expected associations
  • High surprisal + high relevance = creative, unexpected but connected
  • High surprisal + low relevance = random noise
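
The three cases above can be sketched as a simple threshold rule. This assumes a relevance score in [0, 1] is supplied by some separate measure, which this document does not define. Both thresholds are hypothetical placeholders rather than calibrated values. The fourth combination (low surprisal, low relevance) is not covered above, so it is returned as "unclassified".

```python
def classify_transition(surprisal, relevance,
                        surprisal_threshold=3.0,   # hypothetical cutoff, in bits
                        relevance_threshold=0.5):  # hypothetical cutoff in [0, 1]
    """Label a transition using the surprisal/relevance combinations listed above."""
    high_surprisal = surprisal >= surprisal_threshold
    high_relevance = relevance >= relevance_threshold
    if high_relevance:
        return "creative" if high_surprisal else "conventional"
    if high_surprisal:
        return "random"
    # Low surprisal + low relevance is not described in the source.
    return "unclassified"

print(classify_transition(1.0, 0.9))
print(classify_transition(5.0, 0.9))
print(classify_transition(5.0, 0.1))
```

Hard thresholds are only one option. A continuous score combining the two dimensions might serve the same purpose once calibration data exist.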

This methodology is under development. Check back for updates.