Surprisal Scoring
Measuring unexpected semantic transitions
Abstract
Methodology for measuring surprisal—how unexpected a semantic transition is relative to reference networks. Uses negative log-probability against Small World of Words (SWOW) and LLM World of Words (LWOW) association norms. Distinguishes between typical (predictable) and atypical (creative) associations.
Overview
This study is currently being calibrated. This page will document:
- Reference networks — Small World of Words (SWOW) and LLM World of Words (LWOW) association databases
- Transition probability — computing association strength from frequency data
- Surprisal calculation — negative log-probability of observed transitions
- Interpretation — distinguishing creative (high surprisal, high relevance) from random (high surprisal, low relevance)
Definition
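The surprisal of a transition from a cue $c$ to a response $r$ is the negative log-probability of that transition:

$$S(r \mid c) = -\log P(r \mid c)$$

Where $P(r \mid c)$ is the conditional probability of associating $r$ given $c$, estimated from reference network frequencies.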
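The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the study's implementation. It assumes the conditional probability is estimated as the relative frequency of a response among all responses to a cue, uses base-2 logarithms (bits), and returns infinity for transitions that never occur in the reference data. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_probabilities(pairs):
    """Estimate P(response | cue) as a relative frequency from (cue, response) pairs."""
    counts = defaultdict(Counter)
    for cue, response in pairs:
        counts[cue][response] += 1
    probs = {}
    for cue, responses in counts.items():
        total = sum(responses.values())
        probs[cue] = {r: n / total for r, n in responses.items()}
    return probs


def surprisal(probs, cue, response, base=2.0):
    """Negative log-probability of the transition cue -> response.

    Returns math.inf when the transition is absent from the reference data.
    That choice is an assumption of this sketch; a real pipeline might
    smooth the counts instead.
    """
    p = probs.get(cue, {}).get(response, 0.0)
    if p == 0.0:
        return math.inf
    return -math.log(p, base)


# Toy data for illustration only, not taken from SWOW or LWOW.
toy_pairs = [("dog", "cat")] * 5 + [("dog", "bone")] * 4 + [("dog", "loyal")] * 1
toy_probs = transition_probabilities(toy_pairs)

common = surprisal(toy_probs, "dog", "cat")    # p = 0.5, so 1 bit
rare = surprisal(toy_probs, "dog", "loyal")    # p = 0.1, so about 3.32 bits
unseen = surprisal(toy_probs, "dog", "piano")  # never observed, so infinite
```

Frequent responses get low scores and rare ones get high scores, which is the ordering the definition is meant to capture. How unseen transitions should be handled is a design choice that the source does not specify.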
Reference Networks
| Network | Source | Coverage |
|---|---|---|
| SWOW-EN | De Deyne et al. (2019) | Human association norms, ~12K cues |
| LWOW | Abramski et al. (2025) | LLM association norms for comparison |
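As a rough illustration of how an association dataset could be turned into frequency counts, the sketch below tallies cue-response pairs from tabular records. The layout is an assumption: one row per cue presentation with a `cue` column and up to three response columns named `R1`, `R2`, `R3`, and missing responses marked by an empty cell or `NA`. The sample rows are invented, and the actual SWOW and LWOW releases should be checked for their real column names and missing-value markers.

```python
import csv
import io
from collections import Counter

# Hypothetical extract: column names and rows are assumptions of this sketch.
SAMPLE = """cue,R1,R2,R3
dog,cat,bone,NA
dog,cat,leash,bark
cat,dog,mouse,NA
"""

MISSING = {"", "NA"}  # assumed markers for absent responses


def count_associations(handle, response_columns=("R1", "R2", "R3")):
    """Count how often each (cue, response) pair occurs across all rows."""
    counts = Counter()
    for row in csv.DictReader(handle):
        cue = row["cue"].strip().lower()
        for column in response_columns:
            response = (row.get(column) or "").strip()
            if response in MISSING:
                continue  # skip cells with no usable response
            counts[(cue, response.lower())] += 1
    return counts


pair_counts = count_associations(io.StringIO(SAMPLE))
```

The resulting counter maps each `(cue, response)` pair to its frequency, which is the raw material for estimating association strength.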
Theoretical Foundation
Combined with a relevance judgment, surprisal distinguishes:
- Low surprisal + high relevance = conventional, expected associations
- High surprisal + high relevance = creative, unexpected but connected
- High surprisal + low relevance = random, noise
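The three categories above can be expressed as a simple decision rule. The sketch below is illustrative only. The source defines neither a relevance measure nor cut-off values, so the `relevance` score (assumed to lie between 0 and 1), both thresholds, and the function name are hypothetical placeholders.

```python
def classify_transition(surprisal_bits, relevance,
                        surprisal_threshold=3.0, relevance_threshold=0.5):
    """Label a transition using surprisal and a relevance score.

    Both thresholds are placeholder values chosen for illustration;
    they are not calibrated and do not come from the source.
    """
    high_surprisal = surprisal_bits >= surprisal_threshold
    high_relevance = relevance >= relevance_threshold

    if high_relevance and not high_surprisal:
        return "conventional"  # expected and connected
    if high_relevance and high_surprisal:
        return "creative"      # unexpected but connected
    if high_surprisal:
        return "random"        # unexpected and unconnected (noise)
    # Low surprisal with low relevance is not covered by the three categories.
    return "unclassified"


labels = [
    classify_transition(1.0, 0.9),
    classify_transition(5.0, 0.9),
    classify_transition(5.0, 0.1),
]
```

The fourth combination, low surprisal with low relevance, is returned as `"unclassified"` because the source does not assign it a label.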
This methodology is under development. Check back for updates.