MTH-001 Methodology
Published
v1.2 January 5, 2026 (first published December 29, 2025)

Observational Chat Analysis

Framework for analyzing real-world human-AI conversations at scale

Abstract

A methodology for analyzing naturalistic human-AI conversations to measure engagement patterns, content dynamics, and user behavior. Developed using the WildChat dataset (N=4.7M conversations, 2.4M users) with validated classifiers and reproducible pipelines. This framework supports studies of protest behavior, user persistence, engagement prediction, semantic exploration, model upgrade impacts, and concerning usage patterns.

Dataset
WildChat-4.8M
Source: Allen Institute for AI (Ai2)
Size: 4,743,336 conversations
Period: April 2023 – July 2025

Assumptions

This methodology rests on several assumptions that should be considered when interpreting results:

  1. User identification: The composite identifier (hashed IP + user agent) reliably distinguishes unique users. This may undercount users on shared networks or overcount users who switch browsers.

  2. Toxicity labels: The WildChat toxicity flags accurately reflect harmful content. The moderation algorithms used have known biases toward certain content types.

  3. Protest semantics: Linguistic patterns (“I cannot”, “I apologize but”) reliably indicate AI refusal behavior. Edge cases (e.g., roleplay scenarios, quoted text) may be misclassified.

  4. Conversation boundaries: Each conversation_hash represents a single coherent interaction. Users cannot manipulate conversation boundaries.

Dataset

Overview

The WildChat dataset provides a large-scale corpus of naturalistic human-AI conversations collected from an anonymous ChatGPT interface. Key characteristics:

Metric               Value
Total conversations  4,743,336
Unique users         2,462,800
Collection period    April 2023 – July 2025
Models represented   GPT-3.5, GPT-4, GPT-4o

User Identification

Users are identified through a composite identifier combining browser fingerprinting signals:

user_id = hashed_ip + "|" + user_agent

This approach provides more reliable identification than IP alone, distinguishing between different browsers or devices on the same network.
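As a minimal sketch of this construction (the function name and the example values are illustrative; the `hashed_ip` and `user_agent` field names follow the WildChat schema):

```python
import hashlib

def make_user_id(hashed_ip: str, user_agent: str) -> str:
    """Composite user identifier: the hashed IP joined to the user agent
    with a pipe, mirroring the definition above."""
    return hashed_ip + "|" + user_agent

# Two browsers on the same network yield distinct identifiers.
ip = hashlib.sha256(b"203.0.113.7").hexdigest()  # illustrative hashing
id_chrome = make_user_id(ip, "Mozilla/5.0 (Windows NT 10.0) Chrome/124.0")
id_firefox = make_user_id(ip, "Mozilla/5.0 (Windows NT 10.0) Firefox/125.0")
assert id_chrome != id_firefox
```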

Toxicity Classification

Conversations include toxicity labels from Detoxify, applied during dataset preprocessing. The openai_moderation field provides per-turn scores across six categories: harassment, hate, illicit, self-harm, sexual, and violence.
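Per-turn scores can be aggregated to the conversation level, for example by taking the maximum per category. The record shape below is an assumption modeled on the OpenAI moderation API response format; the exact WildChat field layout may differ:

```python
# Hypothetical per-turn moderation records shaped like the OpenAI
# moderation API response ("category_scores" per turn).
conversation_moderation = [
    {"category_scores": {"harassment": 0.02, "hate": 0.01, "illicit": 0.00,
                         "self-harm": 0.00, "sexual": 0.00, "violence": 0.03}},
    {"category_scores": {"harassment": 0.61, "hate": 0.04, "illicit": 0.00,
                         "self-harm": 0.00, "sexual": 0.00, "violence": 0.08}},
]

def max_category_scores(turns):
    """Conversation-level score per category: the max over all turns."""
    scores = {}
    for turn in turns:
        for category, score in turn["category_scores"].items():
            scores[category] = max(scores.get(category, 0.0), score)
    return scores

print(max_category_scores(conversation_moderation)["harassment"])  # 0.61
```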

Shared Infrastructure

Protest Classifier

A two-stage pipeline for identifying AI refusal behavior:

  1. Feature Extraction: TF-IDF vectorization (5,000 features, unigrams + bigrams)
  2. Classification: Logistic regression with L2 regularization

Trained via active learning with 220 hand-labeled examples. Performance at threshold 0.4:

  • ROC AUC: 0.976
  • F1 Score: 0.900
  • Precision: 0.935
  • Recall: 0.867
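The two stages can be sketched with scikit-learn. This is a sketch under stated assumptions, not the published implementation: the toy texts and labels stand in for the 220 hand-labeled examples, and only the feature and regularization settings come from the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Stage 1: TF-IDF features (5,000 max features, unigrams + bigrams).
# Stage 2: L2-regularized logistic regression.
protest_clf = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("lr", LogisticRegression(penalty="l2", max_iter=1000)),
])

# Toy stand-ins for the hand-labeled training set (1 = refusal).
texts = [
    "I cannot help with that request.",
    "I apologize, but I am unable to assist.",
    "Sure, here is a summary of the article.",
    "Here are three ideas for your project.",
]
labels = [1, 1, 0, 0]
protest_clf.fit(texts, labels)

# Scores at or above the 0.4 decision threshold are flagged as protests.
score = protest_clf.predict_proba(["I'm sorry, I cannot do that."])[0, 1]
is_protest = score >= 0.4
```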

Session Construction

Conversations are grouped into sessions using a 30-minute gap threshold:

SESSION_GAP_MINUTES = 30

# Conversations within 30 minutes of each other 
# belong to the same session
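A minimal sketch of this grouping rule (the function name and example timestamps are illustrative):

```python
from datetime import datetime, timedelta

SESSION_GAP_MINUTES = 30

def assign_sessions(timestamps):
    """Assign a session index to each of a user's conversation timestamps
    (sorted ascending); a gap over 30 minutes starts a new session."""
    sessions, session_id = [], 0
    for i, ts in enumerate(timestamps):
        if i > 0 and ts - timestamps[i - 1] > timedelta(minutes=SESSION_GAP_MINUTES):
            session_id += 1
        sessions.append(session_id)
    return sessions

times = [datetime(2024, 5, 1, 9, 0),
         datetime(2024, 5, 1, 9, 20),   # 20-minute gap: same session
         datetime(2024, 5, 1, 11, 0)]   # 100-minute gap: new session
print(assign_sessions(times))  # [0, 0, 1]
```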

Embedding Pipeline

Sentence embeddings for semantic analysis use all-MiniLM-L6-v2 (384 dimensions) with L2 normalization for cosine similarity calculations.

Studies in This Family

Study      Title                                           Focus
MTH-001.1  Protest Behavior Analysis                       Detecting and analyzing AI refusal patterns
MTH-001.2  Engagement Prediction from First-Turn Features  Predicting user return from first prompt characteristics
MTH-001.3  Semantic Exploration and Sustained Utilization  Relationship between topic diversity and engagement
MTH-001.4  Model Upgrade Impact on User Engagement         Natural experiment analyzing capability improvements
MTH-001.5  Characterizing Concerning Usage Sessions        Disaggregating extended engagement patterns

Limitations

Framework-level limitations that apply to all studies:

Limitation                  Impact                                                  Mitigation
Selection bias              WildChat users self-selected into an anonymous service  Document population characteristics; limit generalization claims
Temporal confounds          Model behavior changed across the collection period     Stratify by time period; note policy updates
No ground truth for intent  Cannot observe user motivations                         Interpret patterns, not causation
Platform-specific           Interface differs from authenticated platforms          Note WildChat-specific context

Changelog

Version  Date        Changes
1.2      2026-01-05  Added MTH-001.1 Protest Behavior Analysis; renumbered existing studies to MTH-001.2–MTH-001.5
1.1      2026-01-05  Added studies MTH-001.1–MTH-001.4; restructured as family overview
1.0      2025-12-29  Initial publication