QI as a dataset for the study of knowledge itself
~300 episodes. 20+ years. Thousands of factual claims. Two hosts. A rotating panel of comedians and experts. All against the backdrop of massive changes in how information is produced and consumed. This is a research instrument.
Methodology Note
All research dimensions are designed to produce SymSys-grade, peer-review-quality insights. The transcript processing pipeline captures data for all dimensions from the start, following the principle of maximum extraction, deferred interpretation. The cost of extracting a data point you don't use is nearly zero. The cost of realizing, 200 episodes in, that you needed something you didn't capture is enormous.
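One way to picture "maximum extraction, deferred interpretation" is a per-claim record in which the raw observation is mandatory and every interpretive field is optional. The sketch below is an assumption for illustration only: the class name `ClaimRecord`, its field names, and the JSON Lines storage format are hypothetical, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ClaimRecord:
    """One factual claim pulled from a transcript (hypothetical schema)."""

    # Captured at extraction time: cheap to record, costly to backfill.
    episode_id: str
    air_year: int
    speaker: str
    speaker_role: str  # e.g. "host" or "panelist"
    claim_text: str

    # Interpretation is deferred: these stay None until someone assesses the claim.
    domain: Optional[str] = None
    verdict: Optional[str] = None
    failure_mode: Optional[str] = None
    stated_confidence: Optional[float] = None

    # Catch-all for anything observed that no current research question needs yet.
    extra: dict = field(default_factory=dict)


def to_jsonl_line(record: ClaimRecord) -> str:
    """Serialize one record as a single JSON Lines row."""
    return json.dumps(asdict(record), sort_keys=True)


# Placeholder values only; this is not a real transcript entry.
example = ClaimRecord(
    episode_id="EP001",
    air_year=2003,
    speaker="Panelist A",
    speaker_role="panelist",
    claim_text="placeholder claim",
)
print(to_jsonl_line(example))
```

Keeping the interpretive fields nullable means a new research dimension can be added later by filling in a column rather than re-reading transcripts.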
Taxonomy of Epistemic Failure Modes
“Why do people believe wrong things, and are the reasons classifiable?”
Every incorrect or outdated claim is tagged with a failure mode from a developing taxonomy — from compression loss to folk etymology to cultural specificity misread as universality.
SymSys Connections
Treemap / sunburst filling in as episodes are processed
Knowledge Half-Life by Domain
“How fast do different categories of knowledge become obsolete?”
Testing Arbesman's Half-Life of Facts thesis with a novel methodology: checking factual claims against current consensus rather than using citation data.
SymSys Connections
Kaplan-Meier survival curves by domain
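A Kaplan-Meier curve for one domain can be sketched in a few lines. The framing here is an assumption: each claim contributes a duration in years from broadcast until it was overturned (an observed event) or until the date of assessment if it still stands (a censored observation). The function names and the sample numbers are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time t where at least one claim was overturned.

    durations: years each claim was followed.
    observed:  True if the claim was overturned at that time,
               False if it was still standing when last checked (censored).
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


def half_life(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Made-up durations for illustration, not measured results.
curve = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
for t, s in curve:
    print(f"t={t}  S(t)={s:.2f}")
print("estimated half-life:", half_life(curve))
```

Treating still-standing claims as censored rather than dropping them matters: discarding them would bias the estimated half-life downward.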
The Internet Effect
“Has the nature of what people get wrong changed as the information environment changed?”
QI premiered in 2003. The show's run maps onto the rise of Wikipedia, social media, smartphones, and AI-generated content. This dimension tracks shifts in misconception types across information eras.
SymSys Connections
Timeline showing misconception type composition shifting across eras
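The composition timeline reduces to counting tagged misconceptions per era and normalizing. In this sketch the era boundaries and labels are passed in by the caller, because where one information era ends and the next begins is an analytic choice; the values used below are placeholders, not the project's definitions.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def era_composition(claims, boundaries, labels):
    """Share of each failure mode within each era.

    claims:     iterable of (air_year, failure_mode) pairs.
    boundaries: sorted years; each one is the first year of the next era.
    labels:     era names, exactly one more than there are boundaries.
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly len(boundaries) + 1 labels")
    counts = defaultdict(Counter)
    for year, mode in claims:
        era = labels[bisect_right(boundaries, year)]
        counts[era][mode] += 1
    return {
        era: {mode: n / sum(c.values()) for mode, n in c.items()}
        for era, c in counts.items()
    }


# Placeholder data and a placeholder boundary, for illustration only.
sample = [
    (2004, "folk etymology"),
    (2006, "folk etymology"),
    (2008, "compression loss"),
    (2012, "compression loss"),
]
print(era_composition(sample, boundaries=[2010], labels=["era 1", "era 2"]))
```

Because the boundaries are parameters, the same tagged data can be re-cut under competing periodizations to check whether an apparent shift depends on where the lines are drawn.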
The Landscape of Informal Knowledge
“What predicts who knows what, beyond formal credentials?”
Mapping where panelists' accuracy is surprising: when comedians outperform experts, when personal background trumps education.
SymSys Connections
Panelist x domain heatmap filling in over time
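The data behind such a heatmap is an accuracy rate per (panelist, domain) cell. A minimal sketch follows, assuming each assessed claim has already been reduced to a `(panelist, domain, correct)` triple; the function name and the `min_n` threshold are hypothetical.

```python
from collections import defaultdict


def accuracy_matrix(observations, min_n=1):
    """Build a panelist x domain grid of accuracy rates.

    observations: iterable of (panelist, domain, correct) triples.
    min_n:        cells with fewer observations than this are left as None,
                  so sparse cells are not mistaken for strong signals.
    """
    tally = defaultdict(lambda: [0, 0])  # (panelist, domain) -> [correct, total]
    for panelist, domain, correct in observations:
        cell = tally[(panelist, domain)]
        cell[0] += 1 if correct else 0
        cell[1] += 1
    cells = dict(tally)
    panelists = sorted({p for p, _ in cells})
    domains = sorted({d for _, d in cells})

    def rate(p, d):
        right, total = cells.get((p, d), (0, 0))
        return right / total if total > 0 and total >= min_n else None

    grid = [[rate(p, d) for d in domains] for p in panelists]
    return panelists, domains, grid


# Placeholder names and outcomes, not real panelist data.
rows, cols, grid = accuracy_matrix(
    [
        ("Panelist A", "history", True),
        ("Panelist A", "history", False),
        ("Panelist B", "science", True),
    ]
)
print(cols)
for name, line in zip(rows, grid):
    print(name, line)
```

Leaving empty or thin cells as `None` fits a heatmap that fills in over time: a blank cell reads as "not enough data yet" instead of as zero accuracy.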
The Fry-Toksvig Transition
“How does a moderator shape collective knowledge production?”
A natural experiment: comparing how the same show functions under two different hosts across multiple dimensions.
SymSys Connections
Split-screen dashboard comparing both eras
The Confidence-Accuracy Relationship
“Are panelists who sound more confident actually more accurate?”
Testing the Dunning-Kruger effect in a naturalistic setting with calibration plots per panelist.
SymSys Connections
Calibration plots (confidence vs. accuracy) per panelist
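A calibration plot is built from binned pairs of confidence and correctness. The sketch below assumes each claim already carries a confidence score between 0 and 1; how that score would be derived from how a panelist sounds is a separate annotation problem and is not addressed here. The function names and sample values are hypothetical.

```python
def calibration_table(pairs, n_bins=5):
    """Bin claims by confidence and compare mean confidence with accuracy.

    pairs: iterable of (confidence, correct), with confidence in [0, 1].
    Returns rows of (bin_index, mean_confidence, accuracy, count),
    skipping empty bins.
    """
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum_conf, n_correct, n]
    for conf, correct in pairs:
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence out of range: {conf}")
        i = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[i][0] += conf
        bins[i][1] += 1 if correct else 0
        bins[i][2] += 1
    return [(i, s / n, k / n, n) for i, (s, k, n) in enumerate(bins) if n]


def expected_calibration_error(rows):
    """Count-weighted mean gap between confidence and accuracy across bins."""
    total = sum(n for _, _, _, n in rows)
    return sum(n * abs(conf - acc) for _, conf, acc, n in rows) / total


# Placeholder scores for a single hypothetical panelist.
rows = calibration_table(
    [(0.9, True), (0.9, False), (0.1, False), (0.1, False)], n_bins=2
)
for row in rows:
    print(row)
print("ECE:", round(expected_calibration_error(rows), 3))
```

Plotting mean confidence against accuracy per bin gives the calibration curve; points below the diagonal indicate overconfidence, points above it underconfidence.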
The 'Nobody Knows' Category
“What characterizes questions that genuinely have no answer?”
Cataloguing the boundaries of human knowledge — the things we don't know, categorized.
SymSys Connections
A growing inventory of the things we don't know, categorized
Emerging Research
New research questions are added as patterns emerge from the data. The extraction pipeline is designed for maximum granularity so that unforeseen research directions are supported by data already collected.