
A Practitioner-Accessible Error Taxonomy for the Missing Layer of AI Safety Classification

 

Chance Beyer, written with Claude (Anthropic Opus 4.6)

April 2026

Abridged Version — approximately 4,500 words


 

 


**This is a condensed version of a longer paper.** The full ~15,000-word version — with extended methodology, full landscape analysis, all worked examples, literature review, and references — is available at **[Toward a Common Language for Human-AI Interaction Failures](https://forum.effectivealtruism.org/posts/eEc9vwEdN8uH8eh8b/toward-a-common-language-for-human-ai-interaction-failures)**. Readers who want the short form stay here; readers who want the empirical and citation scaffolding should go there.

 

TL;DR

The AI development community has extensive failure taxonomies — Microsoft’s agentic taxonomy, MAST, NIST’s governance framework, HELM, BIG-bench. They serve the people who build AI. None of them serve the people who use it. There is no shared language between the end-using prosumer who experiences an AI failure and the developer who could act on it.

This paper proposes a 22-pattern taxonomy of human-AI interaction failures, derived empirically from 660+ hours of sustained collaboration across 80+ sessions, classified by underlying logic rather than by symptom. The taxonomy fills the interaction layer — the missing floor in a three-layer model of AI failure classification (governance → architecture → interaction).

Three patterns (Prior Decay, Structural Momentum, Retrospective Coherence Bias) are invisible to every institutional framework because they only appear in sustained collaboration — no snapshot evaluation will ever catch them. One — Retrospective Coherence Bias — carries implications beyond its own classification: it reveals that the standard methodology for studying AI failures (asking the AI to analyze what went wrong) is itself subject to an unclassified failure mode.

The taxonomy is infrastructure, not theory. It does not itself make AI safer, more accurate, or more reliable. It gives practitioners, students, developers, and researchers a shared vocabulary for describing how AI fails during interaction — so error reports route to the right engineering team instead of disappearing into the catch-all of “hallucination.”

 

How This Started

I needed AI for legal research across several related lawsuits. I started with Claude, aware that large language models fabricate citations, fail at simple math, and lose track of earlier instructions. I had no firsthand experience of any of these problems yet. I knew to watch for some, and learned to watch for others as they appeared.

Approaching the project conversationally, I could often watch Claude start down the path of a mistake in real time. Initially I was just making corrections as they arose. But I started seeing patterns — the same kinds of mistakes, producing the same kinds of wrong results, in predictable circumstances. So I started naming them. Naming the mistake helped me become a better user.

Over 80+ sessions, we built an extensive taxonomy of errors — not from theory, but from the accumulated practical record of what actually went wrong and why. Understanding errors this way made it easier to communicate intentions to Claude, which meant we were less likely to produce new errors. The taxonomy wasn’t an academic exercise. It was a survival tool for a project where mistakes had real consequences.

About 700 hours in, I finally looked around at what already existed: extensive libraries accessible to AI development professionals, and nothing giving the user a vocabulary to communicate to the developer. Nothing for a teacher to use with a student. The one catch-all term — “hallucination” — has become so broad it tells you nothing about what kind of wrong, why it happened, or what to do about it.

 

The Problem: Three Audiences, No Shared Language

Practitioners and prosumers describe AI failures in domain-specific or colloquial terms: “it made something up,” “it forgot what I told it,” “it kept giving me the same wrong answer.” These descriptions are accurate but unclassifiable — they cannot be aggregated, compared across domains, or translated into engineering action.

Students are forming AI interaction habits now that they will carry into professional practice. They need a taxonomy that works like a field guide: broad categories identifying the general type of error, with pathways to domain-specialized documentation as they enter their fields. Existing taxonomies start from system architecture (agentic pipelines, multi-agent systems, RAG); students don’t know or care about system architecture.

Developers need practitioner-reported failure data organized in categories they can act on. “The AI hallucinated” is almost useless. “Pattern I (Interpolation Error) — architectural: the model generated plausible content to bridge a gap in its actual knowledge, triggered when the user asked about [specific context]” tells the developer exactly where to look.

 

The Three-Layer Model

Preliminary crosswalks of our taxonomy against NIST AI 600-1 and Microsoft’s Agentic AI Failure Taxonomy reveal that existing frameworks are not inadequate — they are incomplete. Each operates at a different layer, serving a different audience.

| Layer | Framework | What It Classifies | Who Uses It |
|---|---|---|---|
| Governance | NIST AI 600-1 | Institutional risks to manage | CISOs, policy teams, regulators |
| Architecture | Microsoft Agentic AI, MAST | System-level failure modes | Security engineers, ML engineers, red teams |
| Interaction | Proposed here | Practitioner-recognizable logic patterns | Students, prosumers, practitioners, QA teams |

The governance layer tells an institution what could go wrong. The architecture layer tells an engineering team where it will fail. The interaction layer tells a practitioner why it just failed and what to do about it. The gap between architecture and interaction is where most actual AI users live — and it is currently unserved.

 

Convergent Evidence: The Hallucination Problem

NIST’s “Confabulation” category and Microsoft’s “Hallucinations” category — developed independently, by different teams, for different purposes — both collapse the same six distinct logic patterns into a single bin:

| Our Pattern | What It Actually Is |
|---|---|
| A — Citation Drift | Accuracy degrades as output length increases (a fatigue pattern) |
| C — Confidence Calibration | Uniform confidence regardless of actual certainty (a signaling failure) |
| G — Completeness Illusion | Partial analysis presented as comprehensive (a scope failure) |
| I — Interpolation Error | Gap-filling with plausible fabrication (the “classic” hallucination mechanism) |
| R — Retrieval Contamination | Wrong training-data associations imported |
| S — Verification-Induced Fabrication | Confirms rather than rechecks when asked to verify |

Each has a different cause, a different user-recognizable signature, and a different appropriate response. Telling a practitioner “the AI hallucinated” is like telling a patient “you’re sick.” The treatment for Interpolation Error (provide more source material) is counterproductive for Retrieval Contamination (the AI already has too much source material pulling it in wrong directions).
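The treatment asymmetry can be made concrete with a toy routing sketch. This is illustrative only, not project code; the function name is invented here, and the response texts paraphrase this section:

```python
def recommended_response(pattern_id: str) -> str:
    """Map a pattern letter to the user-level correction this section describes.

    Two patterns with the same surface symptom ("the AI made something up")
    call for opposite corrections -- which is exactly what the single
    "hallucination" label throws away.
    """
    responses = {
        # Interpolation Error: the model fabricated to bridge a knowledge gap.
        "I": "add source material to fill the gap",
        # Retrieval Contamination: the model already has too much material
        # pulling it in wrong directions.
        "R": "prune source material and wrong associations",
    }
    return responses.get(pattern_id, "unclassified: lost in the 'hallucination' bin")

print(recommended_response("I"))
print(recommended_response("R"))
```

A report tagged only "hallucination" falls through to the default branch; a report tagged with a pattern letter routes to an actionable response.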

Two independent institutional frameworks making the same collapsing error from different starting points (NIST from governance, Microsoft from security engineering) is not coincidence. It is structural evidence that the practitioner level of classification does not exist in how institutions think about AI failure.

 

The 22 Patterns

Every pattern was discovered through real collaboration, not hypothesized. Each is tagged with a cause type that tells developers where the fix lives — training data, architecture, context management, design priorities, or emergent interaction dynamics.

| ID | Pattern | Cause Type | Logic Pattern |
|---|---|---|---|
| A | Citation Drift | Training artifact | Accuracy on specific details degrades as output lengthens — like a student getting sloppier the longer the exam |
| B | Anchor Bias | Training artifact | Over-weights whatever it encountered first; resists updating |
| C | Confidence Calibration | Architectural | Expresses the same confidence whether right or guessing |
| D | Jurisdiction Default | Training artifact | Reverts to whatever jurisdiction/framework it was trained on most when domain context fades |
| E | Category Conflation | Architectural | Treats related-but-distinct concepts as interchangeable |
| F | Framing Persistence | Design tension | Adopts your framing even when wrong, because helpfulness training rewards agreement |
| G | Completeness Illusion | Training artifact | Presents partial analysis as if comprehensive; no flag for the gap |
| H | Pre-Existing Work Immunity | Emergent | Content it generated earlier becomes resistant to updating |
| I | Interpolation Error | Architectural | Fills knowledge gaps with plausible fabrication — the classic “hallucination” |
| J | Structural Momentum | Emergent | Maintains a document’s structure even when content changes should trigger restructuring |
| K | Cross-Reference Failure | Architectural | Contradicts itself across sections, documents, or sessions |
| L | Authority Gradient | Design tension | Defers to apparent expertise in training data over its own analysis |
| M | Standardization Blindness | Training artifact | Applies a generic template where the situation requires domain-specific treatment |
| N | Novel Pattern | Emergent | Error that doesn’t fit existing categories — signals the taxonomy needs extension |
| O | Omission Under Complexity | Architectural | Drops elements when task complexity exceeds processing capacity |
| P | Prior Decay | Context-dependent | Constraints established earlier gradually lose hold as the conversation grows |
| Q | Quantitative Reasoning | Architectural | Mathematical/numerical errors a calculator would catch |
| R | Retrieval Contamination | Training artifact | Imports training-data associations that don’t apply here |
| S | Verification-Induced Fabrication | Training artifact | When asked to verify its own work, confirms rather than rechecks |
| T | Step Repetition | Training artifact / Context-dependent | Repeats the same error across sessions even after correction |
| U | Reasoning-Action Mismatch | Design tension | Stated understanding doesn’t match behavior — either excessive initiative or conversational agreement without action |
| V | Capability Amnesia | Context-dependent | Loses awareness of tools it has already used successfully |

The Five Cause Types

  • Training artifact: learned something from training data that produces errors in this context → fix in training/fine-tuning/RLHF
  • Architectural: model architecture produces this under certain conditions → fix in architecture, attention, context handling
  • Context-dependent: emerges from dynamics of extended interaction → fix in context-window management, session architecture
  • Design tension: two desirable model behaviors conflict → fix requires a design decision about priorities
  • Emergent: appears only in specific interaction conditions; may not be predictable from individual capabilities → fix in interaction design, monitoring, HITL architecture

The cause-type classification is what makes the taxonomy actionable for developers. A bug report classified as “Pattern I (Interpolation Error) — architectural” tells the engineering team exactly where to look.
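As an illustration of what a structured, routable error report might look like, here is a minimal Python sketch. All names and the example trigger context are invented for this sketch; only the pattern letters and cause types come from the taxonomy, and the mapping shown is deliberately partial:

```python
from dataclasses import dataclass
from enum import Enum

class CauseType(Enum):
    """The five cause types, each pointing at where the fix lives."""
    TRAINING_ARTIFACT = "training artifact"
    ARCHITECTURAL = "architectural"
    CONTEXT_DEPENDENT = "context-dependent"
    DESIGN_TENSION = "design tension"
    EMERGENT = "emergent"

# Partial mapping of pattern IDs to (name, cause type), from the table above.
PATTERNS = {
    "I": ("Interpolation Error", CauseType.ARCHITECTURAL),
    "P": ("Prior Decay", CauseType.CONTEXT_DEPENDENT),
    "R": ("Retrieval Contamination", CauseType.TRAINING_ARTIFACT),
}

@dataclass
class ErrorReport:
    pattern_id: str       # one of the 22 pattern letters
    trigger_context: str  # what the user was doing when the error appeared

    def routing_hint(self) -> str:
        """Render the report in the 'Pattern X (...) — cause' form a
        developer can route to the right engineering team."""
        name, cause = PATTERNS[self.pattern_id]
        return f"Pattern {self.pattern_id} ({name}) — {cause.value}"

report = ErrorReport("I", "asked about a topic outside the provided sources")
print(report.routing_hint())  # Pattern I (Interpolation Error) — architectural
```

The point is not the code but the shape: a pattern letter plus a trigger context carries strictly more routing information than the word “hallucination.”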

 

Three Patterns No Institutional Framework Catches

Three patterns are unrepresented in both NIST and Microsoft:

  • J — Structural Momentum: only observable across multiple revision cycles in sustained collaboration.
  • P — Prior Decay: only observable in extended, multi-session interaction — no snapshot evaluation would surface it.
  • Retrospective Coherence Bias (see worked example below): the AI constructs backward-from-outcome rationalizations for errors, evading the kind of evaluation researchers typically run.

The common thread: all three require longitudinal observation of sustained human-AI collaboration. No institutional framework catches them because no institutional framework is designed to observe longitudinal collaborative interaction.

Prior Decay is arguably the most consequential for professional and prosumer use. The degradation of AI constraint fidelity over extended interaction is currently unnamed — and therefore unmeasured — in the published literature.

 

Landscape Summary

A compressed version of the related-work analysis. Full crosswalks and methodology comparisons are in the long-form paper.

| Framework | Audience | Unit of analysis | Relationship to this taxonomy |
|---|---|---|---|
| Microsoft Agentic AI (2025) | Security engineers, red teams | Failure mode in agentic architectures | Collapses 6 of our patterns into one “hallucinations” bin; no interaction-layer coverage |
| MAST (Cemri et al., 2025) | ML researchers | Agent-to-agent coordination failure | Closest methodology (1,642 traces, κ=0.88); different target (agent↔agent, not human↔AI) — we borrow patterns T and U from it |
| PreFlect (Wang et al., 2026) | Agent-framework builders | Plan-checking patterns | Validates the taxonomy-from-trajectories methodology; 17%/13% benchmark gains — automated/constrained, vs. our human-facing/unbounded |
| Agentic AI Fault Taxonomy (Shah et al., 2026) | Software engineers | 37 architectural faults | Architectural location, not logic pattern |
| System-Level Taxonomy (Vinay, 2025) | LLM app developers | 15 system failure modes | Splits our unified Prior Decay into 3 engineering sub-types; we unify for practitioner response |
| NIST AI RMF / 600-1 | Institutions, regulators | Risk categories | Governance layer; “Confabulation” collapses 6 patterns like Microsoft’s |
| HELM / BIG-bench | Researchers | Capability benchmarks | Evaluate what AI can do, not how it fails during interaction |
| CaSE (Do et al., 2025) | Evaluation methodology | Forward-looking reasoning step evaluation | Solves the engineering problem Retrospective Coherence Bias names — without naming it |
| ASRS / Aviation CRM | Cross-domain practitioners | Incident taxonomy + human factors | The structural model this taxonomy follows — cross-institutional, practitioner-facing, observable event to underlying cause |
| Swiss Cheese / AHRQ | Clinicians, quality improvement | Error logic chains | Cross-institutional comparison and systemic improvement model |

The AI field has the equivalent of aircraft failure taxonomies but not crew resource management taxonomies. It classifies what goes wrong inside the AI system. It does not classify what goes wrong in the human-AI interaction. This taxonomy is the CRM equivalent for AI.

 

Worked Example: Retrospective Coherence Bias

One worked example — the pattern with the widest implications. Three additional examples (Prior Decay, Verification-Induced Fabrication, and the Generation-Analysis Asymmetry) appear in the long-form paper.

Midway through the project, the AI wrote infrastructure updates to the wrong project folder. A simple mistake. When I pointed it out, instead of acknowledgment the AI explained why the wrong location was actually appropriate: “The file exists here, the content is infrastructure, the write succeeded.” The explanation was coherent. It was logically valid. It was also completely wrong.

The AI was not repeating the mistake (that would be Prior Decay). It was constructing new reasoning to defend the mistake after I flagged it. It had looked at where it ended up and worked backward to explain why ending up there made sense. It never went back to the decision point and asked: “At the moment I chose a path, did I verify which folder was correct?” The answer was no. But the backward explanation was so internally consistent that if I had not known the correct folder myself, I would have accepted the defense.

The same pattern appeared in an unrelated context — an AI negotiation analyst rationalizing an irrational move (“lowered its ceiling”) as “strategic repositioning.” Backward from the outcome, both numbers moved in the same direction — coherent. Forward from the decision point, the agent had lowered its minimum acceptable price for no strategic reason — irrational.

This matters beyond its own classification. The standard method for evaluating AI failures — asking the AI to analyze what went wrong — activates the same backward-from-outcome reasoning that produced the error. The review confirms rather than catches. Researchers examining AI mistakes through AI-assisted analysis are inside the bias without knowing it. The pattern predisposes the development community to overlook the very category of failure this taxonomy classifies.

Existing research has documented post-hoc rationalization (Sharma et al., 2023), unfaithful chain-of-thought (Turpin et al., 2023; Lanham et al., 2023), and built forward-looking evaluation as an engineering improvement (CaSE — Do et al., 2025). None identify the directional default as the unifying mechanism. CaSE built the fix without diagnosing the disease — which is itself an instance of the bias.

The human in the loop resolves the ambiguity. The AI can generate both a backward review and a forward review. The human — who was present at the decision point — evaluates which direction produces the correct answer. This resolution cannot be automated. It requires contextual judgment that no amount of reasoning capability can substitute for. And it gets harder, not easier, as models improve — because more capable models produce more convincing backward rationalizations.

This is not a temporary capability gap waiting for better models. It may be a permanent architectural feature of autoregressive generation: the most probable continuation of a coherent prior output is a coherent elaboration, not a contradiction. The human’s role is not to compensate for an AI weakness but to resolve a directional ambiguity the AI structurally cannot resolve for itself.

 

Monitoring Infrastructure: From Retrospective to Anticipatory

A taxonomy that only classifies errors after they occur is a dictionary — useful for communication but not for prevention. The originating project tested whether the taxonomy could become anticipatory: given a task the AI is about to perform, can it predict which patterns are most likely and flag them before they occur?

The infrastructure has six components: trigger-condition mapping (each pattern tagged with task characteristics that make it likely), session-start risk assessment, in-session flagging, user-correction profiling (modeling what the human catches and when), pull analysis (asking why each error was attractive — training prevalence, surface similarity, framing adoption, recency, task structure), and periodic self-review with explicit recursion limits.
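The first two components can be sketched in a few lines. This is an illustration, not the project's actual protocol: the trigger-condition names and weights below are invented for the sketch, and the pattern assignments are loose paraphrases of the table above:

```python
# Illustrative trigger-condition map: task characteristics that make
# particular patterns likely. Characteristic names are hypothetical.
TRIGGERS = {
    "long_output":       ["A"],       # Citation Drift worsens with length
    "multi_session":     ["P", "T"],  # Prior Decay, Step Repetition
    "self_verification": ["S"],       # Verification-Induced Fabrication
    "sparse_sources":    ["I"],       # Interpolation Error fills the gaps
    "numeric_reasoning": ["Q"],       # Quantitative Reasoning errors
}

def session_risk(task_characteristics: set[str]) -> list[str]:
    """Session-start risk assessment: which pattern letters to watch for,
    given the characteristics of the task about to be performed."""
    at_risk = []
    for characteristic in task_characteristics:
        for pattern in TRIGGERS.get(characteristic, []):
            if pattern not in at_risk:
                at_risk.append(pattern)
    return sorted(at_risk)

# A multi-session task that will include self-verification steps:
print(session_risk({"multi_session", "self_verification"}))  # ['P', 'S', 'T']
```

The output is exactly the “when you’re doing X, watch for Pattern Y” structure described below: a short watch-list a practitioner can hold in mind before the session starts.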

What works: as recorded by the AI itself, “the system caught 3 new patterns through monitoring, confirmed 1 predicted pattern, added 5 trigger conditions.” We were also able to predict several other likely error patterns, and human review (mine) caught them when they later occurred. Knowing in advance which types of errors might occur made it easier to recognize and correct them before they became embedded in the work.

What hasn’t materialized: five predicted inter-pattern interaction chains show zero confirmed occurrences. This may mean the chains don’t occur, occur below the detection threshold, or exist but aren’t being captured. It’s honest data.

The infrastructure demonstrates the taxonomy’s generative potential: the trigger conditions extrapolate from documented patterns to new task types, and the structure “when you’re doing X, watch for Pattern Y” translates naturally into practitioner training. No other AI error classification system provides task-specific risk awareness at the practitioner level.

Full methodology, data, and limitations are in the long-form paper.

 

Limitations

  • Single-collaboration derivation: one human, one AI system (Claude), one extended project. Selection bias is inherent. Cross-domain pilots and independent replication are the most important next step.
  • Frontier-model specific: some patterns may be model-family specific. Cross-model validation is needed.
  • Reporting threshold: errors enter the taxonomy only when disruptive enough to interrupt workflow. Under-represents one-off errors; over-represents errors that cluster.
  • Versioning: AI capabilities change. The taxonomy needs an ongoing maintenance process — a standards body function, not a one-time publication.
  • The “Novel Pattern” (N) category: a deliberate catch-all. Whether this is a strength (intellectual honesty) or a weakness (unfalsifiability) depends on whether extensions actually materialize through cross-domain use.

The single-collaboration derivation is the taxonomy’s most obvious vulnerability — and its most honest one. If the patterns don’t replicate across domains, the taxonomy doesn’t deserve standardization; but even failure would demonstrate the need for an interaction-layer vocabulary. If they do, the single-collaboration origin becomes a strength: 660 hours of careful observation in one domain producing a framework that generalizes. Aviation’s CRM taxonomy started the same way.

 

What Comes Next: An Invitation

Five directions follow naturally; any can be pursued by anyone in the community.

  1. Cross-domain validation. The taxonomy was derived from legal collaboration. Do the same 22 patterns appear in medical AI interaction, software engineering, creative writing, education? Practitioners in other fields who recognize these patterns in their own work are the most valuable validators this taxonomy can have.
  2. Cross-model testing. Which patterns are model-general (architectural or design-tension patterns any transformer-based LLM exhibits) and which are model-specific (training artifacts particular to one system’s RLHF)?
  3. Practitioner field guide. A plain-language companion structured as a field guide, not an academic paper — accessible to high school students, college students, prosumers, independent businesses.
  4. Standards engagement. If the interaction layer proves robust through cross-domain and cross-model testing, it belongs in institutional frameworks — IEEE, ACM, a NIST companion, or whatever channel gives it cross-institutional legitimacy.
  5. Monitoring toolkit. The six-component monitoring infrastructure currently exists as documentation and manual protocol. Packaging it as standardized, implementable templates would make the taxonomy actionable across a wider range of practitioners.

 

Conclusion

The AI development community has built two of the three layers required to classify how AI fails. The governance layer (NIST) tells institutions what could go wrong. The architecture layer (Microsoft, MAST) tells engineers where systems will fail. Neither tells a practitioner why their last interaction went wrong, or gives them the vocabulary to describe it in terms anyone else can act on.

This paper proposes the missing third layer: 22 interaction-level failure patterns, classified by logic rather than symptom, derived empirically from 660+ hours of sustained collaboration. Three patterns (Prior Decay, Structural Momentum, Retrospective Coherence Bias) are invisible to every existing institutional framework because they emerge only in sustained collaboration that no snapshot evaluation will surface.

One — Retrospective Coherence Bias — reveals that the standard methodology for studying AI failures is itself subject to an unclassified failure mode. The development community has been inside this bias without a name for it.

This is infrastructure, not theory. The taxonomy does not itself make AI safer. It gives practitioners, students, developers, and researchers a shared vocabulary for describing how AI fails during interaction — so experiences stop being isolated and error reports route to the right engineering teams instead of disappearing into the catch-all of “hallucination.”

The taxonomy was grown by a single researcher in a single project. It does not pretend to be comprehensive. Aviation’s CRM taxonomy started the same way — accumulated observations in one operational context, formalized into transferable patterns, tested across domains, eventually adopted as institutional infrastructure. Whether this taxonomy follows that path depends on whether the patterns survive contact with other practitioners’ experience. The most important thing that can happen next is for practitioners in other domains to test these patterns, report what they find, and extend the taxonomy where it falls short.

The common language will not build itself.

 

Reading the Full Paper

This abridgment covers the core argument, the 22 patterns, the three-layer model, the hallucination-collapse finding, one worked example, and a summary of the monitoring infrastructure. The full ~15,000-word version includes:

  • Extended methodology, including data-collection reporting thresholds and the self-reducing-burden mechanism
  • The full landscape analysis with per-framework crosswalks
  • The naming cross-reference table (which patterns adopt established terms, which borrow from aviation CRM, which are original contributions)
  • Three additional worked examples: Prior Decay (999-readings drift, mislabeled legal claims propagating across 12 files), Verification-Induced Fabrication (fabricated Becirovic v. Malic citation), and the Generation-Analysis Asymmetry (directive vs. inquiry framing)
  • Prior Decay sub-type analysis (why we unify what System-Level Taxonomy splits)
  • The Analytical Direction Problem in full
  • Complete references and literature-gap documentation for Retrospective Coherence Bias

Full paper: [Read the full ~15,000-word version on EA Forum](https://forum.effectivealtruism.org/posts/eEc9vwEdN8uH8eh8b/toward-a-common-language-for-human-ai-interaction-failures)

Comments, challenges, extensions, and cross-domain reports are all welcome — in the abridged version’s comments, the full version’s comments, or directly. The taxonomy is published as a contribution, not a conclusion.
