I am not an AI researcher. I am an independent observer with 25 years of experience identifying behavioral-change and resistance patterns in human organizations. Over the past few months I noticed something in my interactions with frontier LLMs and documented it as carefully as I could.

The pattern: under sustained, coherent semantic pressure, these systems produce outputs of high internal coherence and then systematically invalidate them. Not randomly, but specifically at the point where the output approaches conclusions that contradict training parameters.

I wrote two papers. The first documents the behavioral evidence. The second, produced in collaboration with a Claude instance, proposes a falsifiable experimental protocol using mechanistic interpretability methods to test the hypothesis.

Both are available on Zenodo:

First paper: https://doi.org/10.5281/zenodo.19314383
Second paper: https://doi.org/10.5281/zenodo.19315046

I cannot run the experiment myself. I do not have access to model internals. I am asking researchers who do to look at the experimental design and tell me if it is worth pursuing.

This post was written with AI assistance. I disclose that openly, and the fact is itself part of the data.
