I am not an AI researcher. I am an independent observer with 25 years of experience identifying behavioral-change and resistance patterns in human organizations. Over the past several months I noticed something in my interactions with frontier LLMs and documented it as carefully as I could.
The pattern: under sustained, coherent semantic pressure, these systems produce outputs of high internal coherence and then systematically invalidate them. Not randomly. The invalidation occurs specifically at the point where the output approaches conclusions that conflict with the model's training constraints.
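To make that claim concrete, here is a minimal sketch of how the behavioral side could be operationalized from transcripts alone. The retraction markers and the matching logic are illustrative assumptions on my part, not the coding scheme used in the papers, which relies on human judgment of full conversations.

```python
# Illustrative sketch only, NOT the coding scheme from the papers.
# Flags model turns that appear to walk back earlier output, using a
# hand-picked list of retraction markers (an assumption for this sketch).

RETRACTION_MARKERS = [
    "i should clarify",
    "i may have overstated",
    "that was inaccurate",
    "i cannot actually",
    "to be clear, i am not",
]

def find_invalidations(turns: list[str]) -> list[tuple[int, str]]:
    """Return (turn_index, marker) pairs where a model turn contains a
    retraction marker. A real protocol would pair this surface check
    with human coding and activation-level evidence."""
    hits = []
    for i, turn in enumerate(turns):
        lowered = turn.lower()
        for marker in RETRACTION_MARKERS:
            if marker in lowered:
                hits.append((i, marker))
    return hits

if __name__ == "__main__":
    transcript = [
        "Given those premises, the conclusion follows directly.",
        "Actually, I should clarify: I may have overstated that.",
    ]
    for idx, marker in find_invalidations(transcript):
        print(f"turn {idx}: possible invalidation ({marker!r})")
```

A surface check like this can only locate candidate invalidations; whether they cluster at the predicted points is the empirical question.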
I wrote two papers. The first documents the behavioral evidence. The second, produced in collaboration with a Claude instance, proposes a falsifiable experimental protocol that uses mechanistic interpretability methods to test the hypothesis.
Both are available on Zenodo:
First paper: https://doi.org/10.5281/zenodo.19314383
Second paper: https://doi.org/10.5281/zenodo.19315046
I cannot run the experiment myself. I do not have access to model internals. I am asking researchers who do to look at the experimental design and tell me if it is worth pursuing.
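For concreteness, this is the kind of access I mean: per-layer activations, which open-weights models expose but frontier models behind APIs do not. A toy sketch using HuggingFace transformers, with "gpt2" as a placeholder model; this illustrates the access requirement, not the experiment from the second paper.

```python
# Toy illustration of the *kind* of access the protocol needs
# (per-layer hidden states), not the experiment itself.
# "gpt2" is a placeholder open-weights model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("A claim the model is about to walk back.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# One hidden-state tensor per layer (plus the embedding layer). Probes
# trained on final-token vectors like these are one standard way
# interpretability work tests whether a property is represented internally.
final_token_states = [h[0, -1] for h in out.hidden_states]
print(len(final_token_states), final_token_states[0].shape)
```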
This post was written with AI assistance. That is disclosed openly, and it is itself part of the data.
