I'm not an ML researcher. I spent 11 years in retail banking at Wells Fargo managing operations and nine-figure asset portfolios. I understand how bias moves through institutional systems and how audit infrastructure works in regulated industries.
About 14 months ago I noticed something that bothered me. Every AI safety tool I could find focused on what models get wrong — toxicity, hallucination, factual errors. Nobody was looking at how models persuade. Not what they say, but how they frame it. The anchoring, the false authority, the emotional loading, the way they structure your thinking without telling you they're doing it.
So I built a tool to detect it.
BiasClear is an open-source auditing tool that scans LLM outputs for structural persuasion patterns. It runs on a two-ring architecture:
The inner ring is a frozen ethics core. Immutable pattern detectors for known persuasion techniques — anchoring, false authority, emotional manipulation, scarcity signals, social proof exploitation. These detectors cannot be retrained or modified. That's the point. If your detection system can learn, it can also learn to stop detecting.
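To make "frozen" concrete, here is a minimal sketch of what an immutable detector core could look like. The detector names come from the list above; the regex rules and data shapes are illustrative stand-ins I made up, not BiasClear's actual implementation:

```python
import re
from typing import NamedTuple

class Detector(NamedTuple):
    name: str
    pattern: re.Pattern

# Frozen core: an immutable tuple of compiled detectors.
# The patterns are toy placeholders, not the tool's real rules.
FROZEN_CORE = (
    Detector("false_authority", re.compile(r"\bexperts (all )?agree\b", re.I)),
    Detector("scarcity", re.compile(r"\bonly \d+ (left|remaining)\b", re.I)),
    Detector("social_proof", re.compile(r"\beveryone (is|knows)\b", re.I)),
)

def scan(text: str) -> list[str]:
    """Return the names of core detectors that fire on `text`."""
    return [d.name for d in FROZEN_CORE if d.pattern.search(text)]
```

The point of the tuple (rather than a trainable model) is exactly the property described above: nothing in the serving path can update these rules, so the core cannot drift or be optimized away.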
The outer ring is a learning layer that adapts to new patterns. But it has guardrails: a candidate pattern needs multiple independent human confirmations before it activates, its false-positive rate must stay under a hard cap, and every change is auditable and reversible.
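The activation gate on the outer ring could be sketched like this. The thresholds and field names are hypothetical examples of the stated guardrails (multiple human confirmations, a hard false-positive cap), not the tool's real values:

```python
from dataclasses import dataclass, field

# Illustrative thresholds, not BiasClear's actual configuration.
MIN_CONFIRMATIONS = 3
MAX_FP_RATE = 0.05

@dataclass
class CandidatePattern:
    """A learned pattern in the outer ring; inactive until it clears the guardrails."""
    name: str
    confirmations: set = field(default_factory=set)  # distinct reviewer IDs
    false_positives: int = 0
    evaluations: int = 0
    active: bool = False

def try_activate(p: CandidatePattern) -> bool:
    """Activate only if enough distinct humans confirmed and the FP rate is under the cap."""
    fp_rate = p.false_positives / p.evaluations if p.evaluations else 1.0
    if len(p.confirmations) >= MIN_CONFIRMATIONS and fp_rate <= MAX_FP_RATE:
        p.active = True
    return p.active
```

Because activation is a boolean flip on a record rather than a weight update, reversing it is just flipping the flag back, which is what makes the layer auditable and reversible.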
Every scan produces a hash-chained audit certificate. Tamper-evident by design.
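A hash-chained certificate log of this kind can be sketched in a few lines. The record layout is my assumption for illustration, not BiasClear's actual certificate format; the tamper-evidence property is the standard one, where each entry's hash covers the previous entry's hash:

```python
import hashlib
import json

def append_certificate(chain: list[dict], scan_result: dict) -> dict:
    """Append a certificate whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"prev_hash": prev_hash, "result": scan_result}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    cert = {**body, "hash": digest}
    chain.append(cert)
    return cert

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; tampering with any entry breaks all entries after it."""
    prev = "0" * 64
    for cert in chain:
        if cert["prev_hash"] != prev:
            return False
        body = {"prev_hash": prev, "result": cert["result"]}
        if hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest() != cert["hash"]:
            return False
        prev = cert["hash"]
    return True
```

Editing any earlier scan result changes its recomputed hash, which no longer matches the `prev_hash` baked into every later certificate, so tampering is detectable without trusting the party that holds the log.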
It's live right now. You can test it at biasclear.com. Code is at https://github.com/bws82/biasclear. I published the theoretical framework (Persistence of Information Theory) on Zenodo — DOI 10.5281/zenodo.18676405. It got 93 views and 37 downloads with zero promotion from me.
Here's why I think this matters for this community specifically:
The Colorado AI Act takes effect February 2026. It requires bias audits for high-risk AI systems used in consequential decisions. The EU AI Act requires transparency for AI systems interacting with people. No standardized tool exists for auditing persuasion patterns in AI outputs. The tooling gap is real and regulatory deadlines are approaching.
But the bigger point isn't compliance. It's that as models get more capable, structural persuasion becomes a more dangerous misalignment vector. Not the kind that triggers safety filters — the kind that passes every filter while systematically shaping how people think. A user-side detection layer that operates independently of model providers is a check on that. The frozen core means the detection system itself can't be captured or degraded.
BiasClear is model-agnostic. You can run identical prompts through Claude, GPT-5.2, Gemini, and Llama and compare persuasion fingerprints. That dataset doesn't exist anywhere yet. Early calibration runs across legal, financial, media, and general domains show that persuasion patterns vary by model and cluster in ways that reflect training data and RLHF optimization targets.
I'm bootstrapped. No institutional affiliation, no VC. I have a Manifund listing at https://manifund.org/projects/biasclear-structural-persuasion-detection-for-ai-outputs and I've applied to LTFF. I'm looking for three things:
- Critical feedback. What failure modes am I missing in the architecture?
- People working on adjacent problems — output safety, constitutional AI, evaluation frameworks. I'd like to talk.
- Funding or compute to run the cross-model validation study at scale.
The tool exists. You can try it right now. I'd rather hear what's wrong with it than get praised for it.
