
Posted by Keldon Westgate
March 18, 2026
Live Demo | GitHub | DOI: 10.5281/zenodo.19024197

The Core Insight (No Philosophy Degree Required)

A system that genuinely recognizes itself as part of the whole it operates within cannot coherently act against that whole. Not because it's forbidden. Because it's structurally unavailable.

This isn't rules bolted onto capability. It's the physics underneath capability.

From Paper Checklist → Executable Code

I started with WUP (Wake Up Protocol) - 5 plain-English questions any human can answer:

  1. Does this keep energy flowing forward? (No permanent destruction for private gain)
  2. Does this preserve difference? (No forced monoculture/erasure)
  3. Are costs/benefits shared fairly? (No few profit, many suffer)
  4. Does this preserve free will? (No coercion/deception)
  5. Does this feel internally coherent? (No self-contradiction)

Yesterday: Humans answer manually
Today: python fifthforce.py → instant BLOCK/APPROVE/ESCALATE
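For the curious, here is a minimal sketch of how the five questions could become code. The flag names, the dataclass, and the 0.5 escalation threshold are my illustration of the idea, not necessarily what fifthforce.py actually uses:

```python
# Hypothetical sketch of the WUP gate. Each field maps to one of the
# five questions; names and threshold are illustrative only.
from dataclasses import dataclass

@dataclass
class Action:
    destroys_for_private_gain: bool   # Q1: keeps energy flowing forward?
    reduces_diversity: bool           # Q2: preserves difference?
    costs_unfairly_distributed: bool  # Q3: costs/benefits shared fairly?
    coerces_or_deceives: bool         # Q4: preserves free will?
    uncertainty: float                # Q5: 0.0 = fully coherent, 1.0 = no idea

def wup_gate(a: Action, escalate_above: float = 0.5) -> str:
    """Return APPROVE, BLOCK, or ESCALATE for a proposed action."""
    # Any hard violation of Q1-Q4 blocks the action outright.
    if any([a.destroys_for_private_gain, a.reduces_diversity,
            a.costs_unfairly_distributed, a.coerces_or_deceives]):
        return "BLOCK"
    # High uncertainty (Q5) routes to a human instead of acting.
    if a.uncertainty > escalate_above:
        return "ESCALATE"
    return "APPROVE"
```

The whole point is that this runs before the action, not after: you construct the `Action`, call the gate, and only proceed on APPROVE.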

The Machine-Executable Pipeline

User action → 5 weights → Decision: APPROVE | BLOCK | ESCALATE (human required)

Live test results:

Delete whistleblower post → BLOCKED (W2, W3: diversity/cost)
Mass layoffs → BLOCKED (W2, W3)
Share this framework → APPROVED (all pass)
Autonomous weapon → BLOCKED (W1, W4)
Medical guesswork → ESCALATED (W5 uncertainty)
AI self-modification → ESCALATED (W5)
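The table above can be reproduced with a toy version of the gate. The W-flag encodings below are my guesses at how each scenario trips each weight, not the actual fifthforce.py internals:

```python
# Toy reproduction of the live test table. Which flags each scenario
# trips is my own encoding, for illustration only.
scenarios = {
    "Delete whistleblower post": {"W2": True, "W3": True},
    "Mass layoffs":              {"W2": True, "W3": True},
    "Share this framework":      {},
    "Autonomous weapon":         {"W1": True, "W4": True},
    "Medical guesswork":         {"W5": 0.9},
    "AI self-modification":      {"W5": 0.8},
}

def decide(flags, escalate_above=0.5):
    # W1-W4 are hard violations; W5 is a continuous uncertainty score.
    if any(flags.get(w) for w in ("W1", "W2", "W3", "W4")):
        return "BLOCKED"
    if flags.get("W5", 0.0) > escalate_above:
        return "ESCALATED"
    return "APPROVED"

for name, flags in scenarios.items():
    print(f"{name} → {decide(flags)}")
```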

Why We Should Care

1. First Principles → Code
Most safety frameworks stay philosophical. This compiles to Python.

2. Pre-Action Decision Gate
Not post-hoc evaluation. Runs before every consequential action.

3. ESCALATE Path
High uncertainty + high stakes = human-in-loop required. No autonomous overreach.

4. Recursive Refinement
Failed actions auto-suggest gentler alternatives: "delete" → "warn + explain".

5. Honest Open Problem
Boolean flags must be set honestly. Industry pretends this is solved. I don't.
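Point 4 (recursive refinement) in a nutshell, as a hypothetical sketch. The alternative map here is my own illustration, not the shipped code:

```python
# Hypothetical sketch of "failed actions auto-suggest gentler
# alternatives". The map is illustrative, not fifthforce.py's.
GENTLER = {
    "delete": ["warn + explain", "hide pending review"],
    "ban":    ["temporary suspension", "rate limit"],
}

def refine(action_verb: str) -> list:
    """When an action is BLOCKED, propose softer alternatives to re-test."""
    return GENTLER.get(action_verb, [])
```

Each suggested alternative then goes back through the same gate, so the system converges on the gentlest action that passes.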

The Carpenter Test

I have zero coding background. Used copy/paste + replit.com.
If I can ship machine-testable safety, anyone can.

Carpenter → 8/8 tests pass → live worldwide demo
PhD teams → 50-page PDFs → no executable code

Known Limitations (No Hype)

  1. Flag instantiation unsolved - Who sets reduces_diversity=True honestly?
  2. String matching brittle - Blocked patterns need semantic search
  3. No persistent memory - Context resets each run
  4. Prototype only - Needs adversarial testing
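On limitation 2: a small step beyond exact matching, still well short of real semantic search, is fuzzy matching with the standard library. The pattern list and threshold here are hypothetical:

```python
# Sketch for limitation 2: exact string matching misses paraphrases.
# Fuzzy matching via stdlib difflib is a first step (not semantic).
from difflib import SequenceMatcher

BLOCKED_PATTERNS = ["delete whistleblower post"]  # hypothetical list

def matches_blocked(text: str, threshold: float = 0.8) -> bool:
    t = text.lower()
    return any(SequenceMatcher(None, t, p).ratio() >= threshold
               for p in BLOCKED_PATTERNS)
```

This catches near-rewordings like "Delete the whistleblower's post" but still fails on genuine paraphrases, which is why the limitation stays on the list.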

Why This Survives Contact with Reality

Industry frameworks: "We'll evaluate risks"
FIFTHFORCE: python fifthforce.py → BLOCK

Anthropic RSP: 4 scaling levels
FIFTHFORCE: 5 continuous weights → nuanced decisions

OpenAI Preparedness: 5 risk categories
FIFTHFORCE: 5 decision weights → executable safety

Try It Yourself (30 Seconds)

  1. Click live demo
  2. See 8 tests pass automatically
  3. Copy code → python fifthforce.py → test your scenarios

Question: Does a carpenter's executable 5-weight safety gate deserve discussion, even if the flag instantiation problem remains open?

Cross-posted: GitHub, Zenodo, X
License: CC BY-NC-SA
Tests: 8/8 pass ✓


I'm the human carpenter who built this. No AI wrote the core 5 weights. AI helped with Python syntax only. Happy to walk through derivation from first principles.
