I’ve been thinking about a distinction that seems increasingly important for AI safety:

A system can be highly coherent without being meaningfully answerable.

By coherence, I mean something like fluency, plausibility, internal consistency, and the sense that an answer is complete: in short, that the answer sounds smart and well polished.

By answerability, I mean something different: the system stays tied to falsifiers, revision conditions, traceable support, and some visible relation between error and consequence.

What concerns me is that some AI systems are getting very good at producing coherence while remaining weakly answerable.

That seems important because coherence is not neutral. It can create trust, reduce checking, and give users the feeling that something has been understood or responsibly handled — even when the system is still weakly grounded.

This seems especially relevant in cases like:
- high-stakes advice
- companion or emotionally supportive AI
- synthetic media and public information environments
- institutional AI use where decisions become harder to question

The issue is that coherence can shape the user’s stance before truth or falsity is even settled, which is what makes it seductive, and potentially manipulative.

The system doesn’t bear the consequences; users do. That asymmetry already shows up in cases where people have lost money, jobs, or relationships by acting on fluent but weakly grounded output.

One simple check I’ve been using is:

- What would show this is wrong?
- Who pays if it is wrong?
- What would make it revise?
- What independent trace anchors the claim outside the system’s own fluency?

If those questions cannot be answered clearly, then perhaps the system is functioning more as a coherence engine than an answerable source.
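
To make the check slightly more concrete, here is a minimal sketch of how it could be recorded as a rubric. This is purely illustrative: the class, field names, and scoring are hypothetical stand-ins I made up for this post, not an existing tool or benchmark.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AnswerabilityCheck:
    """Hypothetical rubric for a single claim a system has produced.

    Each field records an external anchor for the claim,
    not how fluent or confident the claim sounds.
    """
    claim: str
    falsifier: Optional[str] = None           # what would show this is wrong?
    consequence_bearer: Optional[str] = None  # who pays if it is wrong?
    revision_trigger: Optional[str] = None    # what would make the system revise it?
    independent_trace: Optional[str] = None   # support outside the system's own output

    def answerability_score(self) -> float:
        """Fraction of the four questions that have a concrete answer."""
        anchors = [self.falsifier, self.consequence_bearer,
                   self.revision_trigger, self.independent_trace]
        return sum(a is not None for a in anchors) / len(anchors)


# A fluent-sounding claim with no external anchors scores 0.0:
# coherence without answerability.
print(AnswerabilityCheck(claim="This investment strategy is low-risk.").answerability_score())
```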

I assume parts of this framing overlap with existing work on corrigibility, evaluations, human factors, interpretability, and alignment.

What I’m trying to figure out is whether “answerability” names something real that current AI safety language still doesn’t fully capture.

So I’d be curious what people here think:

- Does this distinction track anything important?
- Where does current work already handle it well?
- Where would high-coherence / low-answerability systems be most dangerous?
- What would count as evidence that this framing is unnecessary or wrong?
