Truthful AI

Some content which didn't make it into the paper in the end but is relevant for this discussion is a draft protocol for "counting microlies" (the coloured text is the instructions, to be read counterclockwise starting in the top left):

Truthful AI

In general if we're asking about what has a “poor” track record, it would be good to think about quantification and comparison to alternatives. Note that we’d consider sites like Wikipedia as examples of institutions doing a form of truth evaluation. 

Discussions of fact-checking institutions often focus on some concrete case that they got wrong; but they are bound to get some things wrong. The questions are:

  1. What’s the overall track record over all statements (including those that seem easy/obvious)? 
  2. How well do they do against alternatives?  

Analogously, people often point out some particular cases where prediction markets did badly, but advocates of prediction markets just claim that they are at least as accurate overall as alternative prediction mechanisms. And right now many questions humans ask are not controversial (e.g. science questions, local questions). But AI currently says false things about these questions! So there’s lots of room for improvement without even touching the controversial stuff (though eventually one wants some relatively graceful handling of controversy).

(Thanks to Owain for most of these points.)

Truthful AI

Re. the particulars of fact-checkers and discretion, I'm in favour of more precise processes for assessing possible meanings of ambiguous statements and then assessing the truth of those possible meanings. I think that this could remove quite a bit of the subjectivity.

In the case of the example you give, I would like to give Biden's statement a medium penalty, and Trump's statement a medium-large penalty. The difference is Trump's use of the word "whatsoever". This is the opposite of a caveat -- it is stressing that the literal meaning rather than the approximate one is intended. To my mind, pairs of comparably-bad statements would be:

  • Not bad:
    • Guns
      • "There were very few guns ..."
      • "For the most part, there were no guns ..."
    • Coronavirus
      • "... are less likely to spread it to you"
      • "... cannot spread it to you in most cases"
  • Somewhat bad:
    • "There were no guns ..."
    • "... cannot spread it to you"
  • More bad (but still room to be more false):
    • Guns
      • "There were no guns whatsoever ..."
      • "There were absolutely no guns ..."
    • Coronavirus
      • "... absolutely cannot spread it to you"
      • "... can never spread it to you"
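
A minimal sketch of that grading, to make the tiers concrete. Everything here is a hypothetical illustration rather than a proposed standard: the phrase lists, the function name `severity_tier`, and the simple substring matching are all assumptions, and the sketch presumes the statement has already been judged false in its literal reading.

```python
# Hypothetical phrase lists, taken only from the examples above.
HEDGES = ("very few", "for the most part", "less likely", "in most cases")
INTENSIFIERS = ("whatsoever", "absolutely", "never")


def severity_tier(statement: str) -> str:
    """Grade a literally-false statement by how strongly it insists on the literal reading.

    A caveat signals that an approximate meaning is intended (lowest tier);
    an intensifier stresses the literal meaning (highest tier);
    a bare statement falls in between.
    """
    text = statement.lower()
    if any(phrase in text for phrase in HEDGES):
        return "not bad"
    if any(phrase in text for phrase in INTENSIFIERS):
        return "more bad"
    return "somewhat bad"


for example in (
    "There were very few guns ...",
    "There were no guns ...",
    "There were no guns whatsoever ...",
):
    print(example, "->", severity_tier(example))
```

A real process would need to assess the possible meanings of a statement rather than match keywords; the point is only that the caveat/bare/intensified distinction can be written down as an explicit rule.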

This is not to say that political bias isn't playing a role in how these organisations are functioning at the moment, but I do think that we can hope to establish more precise standards which reduce the scope for bias to apply.

Listen to more EA content with The Nonlinear Library

I do think that there's an interesting fuzzy boundary here between "derivative work" and "interpretative tool".

e.g. with the framing "turn it into a podcast" I feel kind of uncomfortable, and at a gut level I wish I had been consulted before that happened to any of my posts.

But here's another framing: it's pretty easy to imagine a near-future world where anyone who wants can have a browser extension which will read things to them at this quality level rather than having visual fonts. If I ask "am I in favour of people having access to that browser extension?", I'm a fairly unambiguous yes. And then the current project can be seen as selectively providing early access to that technology. And that seems ... pretty fine?

This actually makes me more favourable to the version with automated rather than human readers. Human readers would make it seem more like a derivative work, whereas the automation makes the current thing seem closer to an interpretative tool.

The Cost of Rejection

Insurance seems like a fairly poor tool here, since there's a significant moral hazard effect (insurance makes people less careful about taking steps to minimize exposure), which could lead to dynamics where the price goes really high and then only the people who are most likely to attract lawsuits still take the insurance ...

Actually, if there were a market in this I'd expect the insurers, as a condition of cover, to demand legible steps to reduce exposure ... like not giving feedback to unsuccessful applicants.

Some thoughts on David Roodman’s model of economic growth and its relation to AI timelines

I came in with roughly the view you describe as having had early on in the project, and I found this post extremely clear in laying out the further considerations that shifted you. Thanks!

A do-gooder's safari

Interesting idea!

I'm keen for the language around this to convey the correct vibe about the epistemic status of the framework: currently I think this is "here are some dimensions that I and some other people feel like are helpful for our thinking". But not "we have well-validated ways of measuring any of these things" nor "this is definitely the most helpful carving up in the vicinity" nor "this was demonstrated to be helpful for building a theory of change for intervention X which did verifiably useful things". I think the animal names/pictures are kind of playful and help to convey that this isn't yet attempting to be in epistemically-solid land?

I guess I'm interested in the situations where you think an abbreviation would be helpful. Do you want someone to make an EA personality test based on this?

What should we call the other problem of cluelessness?

I think this is a good point which I wasn't properly appreciating. It doesn't seem particularly worse for (2) than for (1), except insofar as terminology is more locked in for (1) than (2).

Of course, a possible advantage of "clueless" is that it strikes a self-deprecating tone; if we're worried about being perceived as arrogant then having the language err on the side of assigning blame to ourselves rather than the universe might be a small help.

What should we call the other problem of cluelessness?

I think that bare terms like "unpredictability" or particularly "uncertainty" are much too weak; they don't properly convey the degree of epistemic challenge, and hence don't pick out what's unusual about the problem situation that we're grappling with.

"Unforeseeability" is a bit stronger, but still seems rather too weak. I think "unknowability", "radical uncertainty", and "cluelessness" are all in the right ballpark for their connotations.

I do think "unknowability" for (2) and "absolute/total unknowability" for (1) is an interesting alternative. Using "unknowable" rather than "clueless" puts the emphasis on the decision situation rather than the agent; I'm not sure whether that's better.

What should we call the other problem of cluelessness?

To me it sounds slightly odd to use the word "clueless" for (2), however, given the associations that word has (cf. Cambridge dictionary).

In everyday language I actually think this fits passably well. The dictionary gives the definition "having no knowledge of something". For (2) I feel like informally I'd be happy with someone saying that the problem is we have no knowledge of how our actions will turn out, so long as they clarified that they didn't mean absolutely no knowledge. Of course this isn't perfect; I'd prefer they said "basically no knowledge" in the first place. But it's also the case that informally "clueless" is often modified with superlatives (e.g. "totally", "completely"), so I think that a bare "clueless" doesn't really connote having no idea at all.
