Project ideas: Epistemics

Lukas Finnveden

Comments 1

Executive summary: AI could substantially improve or worsen humanity's epistemic capabilities. Key recommendations are to develop AI that provides honest advice, establish institutions trusted for using AI well epistemically, and regulate persuasive AI.

Key points:

- AI could honestly investigate questions more competently and cheaply than humans, if developed properly.
- But super-human persuasion capabilities could also spread misinformation.
- Recommendations include:
  - Develop AI that gives validated, honest advice
  - Survey public trust in hypothetical AI systems
  - Establish reputable institutions using AI transparently
  - Make legislative proposals restricting AI persuasion
  - Accelerate AI abilities on forecasting and philosophy over persuasion
- The development of reliable epistemic AI should be timely to provide guidance on emerging issues.
- There may also be path dependencies around reputation and consensus that necessitate quick action.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Illustratively:

- We can do experiments to determine what sorts of procedures provide great reasoning abilities.

- Example procedures to vary: AI architectures, LLM scaffolds, training curricula, heuristics for chain-of-thought, protocols for interaction between different AIs, etc.

- To do this, we need tasks that require great reasoning abilities and for which lots of data exists. One example of such a task is the classic “predict the next word” task that current LLMs are trained on.

- With enough compute and researcher hours, such iteration should yield large improvements in reasoning skills and epistemic practices. (And the researcher hours could themselves be provided by automated AI researchers.)

- Those skills and practices could then be translated to other areas, such as forecasting. And their performance could be validated by testing the AIs’ ability to e.g. predict 2030 events from 2020 data.
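
The validation step in the last bullet can be sketched as a temporal hold-out test: the forecaster only sees information up to a cutoff date and is scored on questions that resolved afterwards. This is a minimal illustration, not a description of any existing evaluation. The `Question` record, the `backtest` helper, and the `forecaster` callable (which maps a question text to a probability) are all hypothetical names, and the Brier score is just one common scoring rule that could be used here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Sequence


@dataclass
class Question:
    """Hypothetical record for a resolved yes/no forecasting question."""
    text: str
    resolves_on: date  # date on which the outcome became known
    outcome: bool      # what actually happened


def brier_score(forecasts: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need the same non-zero number of forecasts and outcomes")
    return sum((p - float(o)) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def backtest(
    forecaster: Callable[[str], float],
    questions: Sequence[Question],
    cutoff: date,
) -> float:
    """Score a forecaster only on questions that resolved after its data cutoff.

    Assumption: the forecaster was built without access to any data created
    after `cutoff`; this function cannot check that by itself.
    """
    held_out = [q for q in questions if q.resolves_on > cutoff]
    forecasts = [forecaster(q.text) for q in held_out]
    return brier_score(forecasts, [q.outcome for q in held_out])
```

A forecaster that always answers 0.5 gets a Brier score of 0.25, so scores below that indicate some real predictive signal. The result is only meaningful if no post-cutoff information has leaked into the forecaster.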

This is related to how defense against manipulation might be more difficult than manipulation itself. See e.g. the second problem discussed by Wei Dai here.

Although this fit has been far from perfect: for example, religions posit many false beliefs but have nevertheless spread and, in some cases, increased the competitive advantage of groups that adopted them.

For some previous discussion, see here for a relevant post by Paul Christiano and a relevant comment thread between Christiano and Wei Dai.

Of course, if we’re worried about misalignment, then we should also be less trusting of AI advice. But I think it’s plausible that we’ll be in a situation where AI advice is helpful while there’s still significant remaining misalignment risk. For example, we may have successfully aligned AI systems of one capability level, but be worried about more capable systems. Or we may be able to trust that AI typically behaves well, or behaves well on questions that we can locally spot-check, while still worrying about a sudden treacherous turn.

For instance: It seems plausible to me that “creating scary technologies” has better feedback loops than “providing great policy analysis on how to handle scary technologies”. And current AI methods benefit a lot from having strong feedback loops. (Currently, especially in the form of plentiful data for supervised learning.)

And if there’s a choice between different epistemic methodologies: perhaps pick whichever methodology lets them keep their current views.

What does it mean for a model to have a “latent capability”? I’m thinking about the definition that Beth Barnes uses in this appendix. See also the discussion in this comment thread, where Rohin Shah asks for some nuance about the usage of “capability”, and I propose a slightly more detailed definition.

Of course, better capability elicitation would also accelerate tasks that could increase AI risk. In particular: improved capability elicitation could accelerate AI R&D, which could accelerate AI systems’ capabilities. (Including latent capabilities.) I acknowledge that this is a downside, but since it’s only an indirect effect, I think it’s worth it for the kind of tasks that I outline in this section. In general: most of the reason why I’m concerned about AI x-risk is that critical actors will make important mistakes, so improving people’s epistemics and reasoning ability seems like a great lever for reducing x-risk. Conversely, I think it’s quite likely that dangerous models can be built with fairly straightforward scaling-up and tinkering with existing systems, so I don’t think that increased reasoning ability will make any huge difference in how soon we get dangerous systems. That said, considerations like this are a reason to target elicitation efforts more squarely at especially useful and neglected targets (e.g. forecasting) and avoid especially harmful or commercially incentivized targets (e.g. coding abilities).

If it’s too difficult to date all existing pre-training data retroactively, then that suggests that it could be time-sensitive to ensure that all newly collected pre-training data is being dated, so that we can at least do this in the future.
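
To illustrate why dating matters, here is a minimal sketch of how a training corpus could be restricted to documents created on or before a cutoff. The `Document` record and the `split_by_cutoff` helper are hypothetical names introduced for this example. Undated documents are conservatively excluded, since they might contain information from after the cutoff.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Sequence, Tuple


@dataclass
class Document:
    """Hypothetical pre-training document with an optional creation date."""
    text: str
    created_on: Optional[date]  # None when the creation date is unknown


def split_by_cutoff(
    docs: Sequence[Document], cutoff: date
) -> Tuple[List[Document], List[Document]]:
    """Return (usable, excluded) documents for a model with the given data cutoff.

    A document is usable only if it is dated and was created on or before
    the cutoff. Undated documents are excluded because they could leak
    post-cutoff information.
    """
    usable: List[Document] = []
    excluded: List[Document] = []
    for doc in docs:
        if doc.created_on is not None and doc.created_on <= cutoff:
            usable.append(doc)
        else:
            excluded.append(doc)
    return usable, excluded
```

Under this rule, every undated document is lost to the pre-cutoff corpus, which is one way to see why dating newly collected data could be time-sensitive.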

Though one risk with tools that make your beliefs more internally coherent/consistent is that they could extremize your worldview if you start out with a few wrong but strongly-held beliefs (e.g. if you believe one conspiracy theory, that often requires further conspiracies to make sense). (H/t Fin Moorhouse.)

See e.g. this survey, in which >30 economists “Agree” or “Strongly agree” (and 0 respondents disagree) with “Adjusting for legal restrictions on what the CBO can assume about future legislation and events, the CBO has historically issued credible forecasts of the effects of both Democratic and Republican legislative proposals.”

On some questions, answering truthfully might inevitably have an ideological slant to it. But on others it doesn’t. It seems somewhat scalable to get lots of people to red-team the models to make sure that they’re impartial when that’s appropriate, e.g. avoiding situations where they’re happy to write a poem about Biden but refuse to write a poem about Trump. And on questions of fact, you can ensure that if you ask the model a question where the weight of the evidence is inconvenient for some ideology, the model is equally likely to give a straight answer regardless of which side would find the answer inconvenient. (As opposed to dodging or citing a common misconception.)

Why AI matters for epistemics

Why working on this could be urgent

Categories of projects

Differential technology development [ML] [Forecasting] [Philosophical/conceptual]

Important subject areas

Methodologies

Related/previous work.

Get AI to be used & (appropriately) trusted

Develop technical proposals for how to train models in a transparently trustworthy way [ML] [Governance]

Survey groups on what they would find convincing [survey/interview]

Create good organizations or tools [ML] [Empirical research] [Governance]

Investigate and publicly make the case for why/when we should trust AI about important issues [Writing] [Philosophical/conceptual] [Advocacy] [Forecasting]

Developing standards or certification approaches [ML] [Governance]

Develop & advocate for legislation against bad persuasion [Governance] [Advocacy]
