I’m trying to understand whether recent efforts to give AI Safety research a firmer empirical grounding have produced evidence that claims from theoretical AI Safety work have turned out to be correct.
This could make me update in favour of taking AI Safety concerns more seriously.
Previously, I have been skeptical of AI Safety arguments because many claims were based on theoretical reasoning rather than empirical evidence.
Not predictions as such, but lots of current work on AI safety and steering is based pretty directly on paradigms from Yudkowsky and Christiano - from Anthropic's Constitutional AI to ARIA's Safeguarded AI programme. There is also OpenAI's Superalignment research, which was attempting to build AI that could solve agent foundations - that is, explicitly do the work that theoretical AI safety research identified. (I'm unclear whether that last effort is ongoing, given that they managed to alienate most of the people involved.)