Closing the Feedback Loop on AI Safety Research.

Ben.Hartley

Is there a consensus among AI safety researchers that there is no way to safely study an AI agent's behavior within a simulated environment?

It seems to me that the creation of an adequate AGI Sandbox would be a top priority (if not #1) for AI safety researchers, as it would effectively close the feedback loop and allow researchers to take multiple shots at AGI alignment without the threat of total annihilation.

2 Answers

Thomas Kwa🔹

Jul 30, 2022

I'm not excited about this particular idea, but finding some way to iterate on alignment solutions is a hugely important problem.

Ben.Hartley

Jul 30, 2022

What other methods are there that would in principle allow iteration?

If it is true that "a failed AGI attempt could result in unrecoverable loss of human potential within the bounds of everything that it can affect", then our options are to A) not fail or B) limit the bounds of everything that it can affect. In this sense, any strategy that hopes to allow for iteration is abstractly equivalent to a box, simulation, or sandbox, whatever you may call it.

Zach Stein-Perlman

Jul 30, 2022

I don't know about "no way," but the consensus is that simulation isn't obviously very helpful because an AI could infer that it is simulated and behave differently in simulation, not to mention that sufficiently capable systems could escape simulation for the same reasons that 'keep the AI in a box' is an inadequate control strategy.

Simulation probably isn't useless for safety, but it's not obviously a top priority, and "the creation of an adequate AGI Sandbox" is prima facie intractable.

Ben.Hartley

Jul 30, 2022

I have never been satisfied by the "AI infers that it is simulated and changes its behavior" argument, because the root issue always seems to be that some information has leaked into the simulation. The problem goes from "How do we prevent AI from escaping a box?" to "How do we prevent information from entering a box?" The components of this problem are:

What information is communicated via the nature of the box itself?
What information is built into an AI?
What information is otherwise entering the box?

These questions seem relatively approachable compared to other avenues of AI safety research.

Effective Altruism Forum