Is there a consensus among AI safety researchers that there is no way to safely study an AI agent's behavior within a simulated environment?
It seems to me that creating an adequate AGI sandbox would be a top priority (if not the #1 priority) for AI safety researchers, as it would effectively close the feedback loop and allow researchers to take multiple shots at AGI alignment without risking total annihilation.
I have never been satisfied by the "AI infers that it is simulated and changes its behavior" argument, because the root issue always seems to be that some information has leaked into the simulation. The problem shifts from "how do we prevent an AI from escaping a box?" to "how do we prevent information from entering a box?" The components of this problem are:
These questions seem relatively approachable compared to other avenues of AI safety research.
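To make the "prevent information from entering the box" framing concrete, here is a minimal illustrative sketch (all names are hypothetical, not from any real sandboxing framework): the simulated environment's only inbound channel to the agent passes through a whitelist filter, so side channels that could reveal the simulation (e.g. host timing data) never reach the agent, while researchers observe everything from outside.

```python
# Hypothetical sketch of an information-gated sandbox: the agent's only
# input channel is a whitelist filter, so non-whitelisted signals
# (e.g. host clock values that could betray the simulation) never enter.

class InformationFilter:
    """Gate on the box's single inbound channel: only whitelisted keys pass."""

    def __init__(self, allowed_keys):
        self.allowed_keys = set(allowed_keys)

    def filter(self, observation: dict) -> dict:
        # Drop every field not explicitly approved for the agent to see.
        return {k: v for k, v in observation.items() if k in self.allowed_keys}


class SandboxedEnvironment:
    """Toy simulated environment; all agent-bound data flows through the filter."""

    def __init__(self, info_filter: InformationFilter):
        self.info_filter = info_filter
        self.log = []  # researchers read this from outside; nothing flows back in

    def step(self, raw_world_state: dict, agent) -> object:
        visible = self.info_filter.filter(raw_world_state)
        action = agent.act(visible)
        self.log.append((visible, action))
        return action


class EchoAgent:
    """Stand-in agent that simply reports which observation fields it received."""

    def act(self, obs: dict):
        return sorted(obs.keys())


# The raw world state includes a side channel (host_clock_ns) that must
# not leak into the box.
world = {"position": (0, 0), "reward": 1.0, "host_clock_ns": 123456789}
env = SandboxedEnvironment(InformationFilter({"position", "reward"}))
action = env.step(world, EchoAgent())
# The agent saw only the whitelisted fields; the side channel was stripped.
```

This inverts the usual containment picture: instead of enumerating escape routes outward, the design enumerates a small set of approved channels inward and blocks everything else by default, which is the part of the problem that feels tractable.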