One of the arguments for existential risk from AGI rests on the orthogonality of the system's goals and on the instrumental goals it would develop regardless of its stated goals. So isn't the problem having 'goals' at all, i.e. specific objectives which are being optimized for by taking actions? I'm wondering how this line of argument would apply to a mere 'oracle' AGI, e.g. one which emerged out of scaling up a Foundation Model on multimodal data such that we can query it and it will output a prediction. It's not running unconstrained with the goal of optimizing some objective; it's just been trained to output the most likely next input. How could such a system 'go rogue'?





Hey! Consider posting to the All AGI safety questions welcome thread.

My own shortest answer [I'm not an expert] would be:

What would you do next with that oracle? 


It's going to have to be something useful enough to prevent another company from creating an AGI and destroying the world, while also not destroying the world yourself by accident.

See also bullet 5 here.


My second, separate answer would be: "it would be pretty easy to turn that oracle AI into an AGI, so you'd better watch out for someone doing that, including someone stealing your code."
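
To make that second point concrete, here is a minimal toy sketch (not a real system) of how a pure predictor might be wrapped in a loop that acts on its predictions. Everything here is hypothetical and made up for illustration: `toy_oracle`, `ToyWorld`, and `run_as_agent` are not from any library, and reading "turning it into an AGI" as "wrapping it in an observe-predict-act loop" is my own assumption.

```python
from typing import Callable, List


def toy_oracle(prompt: str) -> str:
    """Hypothetical stand-in for an oracle: it only maps a prompt to a prediction.

    A real oracle would be a trained model; this is a hard-coded lookup.
    """
    if "door is closed" in prompt:
        return "open door"
    if "door is open" in prompt:
        return "walk through"
    return "stop"


class ToyWorld:
    """Hypothetical environment with a single piece of state."""

    def __init__(self) -> None:
        self.state = "door is closed"

    def observe(self) -> str:
        return self.state

    def act(self, action: str) -> None:
        if action == "open door":
            self.state = "door is open"
        elif action == "walk through":
            self.state = "you are outside"


def run_as_agent(
    oracle: Callable[[str], str],
    observe: Callable[[], str],
    act: Callable[[str], None],
    goal: str,
    max_steps: int = 5,
) -> List[str]:
    """Wrap a predictor in a loop: ask for the best next action, then execute it."""
    history: List[str] = []
    for _ in range(max_steps):
        prompt = f"Goal: {goal}\nObservation: {observe()}\nBest next action:"
        action = oracle(prompt)  # the oracle only predicts...
        if action == "stop":
            break
        act(action)  # ...the wrapper is what turns the prediction into an action
        history.append(action)
    return history


world = ToyWorld()
actions = run_as_agent(toy_oracle, world.observe, world.act, goal="leave the room")
print(actions)
```

The oracle itself never "does" anything here; the goal-directed behaviour comes entirely from the few lines of wrapper code around it, which is why access to the oracle alone may already be enough for someone to build an agent.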

Some of the risks that come to mind: risks from inner alignment problems and from potential moral patienthood of AI systems.
