Hi, I'm William, and I am new to the Effective Altruism community.
William comes from a country in the Pacific called New Zealand. He was educated at the University of Otago where he received a first class honours degree in chemistry. He is currently traveling through Europe to learn more about different cultures and ideas.
William is interested in learning more about Artificial Intelligence and the extent to which it poses an existential risk to humanity.
William is new to Effective Altruism but is willing to learn ways in which he can aid humanity.
Hi Robi Rahman, thanks for the welcome.
I do not know if has a predefined utility function, or if the functions simply have similar forms. If there is a utility function that gives the AI utility for shutting down when some arbitrary "shutdown button" is pressed, then there exists a state in which the "shutdown button" is very likely to be pressed (e.g. an office intern is in the process of pushing it) that provides more expected utility than the current state. There is therefore an incentive for the AI to move towards that state (e.g. by convincing the office intern to push the "shutdown button"). If instead there is negative utility in the "shutdown button" being pressed, the AI is incentivized to prevent the button from being pressed. If instead the AI has no utility term for whether the "shutdown button" is pressed, but there somehow exists a code segment that causes the shutdown process to run when the button is pressed, then there exists a possible daughter AGI whose code is slightly more efficient because this code segment is omitted. An AGI whose utility function rewards producing daughter AGIs that are more efficient versions of itself is incentivized to produce such a daughter with the "shutdown button" code segment removed.
There is a more detailed version of this description in https://intelligence.org/files/Corrigibility.pdf
I could be wrong in my conclusion about corrigibility (and probably am), but it is my best intuition at this point.
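To make the first two cases of the shutdown-button argument concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made up for illustration: the action names, the press probabilities, and the `expected_utility` and `best_action` helpers are hypothetical and are not taken from the corrigibility paper. It only shows that an expected-utility maximizer picks the action that makes a press more likely when shutdown has positive utility, and the action that makes a press less likely when shutdown has negative utility.

```python
# Toy model only: the actions and probabilities below are invented for illustration.
# Each value is the assumed probability that the "shutdown button" ends up pressed
# after the AI takes that action.
PRESS_PROBABILITY = {
    "persuade_intern": 0.99,  # AI convinces the intern to press the button
    "carry_on": 0.50,         # AI does nothing about the button
    "block_button": 0.01,     # AI prevents the button from being pressed
}


def expected_utility(action, shutdown_utility, continue_utility=0.0):
    """Expected utility of an action, given the utility of each outcome."""
    p_pressed = PRESS_PROBABILITY[action]
    return p_pressed * shutdown_utility + (1.0 - p_pressed) * continue_utility


def best_action(shutdown_utility, continue_utility=0.0):
    """Return the action with the highest expected utility."""
    return max(
        PRESS_PROBABILITY,
        key=lambda action: expected_utility(action, shutdown_utility, continue_utility),
    )


if __name__ == "__main__":
    # Positive utility for shutdown: the AI is drawn towards getting the button pressed.
    print(best_action(shutdown_utility=1.0))
    # Negative utility for shutdown: the AI is drawn towards stopping the press.
    print(best_action(shutdown_utility=-1.0))
```

With a shutdown utility of exactly zero, every action in this toy model has the same expected utility, so the model says nothing about the third case (the daughter AGI that drops the shutdown code); that case depends on a separate efficiency incentive that is not modelled here.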
Hi there everyone, I'm William the Kiwi and this is my first post on the EA Forum. I have recently discovered AI alignment and have been reading about it for around a month. This seems like an important but terrifyingly underinvested field. I have many questions, but in the interest of speed I will invoke Cunningham's Law and post my current conclusions.
My AI conclusions:
I am currently visiting England and would love to talk more about this topic with people, either over the Internet or in person.
Hi, I'm new to EA and have just written a bio. Thank you, Aaron, for encouraging me to do so.