William the Kiwi

Working (0-5 years experience)
Joined Mar 2023


Hi I'm William and I am new to the Effective Altruism community.

William comes from a country in the Pacific called New Zealand. He was educated at the University of Otago, where he received a first-class honours degree in chemistry. He is currently travelling through Europe to learn more about different cultures and ideas.

How others can help me

William is interested in learning more about Artificial Intelligence and the magnitude to which it poses an existential risk to humanity.

How I can help others

William is new to Effective Altruism but is willing to learn ways in which he can aid humanity.


Hi I'm new to EA and have just written a bio. Thank you Aaron for encouraging me to do so.

Hi Robi Rahman, thanks for the welcome.

I do not know if the AI has a predefined utility function, or if the functions simply have similar forms. If there is a utility function that gives the AI utility for shutting down when some arbitrary "shutdown button" is pressed, then there exists a state in which the "shutdown button" is very likely to be pressed (e.g. an office intern is in the process of pushing it) that provides more expected utility than the current state. The AI therefore has an incentive to move towards that state (e.g. by convincing the office intern to push the "shutdown button"). If instead there is negative utility in the "shutdown button" being pressed, the AI is incentivized to prevent the button from being pressed. If instead the AI has no utility term for whether the "shutdown button" is pressed, but there exists a code segment that triggers the shutdown process when it is pressed, then there exists a daughter AGI whose code is slightly more efficient with this segment omitted. An AGI whose utility function rewards producing daughter AGIs that are more efficient versions of itself is therefore incentivized to produce a daughter with the "shutdown button" code segment removed.
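The first case above (positive utility for shutdown) can be illustrated with a toy expected-utility calculation. The numbers and names here are illustrative assumptions, not from any real system: a one-step agent compares "keep working" against "persuade the intern to press the button".

```python
# Toy model of the shutdown-button incentive (illustrative numbers only).
U_TASK = 1.0        # utility per step of normal work
U_SHUTDOWN = 10.0   # hypothetical positive utility assigned to being shut down

def expected_utility(p_button_pressed: float) -> float:
    """Expected utility of a state with the given shutdown probability."""
    return p_button_pressed * U_SHUTDOWN + (1 - p_button_pressed) * U_TASK

eu_work = expected_utility(0.01)      # business as usual
eu_persuade = expected_utility(0.95)  # intern about to press the button

# With any positive shutdown utility exceeding U_TASK, the agent prefers
# the state where the button is about to be pressed.
assert eu_persuade > eu_work
```

Flipping the sign of `U_SHUTDOWN` reverses the inequality, which is the second case: the agent then prefers states where the button is unlikely to be pressed.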

There is a more detailed version of this description in https://intelligence.org/files/Corrigibility.pdf

I could be wrong in my conclusion about corrigibility (and probably am), however it is my best intuition at this point.

Hi there everyone, I'm William the Kiwi and this is my first post on the EA Forum. I have recently discovered AI alignment and have been reading about it for around a month. This seems like an important but terrifyingly under-invested-in field. I have many questions, but in the interest of speed I will invoke Cunningham's Law and post my current conclusions.

My AI conclusions:

  1. Corrigibility is mathematically impossible for AGI.
  2. Alignment requires defining all important human values in a robust enough way that it can survive near-infinite amounts of optimisation pressure exerted by a superintelligent AGI. Alignment is therefore difficult.
  3. Superintelligence by Nick Bostrom is a way of communicating the antimeme "unaligned AI is dangerous" to the general public.
  4. The extinction of humanity is a plausible outcome of unaligned AI.
  5. Eliezer Yudkowsky seems overly pessimistic but likely correct about most things he says.
  6. Humanity is likely to produce AGI before it produces fully aligned AI.
  7. To incentivize responses to this post I should offer a £1000 reward for a response that supports or refutes each of these conclusions and provides evidence for it.

I am currently visiting England and would love to talk more about this topic with people, either over the Internet or in person.