Thanks Yanni, just fixed the link :)
Excellent post, well done putting this together.
In particular, scenario #1 (military / authoritarian accident) has been my main concern regarding AI x-risk and s-risk. When I first encountered the field of AI safety, I was quite confused by how much attention was focused on big tech relative to global actors with unambiguous intentions to create disturbing types of AI systems that could easily evolve into something catastrophically bad.
I found scenario #2 quite interesting as well, although I find the particular sequence of events less plausible in a world where many organisations have access to powerful AI. I don't think this reduces the risks involved in giving advanced AI the ability to develop and execute business strategies; I just think it's much harder to predict how things might go wrong given the complexity involved in that type of world.
I'll also just add that I found scenario #3 to be very interesting, and although I personally consider it to be somewhat far-fetched, I commend your creativity (and courage). I'd be very surprised if an EA could pull that sort of thing off, but perhaps I'm missing information about how embedded EA culture is within Silicon Valley.
Overall I also like the analysis of why x-risks from "paperclipper" type systems are probably unlikely. My personal take is that LLMs might be particularly useful in this regard. I think the idea of creating AGI from RL is what underpinned a lot of the pre-ChatGPT x-risk discussions. Now the conversation has somewhat shifted toward the lack of interpretability in LLMs and why this is a bad thing, but I still believe a shift toward LLM-based AGI might be a good thing.
My rationale is that LLMs seem to inherently understand human values in a way that I think would be quite difficult to match with a pure RL agent. This understanding of human values is obviously imperfect, and will require further improvements, but at least it provides an obvious way to avoid paperclipper scenarios.
For example, you can literally tell GPT to act ethically and positively, and you can be fairly certain that it will do things that pretty much always align with that request. Of course, if you try to make it do bad stuff, it will, but that certainly doesn't seem to be the default.
This seems to be in contrast to the more Yudkowskian approach, which assumes that advanced AI will be catastrophically misaligned by default. LLMs seem to provide a way to avoid that; extrapolating forward from OpenAI's efforts so far, my impression is that if we asked Auto-ChatGPT-9 not to kill everyone, it would actually do a pretty good job. To be fair, I'm not sure you could say the same thing about a future version of AlphaGo trained to manage a large corporation, which, until recently, was what many imagined advanced AI would look like.
I hope you found some of that brain-dump insightful. Would be keen to hear what your thoughts are on some of those points.
Thanks very much for the recommendation, I'll do that now.
Thanks for the kind words.
I did have a bit of a think about what the implications are for finding feasible AI governance solutions, and here's my personal take:
If it is true that 'inhibitive' governance measures (perhaps like those in effect at Google) cause ML engineers to move to more dangerous research zones, it might be prudent to explore models of AI governance that 'accelerate' progress towards alignment, rather than ones that slow the progression towards misalignment.
My general argument would be as follows:
If we assume it will be infeasible to buy out or convince most of the ML engineers on the planet to intrinsically value alignment, then global actors with poor intentions (e.g. imperialist autocracies) will benefit from a system in which well-intentioned actors have created a comparatively frustrating and unproductive environment for ML engineers. That is, not only will they have a more efficient R&D pipeline due to fewer restrictions, they may also have better capacity to hire and retain talent over the long term.
One possible implication of this assertion is that the best course of action is to initiate an AI-alignment Manhattan Project focused on working towards a state of 'stabilisation' in the geopolitical/technological realm. The intention would be to change the structure of the AI ecosystem so that it favours 'aligned' AI by promoting progress in that area, rather than accidentally proliferating 'misaligned' AI by stifling progress in 'pro-alignment' zones.
I find this conclusion fairly disturbing and I hope there's some research out there that can disprove it.
Reductionist utilitarian models are like play-dough. They're fun and easy to work with, but useless for doing anything complicated and/or useful.
Perhaps in 100–200 years our understanding of neurobiology or psychometrics will be good enough for utilitarian modelling to become relevant to real life, but until then I don't see any point in getting on the train.
The fact that intelligent, well-meaning individuals are wasting their time thinking about the St Petersburg paradox is ironically un-utilitarian; that time could be used to accomplish tasks which actually generate wellbeing.