Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism
(Alternative title: Some bad human values may resist idealization)
The values of some humans, even if idealized (e.g., during some form of long reflection), may be incompatible with an excellent future. Thus, solving AI alignment will not necessarily lead to utopia.
Others have raised similar concerns before. Joe Carlsmith puts it especially well in the post “An even deeper atheism”:
What makes human hearts bad?
What, exactly, makes some human hearts bad drivers? If we better understood what makes hearts go bad, perhaps we could figure out how to make bad hearts good or at least learn how to prevent hearts from going bad. It would also allow us better spot potentially bad hearts and coordinate our efforts to prevent them from taking the driving seat.
As of now, I’m most worried about malevolent personality traits and fanatical ideologies.
Malevolence: dangerous personality traits
Some human hearts may be corrupted due to elevated malevolent traits like psychopathy, sadism, narcissism, Machiavellianism, or spitefulness.
Ideological fanaticism: dangerous belief systems
There are many suitable definitions of “ideological fanaticism”. Whatever definition we are going to use, it should describe ideologies that have caused immense harm historically, such as fascism (Germany under Hitler, Italy under Mussolini), (extreme) communism (the Soviet Union under Stalin, China under Mao), religious fundamentalism (ISIS, the Inquisition), and most cults.
See this footnote for a preliminary list of defining characteristics.
Malevolence and fanaticism seem especially dangerous
Of course, there are other factors that could corrupt our hearts or driving ability. For example, cognitive biases, limited cognitive ability, philosophical confusions, or plain old selfishness. I’m most concerned about malevolence and ideological fanaticism for two reasons
Y-Combinator wants to fund Mechanistic Interpretability startups
"Understanding model behavior is very challenging, but we believe that in contexts where trust is paramount it is essential for an AI model to be interpretable. Its responses need to be explainable.
For society to reap the full benefits of AI, more work needs to be done on explainable AI. We are interested in funding people building new interpretable models or tools to explain the output of existing models."
https://www.ycombinator.com/rfs (Scroll to 12)
What they look for in startup founders
Mildly against the Longtermism --> GCR shift
Epistemic status: Pretty uncertain, somewhat rambly
TL;DR replacing longtermism with GCRs might get more resources to longtermist causes, but at the expense of non-GCR longtermist interventions and broader community epistemics
Over the last ~6 months I've noticed a general shift amongst EA orgs to focus less on reducing risks from AI, Bio, nukes, etc based on the logic of longtermism, and more based on Global Catastrophic Risks (GCRs) directly. Some data points on this:
* Open Phil renaming it's EA Community Growth (Longtermism) Team to GCR Capacity Building
* This post from Claire Zabel (OP)
* Giving What We Can's new Cause Area Fund being named "Risk and Resilience," with the goal of "Reducing Global Catastrophic Risks"
* Longview-GWWC's Longtermism Fund being renamed the "Emerging Challenges Fund"
* Anecdotal data from conversations with people working on GCRs / X-risk / Longtermist causes
My guess is these changes are (almost entirely) driven by PR concerns about longtermism. I would also guess these changes increase the number of people donation / working on GCRs, which is (by longtermist lights) a positive thing. After all, no-one wants a GCR, even if only thinking about people alive today.
Yet, I can't help but feel something is off about this framing. Some concerns (no particular ordering):
1. From a longtermist (~totalist classical utilitarian) perspective, there's a huge difference between ~99% and 100% of the population dying, if humanity recovers in the former case, but not the latter. Just looking at GCRs on their own mostly misses this nuance.
* (see Parfit Reasons and Persons for the full thought experiment)
2. From a longtermist (~totalist classical utilitarian) perspective, preventing a GCR doesn't differentiate between "humanity prevents GCRs and realises 1% of it's potential" and "humanity prevents GCRs realises 99% of its potential"
* Preventing an extinction-level GCR might move u
Not that we can do much about it, but I find the idea of Trump being president in a time that we're getting closer and closer to AGI pretty terrifying.
A second Trump term is going to have a lot more craziness and far fewer checks on his power, and I expect it would have significant effects on the global trajectory of AI.
(COI note: I work at OpenAI. These are my personal views, though.)
My quick take on the "AI pause debate", framed in terms of two scenarios for how the AI safety community might evolve over the coming years:
1. AI safety becomes the single community that's the most knowledgeable about cutting-edge ML systems. The smartest up-and-coming ML researchers find themselves constantly coming to AI safety spaces, because that's the place to go if you want to nerd out about the models. It feels like the early days of hacker culture. There's a constant flow of ideas and brainstorming in those spaces; the core alignment ideas are standard background knowledge for everyone there. There are hackathons where people build fun demos, and people figuring out ways of using AI to augment their research. Constant interactions with the models allows people to gain really good hands-on intuitions about how they work, which they leverage into doing great research that helps us actually understand them better. When the public ends up demanding regulation, there's a large pool of competent people who are broadly reasonable about the risks, and can slot into the relevant institutions and make them work well.
2. AI safety becomes much more similar to the environmentalist movement. It has broader reach, but alienates a lot of the most competent people in the relevant fields. ML researchers who find themselves in AI safety spaces are told they're "worse than Hitler" (which happened to a friend of mine). People get deontological about AI progress; some hesitate to pay for ChatGPT because it feels like they're contributing to the problem (another true story); others overemphasize the risks of existing models in order to whip up popular support. People are sucked into psychological doom spirals similar to how many environmentalists think about climate change: if you're not depressed then you obviously don't take it seriously enough. Just like environmentalists often block some of the most valua
Pandemic Prevention: All Nations Should Build Emergency Medical Stockpiles
All nations should have stockpiles of medical resources e.g. masks, PPE, multi-purpose medicines and therapeutics, and various vaccines (smallpox, H1N1, etc). At the slightest hint of danger, these resources should be distributed to every part of the country. There should be enough stock to protect the people for as long as is required to get resupplied.
The Australians have a national medical stockpile and they started distributing masks from it in January 2020 in response to the Covid-19 outbreak. The French used to have a national stockpile of masks, but they decided it would be more ‘efficient’ to get rid of it, reasoning that if there was an emergency, they could just buy masks from China. Sacre bleu!
Stockpiling is a general-purpose risk management technique which also works for other emergencies such as terrorism, fires, nuclear fallout, and war. If you want to survive in the long-run, you need to build stockpiles!
Longtermist shower thought: what if we had a campaign to install Far-UVC in poultry farms? Seems like it could:
1. Reduce a bunch of diseases in the birds, which is good for: a. the birds’ welfare; b. the workers’ welfare; c. Therefore maybe the farmers’ bottom line?; d. Preventing/suppressing human pandemics (eg avian flu)
2. Would hopefully drive down the cost curve of Far-UVC
3. May also generate safety data in chickens, which could be helpful for derisking it for humans
Insofar as one of the main obstacles is humans' concerns for health effects, this would at least only raise these for a small group of workers.
Load more (8/69)
After talking and working for some time with non-EA organisations in the AI Policy space, I believe that we need to give more credence to the here-and-now of AI safety policy as well to get the attention of policymakers and get our foot in the door. That also gives us space to collaborate with other think tanks and organisations outside of the x-risk space that are proactive and committed to AI policy. Right now, a lot of those people also see X-risks as being fringe and radical(and these are people who are supposed to be on our side).
Governments tend to move slowly, with due process, and in small increments(think, "We are going to first maybe do some risk monitoring, only then auditing"). Policymakers are only visionaries with horizons until the end of their terms(hmm, no surprise). Usually, broad strokes in policy require precedents of a similar size for it to be feasible within a policymakers' agenda and the Overton window.
Every group that comes to a policy meeting thinks that their agenda item is the most pressing because, by definition, most of the time, contacting and getting meetings with policymakers means that you are proactive and have done your homework.
I want to see more EAs respond to Public Voice Opportunities, for instance- something I rarely hear on the EA forum or via EA channels/material.