February 2024


Quick takes

I met Australia's Assistant Minister for Defence last Friday. I asked him to write an email to the Minister in charge of AI, asking him to establish an AI Safety Institute. He said he would. He also seemed on board with not having fully autonomous AI weaponry. All because I sent one email asking for a meeting + had said meeting. Advocacy might be the lowest-hanging fruit in AI Safety.
Linch:
Going forwards, LTFF is likely to be a bit more stringent (~15-20%?[1] Not committing to the exact number) about approving mechanistic interpretability grants than about grants in other subareas of empirical AI Safety, particularly from junior applicants. Some assorted reasons (note that not all fund managers necessarily agree with each of them):

* Relatively speaking, a high fraction of resources and support for mechanistic interpretability comes from sources in the community other than LTFF; we view support for mech interp as less neglected within the community.
* Outside of the existing community, mechanistic interpretability has become an increasingly "hot" field in mainstream academic ML; we think good work is fairly likely to come from non-AIS-motivated people in the near future. Thus overall neglectedness is lower.
* While we are excited about recent progress in mech interp (including some from LTFF grantees!), some of us doubt that even the success stories in interpretability constitute that large a fraction of the overall success story for AGI Safety.
* Some of us are worried about the field-distorting effects of mech interp being oversold to junior researchers and other newcomers as necessary or sufficient for safe AGI.
* A high percentage of our technical AIS applications are about mechanistic interpretability, and we want to encourage a diversity of attempts and research to tackle alignment and safety problems. We want to encourage people interested in working on technical AI safety to apply to us with proposals for projects in areas of empirical AI safety other than interpretability.

To be clear, we are still excited about receiving mechanistic interpretability applications in the future, including from junior applicants. Even with a higher bar for approval, we are still excited about funding great grants.

We tentatively plan on publishing a more detailed explanation of the reasoning later, as well as suggestions or a Request for Proposals for other promising research directions. However, these things often take longer than we expect/intend (and may not end up happening), so I wanted to give potential applicants a heads-up.

(x-posted from LW)

1. ^ Operationalized as "assuming similar levels of funding in 2024 as in 2023, I expect that about 80-85% of the mech interp projects we funded in 2023 will be above the 2024 bar."
I'm curious why there hasn't been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:

1. Unlike existential risk from other sources (e.g. an asteroid), AI x-risk is unique because humans would be replaced by other beings rather than dying out completely. This means you can't simply apply a naive argument that AI threatens total extinction of value to make the case that AI safety is astronomically important, in the sense that you can for other x-risks. You generally need additional assumptions.
2. Total utilitarianism is generally seen as non-speciesist, and therefore has no intrinsic preference for human values over unaligned AI values. If AIs are conscious, there don't appear to be strong prima facie reasons for preferring humans to AIs under hedonistic utilitarianism. Under preference utilitarianism, it doesn't necessarily matter whether AIs are conscious.
3. Total utilitarianism generally recommends large population sizes. Accelerating AI can be modeled as a kind of "population accelerationism". Extremely large AI populations could be preferable under utilitarianism to small human populations, even ones with high per-capita incomes. Indeed, human populations have recently stagnated via low population growth rates, and AI promises to lift this bottleneck.
4. Therefore, AI accelerationism seems straightforwardly recommended by total utilitarianism under some plausible theories.

Here's a non-exhaustive list of guesses for why I think EAs haven't historically been sympathetic to arguments like the one above, and have instead generally advocated AI safety over AI acceleration (at least when these two values conflict):

* A belief that AIs won't be conscious, and therefore won't have much moral value compared to humans.
  * But why would we assume AIs won't be conscious? For example, if Brian Tomasik is right, consciousness is somewhat universal, rather than being restricted to humans or members of the animal kingdom.
  * I also haven't actually seen much EA literature defend this assumption explicitly, which would be odd if this belief is the primary reason EAs have for focusing on AI safety over AI acceleration.
* A presumption in favor of human values over unaligned AI values for reasons that aren't based on strictly impartial utilitarian arguments. These could include the beliefs that: (1) humans are more likely to have "interesting" values compared to AIs, and (2) humans are more likely to be motivated by moral arguments than AIs, and are more likely to reach a deliberative equilibrium of something like "ideal moral values" compared to AIs.
  * Why would humans be more likely to have "interesting" values than AIs? It seems very plausible that AIs will have interesting values even if their motives seem alien to us. AIs might have even more "interesting" values than humans.
  * It seems to me like wishful thinking to assume that humans are strongly motivated by moral arguments and would settle upon something like "ideal moral values".
* A belief that population growth is inevitable, so it is better to focus on AI safety.
  * But a central question here is why pushing for AI safety—in the sense of AI research that enhances human interests—is better than the alternative on the margin. What reason is there to think AI safety now is better than pushing for greater AI population growth now? (Potential responses to this question are outlined in other bullet points above and below.)
* A belief that AI safety has lasting effects due to a future value lock-in event, whereas accelerationism would have, at best, temporary effects.
  * Are you sure there will ever actually be a "value lock-in event"?
  * Even if there is at some point a value lock-in event, wouldn't pushing for accelerationism also plausibly affect the values that are locked in? For example, the value of "population growth is good" seems more likely to be locked in if you advocate for that now.
* A belief that humans would be kinder and more benevolent than unaligned AIs.
  * Humans seem pretty bad already. For example, humans are responsible for factory farming. It's plausible that AIs could be even more callous and morally indifferent than humans, but the bar already seems low.
  * I'm also not convinced that moral values will be a major force shaping "what happens to the cosmic endowment". It seems to me that the forces shaping economic consumption matter more than moral values.
* A bedrock heuristic that it would be extraordinarily bad if "we all died from AI", and therefore we should pursue AI safety over AI accelerationism.
  * But it would also be bad if we all died from old age while waiting for AI, and missed out on all the benefits that AI offers to humans, which is a point in favor of acceleration. Why would this heuristic be weaker?
* An adherence to person-affecting views in which the values of currently existing humans are what matter most, combined with a belief that AI threatens to kill existing humans.
  * But on this view, AI accelerationism could easily be favored, since AIs could greatly benefit existing humans by extending our lifespans and enriching our lives with advanced technology.
* An implicit acceptance of human supremacism, i.e. the idea that what matters is propagating the interests of the human species, or preserving the human species, even at the expense of individual interests (either within humanity or outside it) or the interests of other species.
  * But isn't EA known for being unusually anti-speciesist compared to other communities? Peter Singer is often seen as a "founding father" of the movement, and a huge part of his ethical philosophy was about how we shouldn't be human supremacists.
  * More generally, it seems wrong to care about preserving the "human species" in an abstract sense relative to preserving the current generation of actually living humans.
* A belief that most humans are biased towards acceleration over safety, and therefore it is better for EAs to focus on safety as a useful correction mechanism for society.
  * But was an anti-safety bias common for previous technologies? I think something closer to the opposite is probably true: most humans seem, if anything, biased towards being overly cautious about new technologies rather than overly optimistic.
* A belief that society is massively underrating the potential for AI, which favors extra work on AI safety, since it's so neglected.
  * But if society is massively underrating AI, shouldn't this also favor accelerating AI? There doesn't seem to be an obvious asymmetry between these two values.
* An adherence to negative utilitarianism, which would favor obstructing AI, along with any other technology that could enable the population of conscious minds to expand.
  * This seems like a plausible moral argument to me, but it doesn't seem like a very popular position among EAs.
* A heuristic that "change is generally bad" and AI represents a gigantic change.
  * I don't think many EAs would defend this heuristic explicitly.
* Added: AI represents a large change to the world; delaying AI therefore preserves option value.
  * This heuristic seems like it would have favored delaying the industrial revolution, and all sorts of moral, social, and technological changes to the world in the past. Is that a position EAs would be willing to bite the bullet on?
EDIT: just confirmed that FHI shut down as of April 16, 2024.

It sounds like the Future of Humanity Institute may be permanently shut down.

Background: FHI went on a hiring freeze/pause back in 2021, with the majority of staff leaving (many left with the spin-off of the Centre for the Governance of AI) and moving to other EA organizations. Since then there has been no public communication regarding its future return, until now...

The Director, Nick Bostrom, updated the bio section on his website with the following commentary [bolding mine]:

> "...Those were heady years. FHI was a unique place - extremely intellectually alive and creative - and remarkable progress was made. FHI was also quite fertile, spawning a number of academic offshoots, nonprofits, and foundations. It helped incubate the AI safety research field, the existential risk and rationalist communities, and the effective altruism movement. Ideas and concepts born within this small research center have since spread far and wide, and many of its alumni have gone on to important positions in other institutions.
>
> Today, there is a much broader base of support for the kind of work that FHI was set up to enable, and it has basically served out its purpose. (The local faculty administrative bureaucracy has also become increasingly stifling.) I think those who were there during its heyday will remember it fondly. I feel privileged to have been a part of it and to have worked with the many remarkable individuals who flocked around it."

This language suggests that FHI has officially closed. Can anyone at Trajan/Oxford confirm?

I'm also curious whether there is any project in place to conduct a post-mortem on the impact FHI has had on the many different fields and movements. I think it's important to ensure that FHI is remembered as a significant nexus point for many influential ideas and people who may impact the long term.

In other news, Bostrom's new book "Deep Utopia" is available for pre-order (coming March 27th).
Two sources of human misalignment that may resist a long reflection: malevolence and ideological fanaticism

(Alternative title: Some bad human values may corrupt a long reflection[1])

The values of some humans, even if idealized (e.g., during some form of long reflection), may be incompatible with an excellent future. Thus, solving AI alignment will not necessarily lead to utopia. Others have raised similar concerns before.[2] Joe Carlsmith puts it especially well in the post “An even deeper atheism”:

> “And now, of course, the question arises: how different, exactly, are human hearts from each other? And in particular: are they sufficiently different that, when they foom, and even "on reflection," they don't end up pointing in exactly the same direction? After all, Yudkowsky said, above, that in order for the future to be non-trivially "of worth," human hearts have to be in the driver's seat. But even setting aside the insult, here, to the dolphins, bonobos, nearest grabby aliens, and so on – still, that's only to specify a necessary condition. Presumably, though, it's not a sufficient condition? Presumably some human hearts would be bad drivers, too? Like, I dunno, Stalin?”

What makes human hearts bad?

What, exactly, makes some human hearts bad drivers? If we better understood what makes hearts go bad, perhaps we could figure out how to make bad hearts good, or at least learn how to prevent hearts from going bad. It would also allow us to better spot potentially bad hearts and coordinate our efforts to prevent them from taking the driver's seat. As of now, I'm most worried about malevolent personality traits and fanatical ideologies.[3]

Malevolence: dangerous personality traits

Some human hearts may be corrupted due to elevated malevolent traits like psychopathy, sadism, narcissism, Machiavellianism, or spitefulness.

Ideological fanaticism: dangerous belief systems

There are many suitable definitions of “ideological fanaticism”. Whatever definition we use, it should describe ideologies that have caused immense harm historically, such as fascism (Germany under Hitler, Italy under Mussolini), (extreme) communism (the Soviet Union under Stalin, China under Mao), religious fundamentalism (ISIS, the Inquisition), and most cults. See this footnote[4] for a preliminary list of defining characteristics.

Malevolence and fanaticism seem especially dangerous

Of course, there are other factors that could corrupt our hearts or driving ability, for example cognitive biases, limited cognitive ability, philosophical confusions, or plain old selfishness.[5] I'm most concerned about malevolence and ideological fanaticism for two reasons.

Deliberately resisting reflection and idealization

First, malevolence—if reflectively endorsed[6]—and fanatical ideologies deliberately resist being changed and would thus plausibly resist idealization even during a long reflection. The most central characteristic of fanatical ideologies is arguably that they explicitly forbid criticism, questioning, and belief change, and view doubters and disagreement as evil.

Putting positive value on creating harm

Second, malevolence and ideological fanaticism would not only result in the future not being as good as it possibly could be—they might actively steer the future in bad directions and, for instance, result in astronomical amounts of suffering. The preferences of malevolent humans (e.g., sadists) may be such that they intrinsically enjoy inflicting suffering on others. Similarly, many fanatical ideologies sympathize with excessive retributivism and often demonize the outgroup. Enabled by future technology, preferences for inflicting suffering on the outgroup may result in enormous disvalue—cf. concentration camps, the Gulag, or hell[7].

In the future, I hope to write more about all of this, especially long-term risks from ideological fanaticism.

Thanks to Pablo and Ruairi for comments and valuable discussions.

1. ^ “Human misalignment” is arguably a confusing (and perhaps confused) term, but it sounds more sophisticated than “bad human values”.
2. ^ For example, Matthew Barnett in “AI alignment shouldn't be conflated with AI moral achievement”, Geoffrey Miller in “AI alignment with humans... but with which humans?”, and lc in “Aligned AI is dual use technology”. Pablo Stafforini has called this the “third alignment problem”. And of course, Yudkowsky's concept of CEV is meant to address these issues.
3. ^ These factors may not be clearly separable. Some humans may be more attracted to fanatical ideologies due to their psychological traits, and malevolent humans often lead fanatical ideologies. Also, believing and following a fanatical ideology may not be good for your heart.
4. ^ Below are some typical characteristics (I'm no expert in this area):
   * Unquestioning belief, absolute certainty, and rigid adherence. The principles and beliefs of the ideology are seen as absolute truth, and questioning or critical examination is forbidden.
   * Inflexibility and refusal to compromise.
   * Intolerance and hostility towards dissent. Anyone who disagrees with or challenges the ideology is seen as evil; as an enemy, traitor, or heretic.
   * Ingroup superiority and outgroup demonization. The in-group is viewed as superior, chosen, or enlightened. The out-group is often demonized and blamed for the world's problems.
   * Authoritarianism. Fanatical ideologies often endorse (or even require) a strong, centralized authority to enforce their principles and suppress opposition, potentially culminating in dictatorship or totalitarianism.
   * Militancy and willingness to use violence.
   * Utopian vision. Many fanatical ideologies are driven by a vision of a perfect future or afterlife which can only be achieved through strict adherence to the ideology. This utopian vision often justifies extreme measures in the present.
   * Use of propaganda and censorship.
5. ^ For example, Barnett argues that future technology will be primarily used to satisfy economic consumption (aka selfish desires). That seems plausible to me; however, I'm not that concerned about this causing huge amounts of future suffering (at least compared to other s-risks). It seems to me that most humans place non-trivial value on the welfare of (neutral) others such as animals. Right now, this preference (for most people) isn't strong enough to outweigh the selfish benefits of eating meat. However, I'm relatively hopeful that future technology would make such tradeoffs much less costly.
6. ^ Some people (how many?) with elevated malevolent traits don't reflectively endorse their malevolent urges and would change them if they could. However, some of them do reflectively endorse their malevolent preferences and view empathy as weakness.
7. ^ Some quotes from famous Christian theologians:
   Thomas Aquinas: "the blessed will rejoice in the punishment of the wicked." "In order that the happiness of the saints may be more delightful to them and that they may render more copious thanks to God for it, they are allowed to see perfectly the sufferings of the damned."
   Samuel Hopkins: "Should the fire of this eternal punishment cease, it would in a great measure obscure the light of heaven, and put an end to a great part of the happiness and glory of the blessed."
   Jonathan Edwards: "The sight of hell torments will exalt the happiness of the saints forever."