All posts


Week of Sunday, 26 May 2024

Frontpage Posts

Quick takes

EA organizations frequently ask people to run criticism by them ahead of time. I've been wary of the push for this norm. My big concerns were that orgs wouldn't comment until a post was nearly done, and that it would take a lot of time. My recent post mentioned a lot of people and organizations, so it seemed like useful data. I reached out to 12 email addresses, plus one person in FB DMs and one open call for information on a particular topic. This doesn't quite match what you see in the post because some people/orgs were mentioned more than once, and other mentions were cut. The post was in a fairly crude state when I sent it out.

Of those 14:

* 10 had replied by the start of the next day, more than half of them within a few hours. I expect this was faster than usual because no one had more than a few paragraphs relevant to them or their org, but it's still impressive.
* It's hard to say how sending an early draft changed things. One person got some extra anxiety because their paragraph was full of TODOs (because it was positive and I hadn't worked as hard at fleshing out the positive mentions ahead of time). I could maybe have saved myself one stressful interaction if I'd realized ahead of time that I was going to cut an example.
* Only 80,000 Hours, Anima International, and GiveDirectly failed to respond before publication (7 days after I emailed them). Of those, only 80k's mention was negative.
* I didn't keep as close track of changes, but at a minimum the replies led to 2 examples being removed entirely, 2 clarifications, and some additional information that made the post better.

So overall I'm very glad I solicited comments, and I found the process easier than expected.
I highly recommend the book "How to Launch A High-Impact Nonprofit" to everyone. I've been EtG for many years and I thought this book wasn't relevant to me, but I'm learning a lot and I'm really enjoying it.
Emrik · 10h
If evolutionary biology metaphors for social epistemology are your cup of tea, you may find this discussion I had with ChatGPT interesting. 🍵 (Also, sorry for not optimizing this; but I rarely find time to write anything publishable, so I thought just sharing as-is was better than not sharing at all. I recommend the footnotes btw!)

Glossary/metaphors

* Howea palm trees ↦ EA community
* Wind-pollination ↦ "panmictic communication"
* Sympatric speciation ↦ horizontal segmentation
* Ecological niches ↦ "epistemic niches"
* Inbreeding depression ↦ echo chambers
* Outbreeding depression (and Baker's law) ↦ "Zollman-like effects"
  * At least sorta. There's a host of mechanisms mostly sharing the same domain and effects with the more precisely-defined Zollman effect, and I'm saying "Zollman-like" to refer to the group of them. Probably I should find a better word.

Background

Once upon a time, the common ancestor of the palm trees Howea forsteriana and Howea belmoreana on Howe Island would pollinate each other more or less uniformly during each flowering cycle. This was "panmictic" because everybody was equally likely to mix with anybody else. Then, on a beautiful sunny morning smack in the middle of New Zealand and Australia, the counterfactual descendants had had enough. Due to varying soil profiles on the island, they all had to compromise between fitness for each soil type—or purely specialize in one and accept the loss of all seeds which landed on the wrong soil.

"This seems inefficient," one of them observed. A few of them nodded in agreement and conspired to gradually desynchronize their flowering intervals from their conspecifics, so that they would primarily pollinate each other rather than having to uniformly mix with everybody. They had created a cline. And a cline, once established, permits the gene pools of the assortatively-pollinating palms to further specialize toward different mesa-niches within their original meta-niche. Given that a crossbreed between palms adapted for different soil types is going to be less adaptive for either niche,[1] you have a positive feedback cycle where they increasingly desynchronize (to minimize crossbreeding) and increasingly specialize. Solve for the general equilibrium and you get sympatric speciation.[2]

Notice that their freedom to specialize toward their respective mesa-niches is proportional to their reproductive isolation (or inversely proportional to the gene flow between them). The more panmictic they are, the more selection-pressure there is on them to retain 1) genetic performance across the population-weighted distribution of all the mesa-niches in the environment, and 2) cross-compatibility with the entire population (since you can't choose your mates if you're a wind-pollinating palm tree).[3]

From evo bio to socioepistemology

> I love this as a metaphor for social epistemology, and the potential detrimental effects of "panmictic communication". Sorta related to the Zollman effect, but more general. If you have an epistemic community that is trying to grow knowledge about a range of different "epistemic niches", then widespread pollination (communication) is obviously good because it protects against e.g. inbreeding depression of local subgroups (e.g. echo chambers, groupthink, etc.), and because researchers can coordinate to avoid redundant work, and because ideas tend to inspire other ideas; but it can also be detrimental because researchers who try to keep up with the ideas and technical jargon being developed across the community (especially related to everything that becomes a "hot topic") will have less time and relative curiosity to specialize in their focus area ("outbreeding depression").
>
> A particularly good example of this is the effective altruism community. Given that they aspire to prioritize between all the world's problems, and due to the very high-dimensional search space generalized altruism implies, and due to how tight-knit the community's discussion fora are (the EA Forum, LessWrong, EAGs, etc.), they tend to learn an extremely wide range of topics. I think this is awesome, and usually produces better results than narrow academic fields, but nonetheless there's a tradeoff here.
>
> The rather untargeted gene-flow implied by wind-pollination is a good match to the mostly-online meme-flow of the EA community. You might think that EAs will adequately speciate and evolve toward subniches due to the intractability of keeping up with everything, and indeed there are many subcommunities that branch into different focus areas. But if you take cognitive biases into account, and the constant desire people have to be *relevant* to the largest audience they can find (preferential attachment wrt hot topics), plus fear-of-missing-out, and fear of being "caught unaware" of some newly-developed jargon (causing people to spend time learning everything that risks being mentioned in live conversations[4]), it's unlikely that they couldn't benefit from smarter and more fractal ways to specialize their niches. Part of that may involve more "horizontally segmented" communication.

Tagging @Holly_Elmore because evobio metaphors are definitely your cup of tea, and a lot of this is inspired by stuff I first learned from you. Thanks! : )

1. ^ Think of it like... if you're programming something based on the assumption that it will run on Linux xor Windows, it's gonna be much easier to reach a given level of quality compared to if you require it to be cross-compatible.
2. ^ Sympatric speciation is rare because the pressure to be compatible with your conspecifics is usually quite high (Allee effects ↦ network effects). But it is still possible once selection-pressures from "disruptive selection" exceed the "heritage threshold" relative to each mesa-niche.[5]
3. ^ This homogenization of evolutionary selection-pressures is akin to markets converging to an equilibrium price. It too depends on panmixia of customers and sellers for a given product. If customers are able to buy from anybody anywhere, differential pricing (i.e. trying to sell your product at above or below equilibrium price for a subgroup of customers) becomes impossible.
4. ^ This is also known (by me and at least one other person...) as the "jabber loop":
   > This highlights the utter absurdity of being afraid of having our ignorance exposed, and going 'round judging each other for what we don't know. If we all worry overmuch about what we don't know, we'll all get stuck reading and talking about stuff in the Jabber loop. The more of our collective time we give to the Jabber loop, the more unusual it will be to be ignorant of what's in there, which means the social punishments for Jabber-ignorance will get even harsher.
5. ^ To take this up a notch: sympatric speciation occurs when a cline in the population extends across a separatrix (red) in the dynamic landscape, and the attractors (blue) on each side overpower the cohering forces from Allee effects (orange). This is the doodle I drew on a post-it note to illustrate that pattern in a different context: I dub him the mascot of bullshit-math. Isn't he pretty?
Very quick thoughts on setting time aside for strategy, planning and implementation, since I'm into my 4th week of strategy development and experiencing intrusive thoughts about needing to hurry up on implementation:

* I have a 52-week LTFF grant to do movement building in Australia (AI Safety).
* I have set aside 4.5 weeks for research (interviews + landscape review + maybe a survey) and strategy development (segmentation, targeting, positioning),
* then 1.5 weeks for planning (content, events, educational programs), during which I will get feedback from others on the plan and then iterate on it.
* This leaves me with 46/52 weeks to implement ruthlessly.

In conclusion, 6 weeks on strategy and planning seems about right. 2 weeks would have been too short, 10 weeks would have been too long; this porridge is juuuussttt rightttt. Keen for feedback from people in similar positions.
How useful are pre-university students' collations of research papers in biorisk? I've been working on some papers (for fun) collating research in the biosafety field, but I obviously have no experience/degrees and it is secondary analysis. How helpful would posting these 'rough' papers be? They mainly focus on antibiotic resistance, biosafety, and pandemic risk from gain-of-function research.

Week of Sunday, 19 May 2024

Frontpage Posts

Quick takes

Linch · 7d
Do we know if @Paul_Christiano or other ex-lab people working on AI policy have non-disparagement agreements with OpenAI or other AI companies? I know Cullen doesn't, but I don't know about anybody else. I know NIST isn't a regulatory body, but it still seems like standards-setting should be done by people who have no unusual legal obligations. And of course, some other people are or will be working at regulatory bodies, which may have more teeth in the future. To be clear, I want to differentiate between non-disclosure agreements, which are perfectly sane and reasonable in at least a limited form as a way to prevent leaking trade secrets, and non-disparagement agreements, which prevent you from saying bad things about past employers. The latter seems clearly bad to have for anybody in a position to affect policy. Doubly so if the existence of the non-disparagement agreement is itself secret.
Having a baby and becoming a parent has had an incredible impact on me. Now more than ever, I feel more connected and concerned about the wellbeing of others. I feel as though my heart has literally grown. I wanted to share this as I expect there are many others who are questioning whether to have children -- perhaps due to concerns about it limiting their positive impact, among many others. But I'm just here to say it's been beautiful, and amazing, and I look forward to the day I get to talk with my son about giving back in a meaningful way.  
I wonder how the recent turn for the worse at OpenAI should make us feel about e.g. Anthropic and Conjecture and other organizations with a similar structure, or whether we should change our behaviour towards those orgs.

* How much do we think that OpenAI's problems are idiosyncratic vs. structural? If e.g. Sam Altman is the problem, we can still feel good about peer organizations. If instead weighing investor concerns against safety concerns is the root of the problem, we should be worried about whether peer organizations will be pushed down the same path sooner or later.
* Are there any concerns we have with OpenAI that we should be taking this opportunity to put to its peers as well? For example, have peers been publicly asked whether they use non-disparagement agreements? I can imagine a situation where another org has really just never thought to use them, and we can use this occasion to encourage them to turn that into a public commitment.
Besides Ilya Sutskever, is there any person not related to the EA community who quit or was fired from OpenAI for safety concerns?
I don't think CEA has a public theory of change; it just has a strategy. If I were to recreate its theory of change based on what I know of the org, it'd have three target groups:

1. Non-EAs
2. Organisers
3. Existing members of the community

Per target group, I'd say it has the following main activities:

* Targeting non-EAs, it does comms and education (the VP programme).
* Targeting organisers, you have the work of the groups team.
* Targeting existing members, you have the events team, the forum team, and community health.

Per target group, these activities are aiming for the following short-term outcomes:

* Targeting non-EAs, it doesn't aim to raise awareness of EA; instead, it aims to ensure people have an accurate understanding of what EA is.
* Targeting organisers, it aims to improve their ability to organise.
* Targeting existing members, it aims to improve information flow (through EAG(x) events, the forum, newsletters, etc.) and maintain a healthy culture (through community health work).

If you're interested, you can see EA Netherlands' theory of change here.

Week of Sunday, 12 May 2024

Frontpage Posts


Quick takes

Cullen · 10d
I am not under any non-disparagement obligations to OpenAI. It is important to me that people know this, so that they can trust any future policy analysis or opinions I offer. I have no further comments at this time.
This is a cold take that's probably been said before, but I thought it bears repeating occasionally, if only for the reminder:

The longtermist viewpoint has gotten a lot of criticism for prioritizing "vast hypothetical future populations" over the needs of "real people," alive today. The mistake, so the critique goes, is the result of replacing ethics with math, or utilitarianism, or something cold and rigid like that. And so it's flawed because it lacks the love or duty or "ethics of care" or concern for justice that leads people to alternatives like mutual aid and political activism.

My go-to reaction to this critique has become something like "well, you don't need to prioritize vast abstract future generations to care about pandemics or nuclear war; those are very real things that could, with non-trivial probability, face us in our lifetimes." I think this response has taken hold in general among people who talk about X-risk. This probably makes sense for pragmatic reasons. It's a very good rebuttal to the "cold and heartless utilitarianism / Pascal's mugging" critique.

But I think it unfortunately neglects the critical point that longtermism, when taken really seriously — at least the sort of longtermism that MacAskill writes about in WWOTF, or Joe Carlsmith writes about in his essays — is full of care and love and duty. Reading the thought experiment that opens the book, about living every human life in sequential order, reminded me of this. I wish there were more people responding to the "longtermism is cold and heartless" critique by making the case that no, longtermism at face value is worth preserving because it's the polar opposite of heartless. Caring about the world we leave for the real people, with emotions and needs and experiences as real as our own, who very well may inherit our world but whom we'll never meet, is an extraordinary act of empathy and compassion — one that's way harder to access than the empathy and warmth we might feel for our neighbors by default. It's the ultimate act of care. And it's definitely concerned with justice.

(I mean, you can also find longtermism worthy because of something something math and cold utilitarianism. That's not out of the question. I just don't think it's the only way to reach that conclusion.)
We should expect the incentives and culture of AI-focused companies to make them uniquely terrible for producing safe AGI.

From a "safety from catastrophic risk" perspective, I suspect an "AI-focused company" (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:

1. Incentives
2. Culture

From an incentives perspective, consider realistic alternative organizational structures to "AI-focused company" that nonetheless have enough firepower to host successful multibillion-dollar scientific/engineering projects:

1. As part of an intergovernmental effort (e.g. CERN's Large Hadron Collider, the ISS)
2. As part of a governmental effort of a single country (e.g. Apollo Program, Manhattan Project, China's Tiangong)
3. As part of a larger company (e.g. Google DeepMind, Meta AI)

In each of those cases, I claim that there are stronger (though still not ideal) organizational incentives to slow down, pause/stop, or roll back deployment if there is sufficient evidence or reason to believe that further development can result in major catastrophe. In contrast, an AI-focused company has every incentive to go ahead on AI when the case for pausing is uncertain, and minimal incentive to stop or even take things slowly.

From a culture perspective, I claim that without knowing any details of the specific companies, you should expect AI-focused companies to be more likely than plausible contenders to have the following cultural elements:

1. Ideological AGI vision: AI-focused companies may have a large contingent of "true believers" who are ideologically motivated to make AGI at all costs; and
2. No pre-existing safety culture: AI-focused companies may have minimal or no strong "safety" culture where people deeply understand, have experience in, and are motivated by a desire to avoid catastrophic outcomes.

The first one should be self-explanatory. The second one is a bit more complicated, but basically I think it's hard to have a safety-focused culture just by "wanting it" hard enough in the abstract, or by talking a big game. Instead, institutions (relatively speaking) have more of a safe and robust culture if they have previously suffered the (large) costs of not focusing enough on safety. For example, engineers who aren't software engineers understand fairly deep down that their mistakes can kill people, and that their predecessors' fuck-ups have indeed killed people (think bridges collapsing, airplanes falling, medicines not working, etc.). Software engineers rarely have such experience. Similarly, governmental institutions have institutional memories of the problems caused by major historical fuckups, in a way that new startups very much don't.
[PHOTO] I sent 19 emails to politicians, had 4 meetings, and now I get emails like this. There is SO MUCH low hanging fruit in just doing this for 30 minutes a day (I would do it but my LTFF funding does not cover this). Someone should do this!
We're very excited to announce the following speakers for EA Global: London 2024:

* Rory Stewart (Former MP, Host of The Rest is Politics podcast and Senior Advisor to GiveDirectly) on obstacles and opportunities in making aid agencies more effective.
* Mary Phuong (Research Scientist at DeepMind) on dangerous capability evaluations and responsible scaling.
* Mahi Klosterhalfen (CEO of the Albert Schweitzer Foundation) on combining interventions for maximum impact in farmed animal welfare.

Applications close 19 May. Apply here and find more details on our website. You can also email the EA Global team at hello@eaglobal.org if you have any questions.

Week of Sunday, 5 May 2024

Frontpage Posts


Quick takes

OllieBase · 17d
Congratulations to the EA Project For Awesome 2024 team, who managed to raise over $100k for AMF, GiveDirectly and ProVeg International by submitting promotional/informational videos to the project. There's been an effort to raise money for effective charities via Project For Awesome since 2017, and it seems like a really productive effort every time. Thanks to all involved! 
FAQ: "Ways the world is getting better" banner

The banner will only be visible on desktop. If you can't see it, try expanding your window. It'll be up for a week.

How do I use the banner?

1. Click on an empty space to add an emoji,
2. choose your emoji,
3. write a one-sentence description of the good news you want to share,
4. link an article or forum post that gives more information.

If you'd like to delete your entry, click the cross that appears when you hover over it. It will be deleted for everyone.

What kind of stuff should I write?

Anything that qualifies as good news relevant to the world's most important problems. For example, Ben West's recent quick takes (1, 2, 3). Avoid posting partisan political news, but the passage of relevant bills and policies is on topic.

Will my entry be anonymous?

All submissions are displayed without your Forum name, so they are ~anonymous to users. However, usual moderation norms still apply (additionally, we may remove duplicates or borderline trollish submissions; this is an experiment, so we reserve the right to moderate heavily if necessary).

Ask any other questions you have in the comments below. Feel free to DM me with feedback or comments.
This could be a long slog, but I think it could be valuable to identify the top ~100 open-source libraries and assess their level of resourcing, to help avoid future attacks like the XZ attack. In general, I think work on hardening systems is an underrated aspect of defending against future highly capable autonomous AI agents.
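A minimal sketch of what one small piece of this could look like (my own illustration, not something proposed in the quick take): given a hand-picked list of repositories, it queries the public GitHub contributors endpoint and computes a crude "bus factor" — how few people account for most recorded commits — as one proxy for how thinly resourced a project is. The repository list, the optional GITHUB_TOKEN environment variable, and the 50% threshold are all illustrative assumptions.

```python
import os
import requests

# Hypothetical starting list -- in practice this would come from a curated
# inventory of critical open-source dependencies.
REPOS = ["tukaani-project/xz", "curl/curl", "openssl/openssl"]

API = "https://api.github.com/repos/{}/contributors"
# Token is optional; unauthenticated requests work but have low rate limits.
token = os.environ.get("GITHUB_TOKEN")
HEADERS = {"Authorization": f"Bearer {token}"} if token else {}


def bus_factor(repo: str, threshold: float = 0.5) -> int:
    """Smallest number of contributors accounting for >= `threshold` of recorded commits."""
    resp = requests.get(API.format(repo), headers=HEADERS,
                        params={"per_page": 100}, timeout=30)
    resp.raise_for_status()
    # Each contributor record includes a total "contributions" (commit) count.
    counts = sorted((c["contributions"] for c in resp.json()), reverse=True)
    total = sum(counts)
    covered, k = 0, 0
    for n in counts:
        covered += n
        k += 1
        if covered >= threshold * total:
            break
    return k


if __name__ == "__main__":
    for repo in REPOS:
        print(f"{repo}: bus factor ~ {bus_factor(repo)}")
```

Commit concentration is of course only one signal; funding, maintainer burnout, and review capacity would need separate data sources.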
Common prevalence estimates are often wrong. Examples: snakebites, and my experience reading the Long Covid literature. Both institutions like the WHO and the academic literature appear to be incentivized to exaggerate. I think the Global Burden of Disease might be a more reliable source, but I have not looked into it. I advise everyone using prevalence estimates to treat them with some skepticism and look up the source.
In my latest post I talked about whether unaligned AIs would produce more or less utilitarian value than aligned AIs. To be honest, I'm still quite confused about why many people seem to disagree with the view I expressed, and I'm interested in engaging more to get a better understanding of their perspective. At the least, I thought I'd write a bit more about my thoughts here, and clarify my own views on the matter, in case anyone is interested in trying to understand my perspective.

The core thesis I was trying to defend is the following view:

My view: It is likely that by default, unaligned AIs—AIs that humans are likely to actually build if we do not completely solve key technical alignment problems—will produce comparable utilitarian value compared to humans, both directly (by being conscious themselves) and indirectly (via their impacts on the world). This is because unaligned AIs will likely both be conscious in a morally relevant sense and share human moral concepts, since they will be trained on human data.

Some people seem to merely disagree with my view that unaligned AIs are likely to be conscious in a morally relevant sense. And a few others have a semantic disagreement with me in which they define AI alignment in moral terms, rather than as the ability to make an AI share the preferences of the AI's operator.

But beyond these two objections, which I feel I understand fairly well, there's also significant disagreement about other questions. Based on my discussions, I've attempted to distill the following counterargument to my thesis, which I fully acknowledge does not capture everyone's views on this subject:

Perceived counter-argument: The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives. At present, only a small proportion of humanity holds partly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies. As a result, it is plausible that almost all value would be lost, from a utilitarian perspective, if AIs were unaligned with human preferences.

Again, I'm not sure if this summary accurately represents what people believe. However, it's what some seem to be saying. I personally think this argument is weak. But I feel I've had trouble making my views very clear on this subject, so I thought I'd try one more time to explain where I'm coming from here. Let me respond to the two main parts of the argument in some amount of detail:

(i) "The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives."

My response: I am skeptical of the notion that the bulk of future utilitarian value will originate from agents with explicitly utilitarian preferences. This clearly does not reflect our current world, where the primary sources of happiness and suffering are not the result of deliberate utilitarian planning. Moreover, I do not see compelling theoretical grounds to anticipate a major shift in this regard.

I think the intuition behind the argument here is something like this:

* In the future, it will become possible to create "hedonium"—matter that is optimized to generate the maximum amount of utility or well-being.
* If hedonium can be created, it would likely be vastly more important than anything else in the universe in terms of its capacity to generate positive utilitarian value.
* The key assumption is that hedonium would primarily be created by agents who have at least some explicit utilitarian goals, even if those goals are fairly weak.
* Given the astronomical value that hedonium could potentially generate, even a tiny fraction of the universe's resources being dedicated to hedonium production could outweigh all other sources of happiness and suffering.
* Therefore, if unaligned AIs would be less likely to produce hedonium than aligned AIs (due to not having explicitly utilitarian goals), this would be a major reason to prefer aligned AI, even if unaligned AIs would otherwise generate comparable levels of value to aligned AIs in all other respects.

If this is indeed the intuition driving the argument, I think it falls short for a straightforward reason. The creation of matter-optimized-for-happiness is more likely to be driven by the far more common motives of self-interest and concern for one's inner circle (friends, family, tribe, etc.) than by explicit utilitarian goals. If unaligned AIs are conscious, they would presumably have ample motives to optimize for positive states of consciousness, even if not for explicitly utilitarian reasons.

In other words, agents optimizing for their own happiness, or the happiness of those they care about, seem likely to be the primary force behind the creation of hedonium-like structures. They may not frame it in utilitarian terms, but they will still be striving to maximize happiness and well-being for themselves and others they care about regardless. And it seems natural to assume that, with advanced technology, they would optimize pretty hard for their own happiness and well-being, just as a utilitarian might optimize hard for happiness when creating hedonium.

In contrast to the number of agents optimizing for their own happiness, the number of agents explicitly motivated by utilitarian concerns is likely to be much smaller. Yet both forms of happiness will presumably be heavily optimized. So even if explicit utilitarians are more likely to pursue hedonium per se, their impact would likely be dwarfed by the efforts of the much larger group of agents driven by more personal motives for happiness-optimization. Since both groups would be optimizing for happiness, the fact that hedonium is similarly optimized for happiness doesn't seem to provide much reason to think that it would outweigh the utilitarian value of more mundane, and far more common, forms of utility-optimization.

To be clear, I think it's totally possible that there's something about this argument that I'm missing here. And there are a lot of potential objections I'm skipping over here. But on a basic level, I mostly just lack the intuition that the thing we should care about, from a utilitarian perspective, is the existence of explicit utilitarians in the future, for the aforementioned reasons. The fact that our current world isn't well described by the idea that what matters most is the number of explicit utilitarians strengthens my point here.

(ii) "At present, only a small proportion of humanity holds partly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies."
My response: Since only a small portion of humanity is explicitly utilitarian, the argument's own logic suggests that there is significant potential for AIs to be even more utilitarian than humans, given the relatively low bar set by humanity's limited utilitarian impulses. While I agree we shouldn't assume AIs will be more utilitarian than humans without specific reasons to believe so, it seems entirely plausible that factors like selection pressures for altruism could lead to this outcome. Indeed, commercial AIs seem to be selected to be nice and helpful to users, which (at least superficially) seems "more utilitarian" than the default (primarily selfish-oriented) impulses of most humans. The fact that humans are only slightly utilitarian should mean that even small forces could cause AIs to exceed human levels of utilitarianism.

Moreover, as I've said previously, it's probable that unaligned AIs will possess morally relevant consciousness, at least in part due to the sophistication of their cognitive processes. They are also likely to absorb and reflect human moral concepts as a result of being trained on human-generated data. Crucially, I expect these traits to emerge even if the AIs do not share human preferences.

To see where I'm coming from, consider how humans routinely are "misaligned" with each other, in the sense of not sharing each other's preferences, and yet still share moral concepts and a common culture. For example, an employee can share moral concepts with their employer while having very different consumption preferences from them. This picture is pretty much how I think we should primarily think about unaligned AIs that are trained on human data, and shaped heavily by techniques like RLHF or DPO.

Given these considerations, I find it unlikely that unaligned AIs would completely lack any utilitarian impulses whatsoever. However, I do agree that even a small risk of this outcome is worth taking seriously. I'm simply skeptical that such low-probability scenarios should be the primary factor in assessing the value of AI alignment research.

Intuitively, I would expect the arguments for prioritizing alignment to be more clear-cut and compelling than "if we fail to align AIs, then there's a small chance that these unaligned AIs might have zero utilitarian value, so we should make sure AIs are aligned instead". If low probability scenarios are the strongest considerations in favor of alignment, that seems to undermine the robustness of the case for prioritizing this work. While it's appropriate to consider even low-probability risks when the stakes are high, I'm doubtful that small probabilities should be the dominant consideration in this context. I think the core reasons for focusing on alignment should probably be more straightforward and less reliant on complicated chains of logic than this type of argument suggests.

In particular, as I've said before, I think it's quite reasonable to think that we should align AIs to humans for the sake of humans. In other words, I think it's perfectly reasonable to admit that solving AI alignment might be a great thing to ensure human flourishing in particular. But if you're a utilitarian, and not particularly attached to human preferences per se (i.e., you're non-speciesist), I don't think you should be highly confident that an unaligned AI-driven future would be much worse than an aligned one, from that perspective.
