Quick takes

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of four people that worked on trying to understand language model features in context, leading to the release of an open-source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
Edgy and data-driven TED talk on how the older generations in America are undermining the youth. Worth a watch.  
In my latest post I talked about whether unaligned AIs would produce more or less utilitarian value than aligned AIs. To be honest, I'm still quite confused about why many people seem to disagree with the view I expressed, and I'm interested in engaging more to get a better understanding of their perspective. At the least, I thought I'd write a bit more about my thoughts here, and clarify my own views on the matter, in case anyone is interested in trying to understand my perspective.

The core thesis I was trying to defend is the following view:

My view: It is likely that by default, unaligned AIs—AIs that humans are likely to actually build if we do not completely solve key technical alignment problems—will produce comparable utilitarian value compared to humans, both directly (by being conscious themselves) and indirectly (via their impacts on the world). This is because unaligned AIs will likely both be conscious in a morally relevant sense, and they will likely share human moral concepts, since they will be trained on human data.

Some people seem to merely disagree with my view that unaligned AIs are likely to be conscious in a morally relevant sense. And a few others have a semantic disagreement with me in which they define AI alignment in moral terms, rather than as the ability to make an AI share the preferences of its operator.

But beyond these two objections, which I feel I understand fairly well, there's also significant disagreement about other questions. Based on my discussions, I've attempted to distill the following counterargument to my thesis, which I fully acknowledge does not capture everyone's views on this subject:

Perceived counter-argument: The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives. At present, only a small proportion of humanity holds slightly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies. As a result, it is plausible that almost all value would be lost, from a utilitarian perspective, if AIs were unaligned with human preferences.

Again, I'm not sure if this summary accurately represents what people believe. However, it's what some seem to be saying. I personally think this argument is weak. But I feel I've had trouble making my views very clear on this subject, so I thought I'd try one more time to explain where I'm coming from here. Let me respond to the two main parts of the argument in some amount of detail:

(i) "The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives."

My response: I am skeptical of the notion that the bulk of future utilitarian value will originate from agents with explicitly utilitarian preferences. This clearly does not reflect our current world, where the primary sources of happiness and suffering are not the result of deliberate utilitarian planning. Moreover, I do not see compelling theoretical grounds to anticipate a major shift in this regard.

I think the intuition behind the argument here is something like this: in the future, it will become possible to create "hedonium"—matter that is optimized to generate the maximum amount of utility or well-being. If hedonium can be created, it would likely be vastly more important than anything else in the universe in terms of its capacity to generate positive utilitarian value. The key assumption is that hedonium would primarily be created by agents who have at least some explicit utilitarian goals, even if those goals are fairly weak. Given the astronomical value that hedonium could potentially generate, even a tiny fraction of the universe's resources being dedicated to hedonium production could outweigh all other sources of happiness and suffering. Therefore, if unaligned AIs would be less likely to produce hedonium than aligned AIs (due to not having explicitly utilitarian goals), this would be a major reason to prefer aligned AI, even if unaligned AIs would otherwise generate comparable levels of value to aligned AIs in all other respects.

If this is indeed the intuition driving the argument, I think it falls short for a straightforward reason. The creation of matter-optimized-for-happiness is more likely to be driven by the far more common motives of self-interest and concern for one's inner circle (friends, family, tribe, etc.) than by explicit utilitarian goals. If unaligned AIs are conscious, they would presumably have ample motives to optimize for positive states of consciousness, even if not for explicitly utilitarian reasons.

In other words, agents optimizing for their own happiness, or the happiness of those they care about, seem likely to be the primary force behind the creation of hedonium-like structures. They may not frame it in utilitarian terms, but they will still be striving to maximize happiness and well-being for themselves and others they care about regardless. And it seems natural to assume that, with advanced technology, they would optimize pretty hard for their own happiness and well-being, just as a utilitarian might optimize hard for happiness when creating hedonium.

In contrast to the number of agents optimizing for their own happiness, the number of agents explicitly motivated by utilitarian concerns is likely to be much smaller. Yet both forms of happiness will presumably be heavily optimized. So even if explicit utilitarians are more likely to pursue hedonium per se, their impact would likely be dwarfed by the efforts of the much larger group of agents driven by more personal motives for happiness-optimization. Since both groups would be optimizing for happiness, the fact that hedonium is similarly optimized for happiness doesn't seem to provide much reason to think that it would outweigh the utilitarian value of more mundane, and far more common, forms of utility-optimization.

To be clear, I think it's totally possible that there's something about this argument that I'm missing. And there are a lot of potential objections I'm skipping over here. But on a basic level, I mostly just lack the intuition that the thing we should care about, from a utilitarian perspective, is the existence of explicit utilitarians in the future, for the aforementioned reasons. The fact that our current world isn't well described by the idea that what matters most is the number of explicit utilitarians strengthens my point here.

(ii) "At present, only a small proportion of humanity holds slightly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies."

My response: Since only a small portion of humanity is explicitly utilitarian, the argument's own logic suggests that there is significant potential for AIs to be even more utilitarian than humans, given the relatively low bar set by humanity's limited utilitarian impulses. While I agree we shouldn't assume AIs will be more utilitarian than humans without specific reasons to believe so, it seems entirely plausible that factors like selection pressures for altruism could lead to this outcome. Indeed, commercial AIs seem to be selected to be nice and helpful to users, which (at least superficially) seems "more utilitarian" than the default (primarily selfish-oriented) impulses of most humans. The fact that humans are only slightly utilitarian should mean that even small forces could cause AIs to exceed human levels of utilitarianism.

Moreover, as I've said previously, it's probable that unaligned AIs will possess morally relevant consciousness, at least in part due to the sophistication of their cognitive processes. They are also likely to absorb and reflect human moral concepts as a result of being trained on human-generated data. Crucially, I expect these traits to emerge even if the AIs do not share human preferences.

To see where I'm coming from, consider how humans are routinely "misaligned" with each other, in the sense of not sharing each other's preferences, and yet still share moral concepts and a common culture. For example, an employee can share moral concepts with their employer while having very different consumption preferences from them. This picture is pretty much how I think we should primarily think about unaligned AIs that are trained on human data and shaped heavily by techniques like RLHF or DPO.

Given these considerations, I find it unlikely that unaligned AIs would completely lack any utilitarian impulses whatsoever. However, I do agree that even a small risk of this outcome is worth taking seriously. I'm simply skeptical that such low-probability scenarios should be the primary factor in assessing the value of AI alignment research.

Intuitively, I would expect the arguments for prioritizing alignment to be more clear-cut and compelling than "if we fail to align AIs, then there's a small chance that these unaligned AIs might have zero utilitarian value, so we should make sure AIs are aligned instead". If low-probability scenarios are the strongest considerations in favor of alignment, that seems to undermine the robustness of the case for prioritizing this work. While it's appropriate to consider even low-probability risks when the stakes are high, I'm doubtful that small probabilities should be the dominant consideration in this context. I think the core reasons for focusing on alignment should probably be more straightforward and less reliant on complicated chains of logic than this type of argument suggests.

In particular, as I've said before, I think it's quite reasonable to think that we should align AIs to humans for the sake of humans. In other words, I think it's perfectly reasonable to admit that solving AI alignment might be a great thing to ensure human flourishing in particular. But if you're a utilitarian, and not particularly attached to human preferences per se (i.e., you're non-speciesist), I don't think you should be highly confident that an unaligned AI-driven future would be much worse than an aligned one, from that perspective.
Not sure how to post these two thoughts, so I might as well combine them.

In an ideal world, SBF should have been sentenced to thousands of years in prison. This is partially due to the enormous harm done to both FTX depositors and EA, but mainly for basic deterrence reasons; a risk-neutral person will not mind 25 years in prison if the ex ante upside was becoming a trillionaire.

However, I also think many lessons from SBF's personal statements, e.g. his interview on 80k, are still as valid as ever. Just off the top of my head:

* Startup-to-give as a high-EV career path. Entrepreneurship is why we have OP and SFF! Perhaps also the importance of keeping as much equity as possible, although in the process one should not lie to investors or employees more than is standard.
* Ambition and working really hard as success multipliers in entrepreneurship.
* A career decision algorithm that includes doing a BOTEC and rejecting options that are 10x worse than others (see the sketch below).
* It is probably okay to work in an industry that is slightly bad for the world if you do lots of good by donating. [1] (But fraud is still bad, of course.)

Just because SBF stole billions of dollars does not mean he has fewer virtuous personality traits than the average person. He hits at least as many multipliers as the average reader of this forum. But importantly, maximization is perilous; some particular qualities like integrity and good decision-making are absolutely essential, and if you lack them your impact could be multiplied by minus 20.

[1] The unregulated nature of crypto may have allowed the FTX fraud, but things like the zero-sum, zero-NPV nature of many cryptoassets, or its negative climate impacts, seem unrelated. Many industries are about this bad for the world, like HFT or some kinds of social media. I do not think people who criticized FTX on these grounds score many points. However, perhaps it was (weak) evidence towards FTX being willing to do harm in general for a perceived greater good, which is maybe plausible especially if Ben Delo also did market manipulation or otherwise acted immorally. Also note that in the interview, SBF didn't claim his donations offset a negative direct impact; he said the impact was likely positive, which seems dubious.
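To make the BOTEC-and-reject step above concrete, here is a minimal sketch of how such a filter might look. All option names, probabilities, and impact numbers are made-up placeholders, not figures from the quick take or from SBF's interview.

```python
# Minimal sketch of a BOTEC-style career filter: estimate expected impact for
# each option and reject anything roughly 10x worse than the best option.
# All option names and numbers are made-up placeholders.
options = {
    # option: (probability of success, impact if it succeeds, arbitrary units)
    "startup-to-give": (0.05, 2000),
    "quant-trading-to-give": (0.50, 100),
    "nonprofit direct work": (0.80, 40),
    "unrelated corporate job": (0.90, 5),
}

expected_value = {name: p * impact for name, (p, impact) in options.items()}
best = max(expected_value.values())

# Keep only options within ~10x of the best; reject the rest.
shortlist = {name: ev for name, ev in expected_value.items() if ev >= best / 10}

for name, ev in sorted(shortlist.items(), key=lambda kv: -kv[1]):
    print(f"{name}: expected value ~ {ev:.0f}")
```

With these placeholders, the last option falls more than 10x below the best and is dropped from the shortlist; the point is only to illustrate the shape of the decision rule, not the numbers.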
I am concerned about the H5N1 situation in dairy cows and have written an overview document to which I occasionally add new learnings (new to me or new to the world). I also set up a WhatsApp community that anyone is welcome to join for discussion & sharing news.

In brief:

* I believe there are quite a few (~50-250) humans infected recently, but no sustained human-to-human transmission.
* I estimate the Infection Fatality Rate to be substantially lower than the ALERT team does (theirs is 63% that CFR >= 10%); something like an 80% CI of 0.1-5.0 (see the sketch below).
* The government's response is astoundingly bad - I find it insane that raw milk is still being sold, with a high likelihood that some of it contains infectious H5N1.
* There are still quite a few genetic barriers to sustained human-to-human transmission.
* This might be a good time to push specific pandemic preparedness policies.
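As a rough illustration of what an 80% CI of 0.1-5.0 implies about the tail, here is a minimal sketch that fits a lognormal to those two quantiles. The lognormal shape and the assumption that the CI is in percent are mine, not the author's stated method, and the comparison to the ALERT figure is only indicative since theirs is stated for the CFR rather than the IFR.

```python
# Minimal sketch: turn the quoted 80% CI for the IFR into an implied tail
# probability by fitting a lognormal to its two quantiles. Assumptions (mine,
# not the author's): the CI is 0.1%-5.0%, and a lognormal is a reasonable shape.
import numpy as np
from scipy import stats

lo, hi = 0.1, 5.0        # assumed 80% CI bounds for the IFR, in percent
z = stats.norm.ppf(0.9)  # ~1.2816: the 10th/90th percentiles sit this many sigmas from the median in log space

mu = (np.log(lo) + np.log(hi)) / 2           # log-median
sigma = (np.log(hi) - np.log(lo)) / (2 * z)

ifr = stats.lognorm(s=sigma, scale=np.exp(mu))
print(f"Implied median IFR: {ifr.median():.2f}%")   # ~0.7%
print(f"Implied P(IFR >= 10%): {ifr.sf(10):.1%}")   # ~4%, versus ALERT's 63% (stated for CFR >= 10%)
```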


Recent discussion

Cross-posted from LessWrong

In this Rational Animations video, we discuss s-risks (risks from astronomical suffering), which involve an astronomical number of beings suffering terribly. Researchers on this topic argue that s-risks have a significant chance of occurring and...


Executive summary: S-risks, involving astronomical suffering, may be more important to focus on than existential risks; researchers argue s-risks have a significant chance of occurring but can be made less likely through actions today.

Key points:

  1. S-risks have a wider scope and higher severity than existential risks, affecting more beings than would otherwise exist and making their lives worse than non-existence.
  2. S-risks are a possibility due to potential cosmic expansion, advancing technology, suffering occurring through neglect or as a side effect, and the
...
quila
11h
I think it definitely does, if we're in a situation where an S-risk is on the horizon with some sufficient (<- subjective) probability. Also consider https://carado.moe/when-in-doubt-kill-everyone.html (and the author's subsequent updates) ... of course, the whole question is subjective as in moral.
BrownHairedEevee
11h
Part of the definition of astronomical suffering is that it's greater than any instances of suffering to date. But factory farming was unprecedented compared to anything before it, so I think the definition of s-risk could be applied retroactively to it.

A while back, I wrote a quicktake about how the Belgian Senate voted to enshrine animal welfare in the Constitution.

It's been a journey. I work for GAIA, a Belgian animal advocacy group that for years has tried to get animal welfare added to the constitution. Today we were

...

Executive summary: The Belgian Constitution now enshrines animal welfare, which will have significant legal and practical implications for improving animal protection in the country.

Key points:

  1. Animal welfare is now recognized as a fundamental value in Belgian society, carrying greater legal weight when in conflict with other constitutional rights.
  2. The inclusion will encourage prioritizing animal protection laws and increased scrutiny of measures that may undermine animal welfare.
  3. In legal cases involving animals, judges will be influenced to give greater con
...

There are two main areas of catastrophic or existential risk which have recently received significant attention: biorisk, from natural sources, biological accidents, and biological weapons; and artificial intelligence, from detrimental societal impacts of systems, incautious...


Executive summary: Despite frequent comparisons between biorisk and AI risk, the disanalogies between these two areas of catastrophic or existential risk are much more compelling than the analogies.

Key points:

  1. Pathogens have a well-defined attack surface (human bodies), while AI risks have a nearly unlimited attack surface, including computer systems, infrastructure, and social and economic systems.
  2. Mitigation efforts for pandemics are well-funded and established, with international treaties and norms, while AI risk mitigation is poorly understood, underfund
...
Ulrik Horn commented on My Lament to EA 1h ago

Edit: so grateful and positively overwhelmed with all the responses!

I am dealing with repetitive strain injury and don’t foresee being able to really respond to many comments extensively (I’m surprised with myself that I wrote all of this without twitching forearms lol!...


I think your observations about a Western feel to most of EA are important. Being born in a Western country myself, I can see that everything from the choice of music on podcasts to, perhaps more importantly, the philosophers and ideologies referenced is very Western-centric. I think there are many other philosophical traditions and historical communities we can draw inspiration from beyond Europe - it is not like EA is the first attempt at doing the most good in the world (I have some familiarity with Tibetan Buddhism and they have fairly strong opinions on e...

Around the end of Feb 2024 I attended the Summit on Existential Risk and EAG: Bay Area (GCRs), during which I did 25+ one-on-ones about the needs and gaps in the EA-adjacent catastrophic risk landscape, and how they’ve changed.

The meetings were mostly with senior managers...

Vasco Grilo
4h
Thanks for the update, Ben. You mean almost no philanthropic funding? According to 80,000 Hours' profile on nuclear war: I estimated the nearterm annual extinction risk per annual spending for AI risk is 59.8 M times that for nuclear risk. However, I have come to prefer expected annual deaths per annual spending as a better proxy for the cost-effectiveness of interventions which aim to save lives (relatedly). From this perspective, it is unclear to me whether AI risk is more pressing than nuclear risk.
Benjamin_Todd
3h
I meant from the EA catastrophic risk community - sorry for not clarifying.

I see. I think it is better to consider spending from other sources as well, because these also contribute towards decreasing risk. In addition, I would not weight spending by cost-effectiveness (and much less give 0 weight to non-EA spending), as cost-effectiveness is what one is trying to figure out when using spending/neglectedness as a heuristic.
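To make the comparison in this exchange concrete, here is a minimal sketch of the two proxies being discussed, risk per unit of spending versus expected deaths per unit of spending. Every number in it is a hypothetical placeholder, not a figure from Vasco's estimates or from 80,000 Hours.

```python
# Minimal sketch of the two cost-effectiveness proxies discussed in this thread.
# Every number below is a hypothetical placeholder.
causes = {
    # cause: (annual extinction risk, expected annual deaths, annual spending in USD)
    "AI":      (1e-3, 1e5, 1e8),
    "nuclear": (1e-6, 1e6, 1e9),
}

for cause, (extinction_risk, expected_deaths, spending) in causes.items():
    print(
        f"{cause}: extinction risk per $ = {extinction_risk / spending:.1e}, "
        f"expected deaths per $ = {expected_deaths / spending:.1e}"
    )

# With these placeholders, AI looks ~10,000x more pressing on the first proxy
# but comparable to nuclear on the second, which is why the choice of proxy
# (and whose spending is counted) matters for the comparison.
```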

Join us to meet other EAG London attendees who

  • come from or live in Germany, Austria or Switzerland, 
  • live close to the DACH borders or 
  • are considering moving there.

After the event, we will take a short walk of a few minutes over to the EAG venue together to register.

The meetup will run from 2-4pm local time - we might update this once we know the EAG schedule.

--

Join the DACH Telegram group for EAG London: https://t.me/+UTZ4_5JfT583NjMy 


Hi Everyone!

I'm currently a high school student in the United States. I've been casually following and supporting EA for about 1.5 years now, doing what I can by donating any extra money to effective causes. However, I have recently been getting a lot more interested ...


Take time to empathise with people in worlds different from your own - you could watch YouTube videos of people in poorer nations talking about their situation. GiveDirectly has a load of these, I think.

There are videos scattered across https://www.givedirectly.org, but you get their recipients' stories in a more raw form from https://live.givedirectly.org (in many cases, click on the summary story for past questions and answers with that recipient).

Outside of GiveDirectly, I found https://www.gapminder.org/dollar-street informative.

Disclosure: I work at GiveDirect...

According to some highly authoritative anecdotal accounts, when a lone crab is placed in a bucket it will crawl out of its own accord, but put a pile of crabs in a bucket and they will pull each other down in an attempt to escape, dooming them all. This is a classic illustration...


Thanks for your comment.

Bregman doesn't make this assertion in Humankind, but rather makes a well-supported case that systems of control and incentives play to our worst instincts. The reviewer provides no support for his assertions, so I don't really see why anyone would pay them any heed. Bregman, by contrast, provides vast supporting evidence for his claims, and through the weight of that evidence overturned many of my fundamental assumptions about the nature of humanity, which I had simply gleaned from popular psychology and "common sense".

I recommend bot...