New & upvoted


Quick takes

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people, which worked on trying to understand language model features in context, leading to the release of an open source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
In my latest post I talked about whether unaligned AIs would produce more or less utilitarian value than aligned AIs. To be honest, I'm still quite confused about why many people seem to disagree with the view I expressed, and I'm interested in engaging more to get a better understanding of their perspective. At the least, I thought I'd write a bit more about my thoughts here, and clarify my own views on the matter, in case anyone is interested in trying to understand my perspective. The core thesis that I was trying to defend is the following view:

My view: It is likely that by default, unaligned AIs—AIs that humans are likely to actually build if we do not completely solve key technical alignment problems—will produce utilitarian value comparable to humans, both directly (by being conscious themselves) and indirectly (via their impacts on the world). This is because unaligned AIs will likely both be conscious in a morally relevant sense, and they will likely share human moral concepts, since they will be trained on human data.

Some people seem to merely disagree with my view that unaligned AIs are likely to be conscious in a morally relevant sense. And a few others have a semantic disagreement with me in which they define AI alignment in moral terms, rather than in terms of the ability to make an AI share the preferences of the AI's operator. But beyond these two objections, which I feel I understand fairly well, there's also significant disagreement about other questions. Based on my discussions, I've attempted to distill the following counterargument to my thesis, which I fully acknowledge does not capture everyone's views on this subject:

Perceived counter-argument: The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives. At present, only a small proportion of humanity holds slightly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies. As a result, it is plausible that almost all value would be lost, from a utilitarian perspective, if AIs were unaligned with human preferences.

Again, I'm not sure if this summary accurately represents what people believe. However, it's what some seem to be saying. I personally think this argument is weak. But I feel I've had trouble making my views very clear on this subject, so I thought I'd try one more time to explain where I'm coming from here. Let me respond to the two main parts of the argument in some amount of detail:

(i) "The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives."

My response: I am skeptical of the notion that the bulk of future utilitarian value will originate from agents with explicitly utilitarian preferences. This clearly does not reflect our current world, where the primary sources of happiness and suffering are not the result of deliberate utilitarian planning. Moreover, I do not see compelling theoretical grounds to anticipate a major shift in this regard.

I think the intuition behind the argument here is something like this: In the future, it will become possible to create "hedonium"—matter that is optimized to generate the maximum amount of utility or well-being.
If hedonium can be created, it would likely be vastly more important than anything else in the universe in terms of its capacity to generate positive utilitarian value. The key assumption is that hedonium would primarily be created by agents who have at least some explicit utilitarian goals, even if those goals are fairly weak. Given the astronomical value that hedonium could potentially generate, even a tiny fraction of the universe's resources being dedicated to hedonium production could outweigh all other sources of happiness and suffering. Therefore, if unaligned AIs would be less likely to produce hedonium than aligned AIs (due to not having explicitly utilitarian goals), this would be a major reason to prefer aligned AI, even if unaligned AIs would otherwise generate comparable levels of value to aligned AIs in all other respects.

If this is indeed the intuition driving the argument, I think it falls short for a straightforward reason. The creation of matter-optimized-for-happiness is more likely to be driven by the far more common motives of self-interest and concern for one's inner circle (friends, family, tribe, etc.) than by explicit utilitarian goals. If unaligned AIs are conscious, they would presumably have ample motives to optimize for positive states of consciousness, even if not for explicitly utilitarian reasons.

In other words, agents optimizing for their own happiness, or the happiness of those they care about, seem likely to be the primary force behind the creation of hedonium-like structures. They may not frame it in utilitarian terms, but they will still be striving to maximize happiness and well-being for themselves and others they care about regardless. And it seems natural to assume that, with advanced technology, they would optimize pretty hard for their own happiness and well-being, just as a utilitarian might optimize hard for happiness when creating hedonium.

In contrast to the number of agents optimizing for their own happiness, the number of agents explicitly motivated by utilitarian concerns is likely to be much smaller. Yet both forms of happiness will presumably be heavily optimized. So even if explicit utilitarians are more likely to pursue hedonium per se, their impact would likely be dwarfed by the efforts of the much larger group of agents driven by more personal motives for happiness-optimization. Since both groups would be optimizing for happiness, the fact that hedonium is similarly optimized for happiness doesn't seem to provide much reason to think that it would outweigh the utilitarian value of more mundane, and far more common, forms of utility-optimization.

To be clear, I think it's totally possible that there's something about this argument that I'm missing here. And there are a lot of potential objections I'm skipping over here. But on a basic level, I mostly just lack the intuition that the thing we should care about, from a utilitarian perspective, is the existence of explicit utilitarians in the future, for the aforementioned reasons. The fact that our current world isn't well described by the idea that what matters most is the number of explicit utilitarians, strengthens my point here.

(ii) "At present, only a small proportion of humanity holds slightly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies."
My response: Since only a small portion of humanity is explicitly utilitarian, the argument's own logic suggests that there is significant potential for AIs to be even more utilitarian than humans, given the relatively low bar set by humanity's limited utilitarian impulses. While I agree we shouldn't assume AIs will be more utilitarian than humans without specific reasons to believe so, it seems entirely plausible that factors like selection pressures for altruism could lead to this outcome. Indeed, commercial AIs seem to be selected to be nice and helpful to users, which (at least superficially) seems "more utilitarian" than the default (primarily selfish-oriented) impulses of most humans. The fact that humans are only slightly utilitarian should mean that even small forces could cause AIs to exceed human levels of utilitarianism.

Moreover, as I've said previously, it's probable that unaligned AIs will possess morally relevant consciousness, at least in part due to the sophistication of their cognitive processes. They are also likely to absorb and reflect human moral concepts as a result of being trained on human-generated data. Crucially, I expect these traits to emerge even if the AIs do not share human preferences.

To see where I'm coming from, consider how humans routinely are "misaligned" with each other, in the sense of not sharing each other's preferences, and yet still share moral concepts and a common culture. For example, an employee can share moral concepts with their employer while having very different consumption preferences from them. This picture is pretty much how I think we should primarily think about unaligned AIs that are trained on human data, and shaped heavily by techniques like RLHF or DPO.

Given these considerations, I find it unlikely that unaligned AIs would completely lack any utilitarian impulses whatsoever. However, I do agree that even a small risk of this outcome is worth taking seriously. I'm simply skeptical that such low-probability scenarios should be the primary factor in assessing the value of AI alignment research.

Intuitively, I would expect the arguments for prioritizing alignment to be more clear-cut and compelling than "if we fail to align AIs, then there's a small chance that these unaligned AIs might have zero utilitarian value, so we should make sure AIs are aligned instead". If low probability scenarios are the strongest considerations in favor of alignment, that seems to undermine the robustness of the case for prioritizing this work. While it's appropriate to consider even low-probability risks when the stakes are high, I'm doubtful that small probabilities should be the dominant consideration in this context. I think the core reasons for focusing on alignment should probably be more straightforward and less reliant on complicated chains of logic than this type of argument suggests.

In particular, as I've said before, I think it's quite reasonable to think that we should align AIs to humans for the sake of humans. In other words, I think it's perfectly reasonable to admit that solving AI alignment might be a great thing to ensure human flourishing in particular. But if you're a utilitarian, and not particularly attached to human preferences per se (i.e., you're non-speciesist), I don't think you should be highly confident that an unaligned AI-driven future would be much worse than an aligned one, from that perspective.
I've recently made an update to our Announcement on the future of Wytham Abbey, saying that since this announcement, we have decided that we will use some of the proceeds for Effective Ventures' general costs.
Edgy and data-driven TED talk on how the older generations in America are undermining the youth. Worth a watch.  


Recent discussion

I am seeking suggestions for a path to pursue my driving mission - reducing as much suffering as possible for all sentient beings.

At 18, I had to drop out of college before completing one year due to the onset of severe depression, anxiety, and ME/CFS. This decade-long ...

Continue reading

I actually applied for some career advice from them a while ago, and they didn't accept my request. For more details on capabilities, check out the reply I just made to John's comment!

3
John Salter
8h
More information needed:

  • What is your knowledge level?
  • Are there specific areas within the non-profit sector you are particularly interested in?
  • How many hours per week are you able to commit to volunteering or part-time work?
  • Do you have any previous work or volunteer experience, even if not directly related to non-profits?
  • What are your strengths?
  • What are your salary expectations?
1
ampersandman
12m
Thanks for the questions! I've only completed high school. I completed two out of three quarters of the first year at college before I had to drop out, and have taken two more classes since then. All just basic gen ed classes like English and history, plus some bio and chem.

The two organizations I mentioned above, the Center for Reducing Suffering and the Organization for the Prevention of Intense Suffering, have the kind of mission I am interested in: any kind of organization that researches the most effective ways to prevent the worst kinds of suffering, no matter what form it takes or who is experiencing it. Maybe specifically things like research into preventing s-risks, treating extremely painful disorders, wild animal suffering, or finding ways to prevent human-caused cruelty/torture. Also, I'm interested in the field of empathy, because I think if we can find a way to increase the general empathy of the population, and cause more people to care about the suffering of others, this might be one of the greatest levers for preventing suffering in all its forms. An organization that studies the science of compassion and altruism is Stanford's Center for Compassion and Altruism Research and Education.

All combined, my energy would probably allow for about 20-40 hours per week. With my CFS still not completely gone, the less I have to be on my feet, the longer I can work.

Most of my work experience is very much unrelated: delivery driver, furniture assembly, mover, rideshare. During my short stint at college, I did have a job helping out at a microbiology lab for a few months - getting petri dishes ready, autoclaving, etc. I volunteered just a couple of times at a homeless shelter passing out food.

I'm fairly intelligent, at least with the more "objective" subjects like math and science. Those always came really easy to me up through high school, and I once placed first in a regional math tournament in middle school. "Subjective" stuff like literature, though - please don't ask me

Around the end of Feb 2024 I attended the Summit on Existential Risk and EAG: Bay Area (GCRs), during which I did 25+ one-on-ones about the needs and gaps in the EA-adjacent catastrophic risk landscape, and how they’ve changed.

The meetings were mostly with senior managers...

Continue reading

I am extremely confused (theoretically) about how we can simultaneously have:

1. An Artificial Superintelligence

2. It being controlled by humans (therefore creating misuse and concentration-of-power issues)

The argument doesn't get off the ground for me

9
yanni kyriacos
21m
Another guess: people who were competent in individual contributor roles got promoted into people management roles because of issues I mentioned here:
1
yanni kyriacos
23m
Hi Benjamin! Thanks for your post. Regarding this comment:

> "Right now, directly talking about AI safety seems to get more people in the door than talking about EA, so some community building efforts have switched to that."

What do you mean by "in the door"?

We’re excited to share a new addition to our site: an impact-focused job board! 

We’ve considered launching a job board for some time, so we’re happy to add this feature to the Probably Good site. The job board aims to:

  • Help people find more promising job opportunities
...
Continue reading

Filtering by salary excludes jobs that don't have posted salaries

Would love an option to toggle that


Are you exploring the intersection of AI, animal advocacy and animal welfare, and do you have an innovative concept that could redefine the food industry? We want to hear from you!

As most EAs know, AI is poised to disrupt society in countless ways, but its specific impacts on the food system remain uncertain and potentially very harmful. Advocates are beginning to encourage animal-inclusive AI, establish benchmarks to measure speciesist bias, and even build new LLMs, but there is much more work to be done. 

In a joint effort with Vegan Hacktivists and Violet Studios, ProVeg's Kickstarting for Good Incubator aims to harness AI to advance our food systems. We are searching for the most effective strategies and talented individuals in this sphere.

If you are ready to contribute ideas that could significantly benefit the global food ecosystem, please consider applying to the 2024 Kickstarting...

Continue reading

We can use Nvidia's stock price to estimate plausible market expectations for the size of the AI chip market, and we can use that to back out expectations about AI software revenues and value creation.

Doing this helps us understand how much AI growth is expected by society...

Continue reading
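To make the kind of back-out calculation the post describes concrete, here is a minimal sketch in Python. Every input below (market cap, earnings multiple, margin, AI revenue share, and the software-to-chip spend ratio) is a placeholder assumption introduced for illustration, not a figure from the post, so the printed numbers only show the shape of the inference, not an estimate.

```python
# Minimal sketch of backing out implied AI software revenue from a chipmaker's
# valuation. All inputs are PLACEHOLDER ASSUMPTIONS for illustration only.

nvidia_market_cap = 2.2e12        # $, assumed market capitalization
pe_multiple = 40                  # assumed "fair" price-to-earnings multiple
net_margin = 0.50                 # assumed net profit margin on chip sales
ai_share_of_revenue = 0.8         # assumed fraction of revenue from AI chips

# Step 1: the annual earnings the market implicitly expects, given the multiple.
implied_annual_earnings = nvidia_market_cap / pe_multiple

# Step 2: the revenue needed to generate those earnings at the assumed margin,
# and the slice of it attributable to AI chips.
implied_annual_revenue = implied_annual_earnings / net_margin
implied_ai_chip_market = implied_annual_revenue * ai_share_of_revenue

# Step 3: if buyers spend 1 $ on chips for every N $ of AI software revenue
# they expect to earn, chip spending implies a software revenue expectation.
software_revenue_per_chip_dollar = 3.0   # assumed ratio, purely illustrative

implied_ai_software_revenue = implied_ai_chip_market * software_revenue_per_chip_dollar

print(f"Implied AI chip market:      ${implied_ai_chip_market / 1e9:.0f}B per year")
print(f"Implied AI software revenue: ${implied_ai_software_revenue / 1e9:.0f}B per year")
```

Collapsing the whole path of expected future growth into a single multiple is exactly the kind of simplification the comments below push back on: the same market cap is consistent with many combinations of these inputs.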
4
MichaelDickens
4h
FWIW this might not be true of the average reader but I felt like I understood all the implicit assumptions Ben was making and I think it's fine that he didn't add more caveats/hedging. His argument improved my model of the world.
8
JoshuaBlake
8h
Your objections seem reasonable, but I do not understand their implications because I lack a finance background. Would you mind helping me understand how your points affect the takeaway? Specifically, do you think that the estimates presented here are biased, much more uncertain than the post implies, or something else?

Sure, the claim hides a lot of uncertainties. At a high level the article says “A implies X, Y and Z”, but you can’t possibly derive all of that information from the single number A. Really, what the article should say is “X, Y and Z are consistent with the value of A”, which is a very different claim.

I don’t specifically disagree with X, Y and Z.

Summary

The net welfare effects of fishing, changes to fishing pressure, and demand for wild-caught aquatic animals on wild aquatic animals seem highly morally ambiguous, in large part because there are

  1. tradeoffs between species due to predation, e.g. larger (respectively smaller) populations and life expectancies for one species result in smaller (respectively larger) populations and life expectancies for their prey and competitors, and this cascades down the food chain,
  2. uncertainty about moral weight tradeoffs between affected species, and
  3. depending on the moral view, uncertainty about whether the directly and indirectly affected animals have good or bad lives on average.

 

Illustration of a marine food web by ChatGPT/DALL·E

Acknowledgements

Thanks to Brian Tomasik, Ren Ryba and Tori for their feedback on an earlier draft. All errors are my own.

For prior related writing that is more comprehensive...

Continue reading

Summary

...
Continue reading
Vasco Grilo
6h
Thanks for clarifying, Johannes. Strongly upvoted. I think estimates of the chicken-years affected per $ spent in corporate campaigns for chicken welfare may be more resilient than ones of the cost-effectiveness of CCF in t/$. According to The Humane League:

As a side note:

  • Saulius estimated campaigns for broiler welfare are 27.8 % (= 15/54) as cost-effective as the cage-free campaigns concerning the above.
  • OP thinks “the marginal FAW [farmed animal welfare] funding opportunity is ~1/5th as cost-effective as the average from Saulius’ analysis”.
  • However, I accounted for both of these effects in my analysis.
jackva
5h
Thanks, this updates me; I had cached something more skeptical on chicken welfare campaigns. Do you have a sense of what "advocacy multiplier" this implies? Is this >1000x of helping animals directly? I have the suspicion that the relative results between causes are, to a significant degree, not driven by cause differences but by comfort with risk and the kind of multipliers that are expected to be feasible. FWIW, I also do believe that marginal donations to help farmed animals will do more good than marginal climate donations.

Thanks for the follow-up! It prompted me to think about relevant topics.

> Do you have a sense of what "advocacy multiplier" this implies? Is this >1000x of helping animals directly?

By helping animals directly, are you talking about rescuing animals from factory farms, and then supporting them in animal sanctuaries? I am not aware of cost-effectiveness analyses of these, but here is a quick estimate. I speculate it would take 2 h to save one broiler. In this case, for 20 $/h, the cost to save a broiler would be 40 $ (= 2*20). Broilers in a conventional sce... (read more)
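To make the "advocacy multiplier" question concrete, here is a minimal sketch in Python of how the comparison above could be computed. The 2 h and 20 $/h figures come from the quick estimate in this comment; the campaign figure is a purely hypothetical placeholder (the real numbers would come from estimates like Saulius's, not reproduced here), and the two interventions are not strictly comparable anyway, since campaigns improve conditions rather than rescue individual animals.

```python
# Rough sketch of the "advocacy multiplier" comparison. The direct-help figures
# (2 h per broiler at 20 $/h) are from the comment above; the campaign figure is
# an ILLUSTRATIVE PLACEHOLDER, not a sourced estimate.

hours_per_broiler_rescued = 2        # speculated time to rescue one broiler
labour_cost_per_hour = 20            # $/h
cost_per_animal_direct = hours_per_broiler_rescued * labour_cost_per_hour  # 40 $ per broiler

# Hypothetical cost per chicken helped via corporate welfare campaigns.
# Swap in a real estimate before drawing any conclusions.
cost_per_animal_campaign = 0.10      # $ per chicken helped (placeholder)

advocacy_multiplier = cost_per_animal_direct / cost_per_animal_campaign
print(f"Direct help: ${cost_per_animal_direct:.2f} per animal")
print(f"Implied advocacy multiplier: ~{advocacy_multiplier:.0f}x")
```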

James Özden posted a Quick Take 3h ago

Mobius (the Bay Area-based family foundation where I work) is exploring new ways to remove animals from the food system. We're looking for a part-time Program Manager to help get more talented people who are knowledgeable about farmed animal welfare and/or alternative proteins into US government roles. This entrepreneurial generalist would pilot a 3-6 month program to support promising students and early graduates with applying to and securing entry-level Congressional roles. We think success here could significantly improve thoughtful policymaking on farmed animal welfare and/or alternative proteins. You can see more about the role here.

Key details on the role:

  • Application deadline: Tuesday 28th of May, at 23:59 PT. Apply here.
  • Contract: 15-20 hours per week for 3-6 months, with the possibility of extending.
  • Location: Remote in the US.
  • Salary: $29-38 per hour (equivalent to approx. $60,000-$80,000/year) depending on experience. For exceptional candidates, we’re happy to discuss higher compensation. This would be a contractor role, with no additional benefits.

Please share with potentially interested people!

Continue reading

You can now integrate Our World in Data charts much more easily into forum posts. These new features work with any of our interactive charts – their URLs look like this:

https://ourworldindata.org/grapher/[name-of-the-chart]

Previews on hover

If you add a link to an OWID chart...

Continue reading

I'll write down the request. I tried previewing what it would look like and it's not perfect, so we'd probably need OWID to make a dedicated preview page.