Additionally, what are/how strong are the track records of Redwood's researchers/advisors?
The people we seek advice from on our research most often are Paul Christiano and Ajeya Cotra. Paul is a somewhat experienced ML researcher, who among other things led some of the applied alignment research projects that I am most excited about.
On our team, the people with the most relevant ML experience are probably Daniel Ziegler, who was involved with GPT-3 and also several OpenAI alignment research projects, and Peter Schmidt-Nielsen. Many of our other staff have ... (read more)
So one thing to note is that I think that there are varying degrees of solving the technical alignment problem. In particular, you’ve solved the alignment problem more if you’ve made it really convenient for labs to use the alignment techniques you know about. If next week some theory people told me “hey we think we’ve solved the alignment problem, you just need to use IDA, imitative generalization, and this new crazy thing we just invented”, then I’d think that the main focus of the applied alignment community should be trying to apply these alignment tec... (read more)
We could operationalize this as “How does P(doom) vary as a function of the total amount of quality-adjusted x-risk-motivated AI alignment output?” (A related question is “Of the quality-adjusted AI alignment research, how much will be motivated by x-risk concerns?” This second question feels less well defined.)
I’m pretty unsure here. Today, my guess is like 25% chance of x-risk from AI this century, and maybe I imagine that being 15% if we doubled the quantity of quality-adjusted x-risk-motivated AI alignment output, and 35% if we halved that quantity. Bu... (read more)
Here are some things I think are fairly likely:
I think this is a great question.
We are researching simpler precursors to the adversarial training techniques that seem most likely to work if you assume that it’s possible to build systems that are performance-competitive and training-competitive, and that do well on average on their training distribution.
There are a variety of reasons to worry that this assumption won’t hold. In particular, it seems plausible that humanity will only have the ability to produce AGIs that will collude with each other if it’s possible for them to do so. This seem... (read more)
I think our work is aimed at reducing the theory-practice gap of any alignment schemes that attempt to improve worst-case performance by training the model on data that was selected in the hope of eliciting bad behavior from the model. For example, one of the main ingredients of our project is paying people to try to find inputs that trick the model, then training the model on these adversarial examples.
Many different alignment schemes involve some type of adversarial training. The kind of adversarial training we’re doing, where we just rely on human ingen... (read more)
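To make the shape of this loop concrete, here is a minimal sketch of one round of that kind of human-in-the-loop adversarial training; the function names and the binary injury-classification framing are illustrative assumptions, not a description of our actual pipeline.

```python
# Illustrative sketch only: names and the binary "injury" framing are assumptions,
# not a description of any actual codebase.
from typing import Callable, List, Tuple

def adversarial_training_round(
    classify: Callable[[str], float],                    # model's P(snippet describes an injury)
    finetune: Callable[[List[Tuple[str, int]]], None],   # fine-tunes the model on labeled examples
    human_attack: Callable[[Callable[[str], float]], List[str]],  # humans hunting for fooling inputs
    threshold: float = 0.5,
) -> List[Tuple[str, int]]:
    """One round: humans search for inputs that trick the model, then we train on those examples."""
    candidates = human_attack(classify)
    # Keep the inputs the current model gets wrong (here: injuries it misses),
    # label them correctly, and fold them back into training.
    adversarial_examples = [(text, 1) for text in candidates if classify(text) < threshold]
    finetune(adversarial_examples)
    return adversarial_examples
```

The point of the sketch is just the loop structure: human red-teaming produces the training data, and the hope is that worst-case performance improves as those examples are folded back into training.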
So there’s this core question: "how are the results of this project going to help with the superintelligence alignment problem?" My claim can be broken down as follows:
So to start with, I want to note that I imagine something a lot more like “the alignment community as a whole develops promising techniques, probably with substantial collaboration between research organizations” than “Redwood does all the work themselves”. Among other things, we don’t have active plans to do much theoretical alignment work, and I’d be fairly surprised if it was possible to find techniques I was confident in without more theoretical progress--our current plan is to collaborate with theory researchers elsewhere.
In this comment, I mentioned ... (read more)
It seems like it would definitely be good on the margin if we had ways of harnessing academia to do useful work on alignment. Two reasons for this are that 1. perhaps non-x-risk-motivated researchers would produce valuable contributions, and 2. it would mean that x-risk-motivated researchers inside academia would be less constrained and so more able to do useful work.
Three versions of this:
This is a great question and I don't have a good answer.
One simple model for this is: labs build aligned models if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.
Here are various sources of pressure:
In practice, all of these sources of pressure are involved in companies spending resources on, eg, improving animal welfare standards, reducing environmental costs, or DEI (diversity, equity, and inclusion).
And here are various sources of inconvenien... (read more)
I think that most questions we care about are either technical or related to alignment. Maybe my coworkers will think of some questions that fit your description. Were you thinking of anything in particular?
GPT-3 suggests: "We will post the AMA with a disclaimer that the answers are coming from Redwood staff. We will also be sure to include a link to our website in the body of the AMA, with contact information if someone wants to verify with us that an individual is staff."
I think the main skillsets required to set up organizations like this are:
Thanks for the kind words!
Our biggest bottlenecks are probably going to be some combination of:
In most worlds where we fail to produce value, I think we fail before we spend a hundred researcher-years. So I’m also going to include possibilities for wasting 30 researcher-years in this answer.
Here’s some reasons we might have failed to produce useful research:
It’s probably going to be easier to get good at the infrastructure engineering side of things than the ML side of things, so I’ll assume that that’s what you’re going for.
For our infra engineering role, we want to hire people who are really productive and competent at engineering various web systems quickly. (See the bulleted list of engineering responsibilities on the job page.) There are some people who are qualified for this role without having much professional experience, because they’ve done a lot of Python programming and web programming as hob... (read more)
I think the best examples would be if we tried to practically implement various schemes that seem theoretically doable and potentially helpful, but quite complicated to do in practice. For example, imitative generalization or the two-head proposal here. I can imagine that it might be quite hard to get industry labs to put in the work of getting imitative generalization to work in practice, and so doing that work (which labs could perhaps then adopt) might have a lot of impact.
Redwood Research is looking for people to help us find flaws in our injury-detecting model. We'll pay $30/hour for this, for up to 2 hours; after that, if you’ve found interesting stuff, we’ll pay you for more of this work at the same rate. I expect our demand for this to last for maybe a month (though we'll probably need more in future). If you’re interested, please email firstname.lastname@example.org so he can add you to a Slack or Discord channel with other people who are working on this. This might be a fun task for people who like being creative, being tricky, and fi... (read more)
In other words, if the disagreement was "bottom-up", then you'd expect that at least some people who are optimistic about misalignment risk would be pessimistic about other kinds of AI risk, such as what I call "human safety problems" (see examples here and here), but in fact I don't seem to see anyone whose position is something like, "AI alignment will be easy or likely solved by default, therefore we should focus our efforts on these other kinds of AI-related x-risks that are much more worrying."
FWIW I know some people who explicitly think th... (read more)
Sounds like their positions are not public, since you don't cite anyone by name? Is there any reason for that?
What kinds of things do you think it would be helpful to do cost effectiveness analyses of? Are you looking for cost effectiveness analyses of problem areas or specific interventions?
It was a great experience for me, for a bunch of reasons.
Here in the EA community, we’re trying to do lots of good. Recently I’ve been thinking about the similarities and differences between a community focused on doing lots of good and a community focused on getting really rich.
I think this is interesting for a few reasons:
Thanks, this is an interesting analogy.
If too few EAs go into more bespoke roles, then one reason could be risk-aversion. Rightly or wrongly, they may view those paths as more insecure and risky (for them personally; though I expect personal and altruistic risk correlate to a fair degree). If so, then one possibility is that EA funders and institutions/orgs should try to make them less risky or otherwise more appealing (there may already be some such projects).
In recent years, EA has put less emphasis on self-sacrifice, arguing that we can't expect p... (read more)
Yeah, but this pledge is kind of weird for an altruist to actually follow, instead of donating more than the 10%. (Unless you think that almost everyone believes that most of the reason for them to do the GWWC pledge is to enforce the norm, and this causes them to donate 10%, which is more than they'd otherwise donate.)
[This is an excerpt from a longer post I'm writing]
Suppose someone’s utility function is
U = f(C) + D
Where U is what they’re optimizing, C is their personal consumption, f is their selfish welfare as a function of consumption (log is a classic choice for f), and D is their amount of donations.
Suppose that they have diminishing utility wrt (“with respect to”) consumption (that is, df(C)/dC is strictly monotonically decreasing). Their marginal utility wrt donations is a constant, and their marginal utility wrt consumption is a decreasing function. There has t... (read more)
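To make the crossover concrete, here's a minimal worked version of the setup above with f(C) = log(C), treating a fixed total budget W split between consumption C and donations D = W − C (the budget W and the unit normalization of D's coefficient are illustrative assumptions):

```latex
\max_{0 \le C \le W} \; U(C) = \log(C) + \underbrace{(W - C)}_{D}
\quad\Rightarrow\quad
\frac{dU}{dC} = \frac{1}{C} - 1 = 0
\quad\Rightarrow\quad
C^{*} = 1.
```

Marginal utility from consumption, 1/C, falls below the constant marginal utility from donations once C passes the crossover point C* (here 1, in whatever units the normalization implies), so a maximizer consumes up to that point and donates everything beyond it.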
[epistemic status: I'm like 80% sure I'm right here. Will probably post as a main post if no-one points out big holes in this argument, and people seem to think I phrased my points comprehensibly. Feel free to leave comments on the google doc here if that's easier.]
I think a lot of EAs are pretty confused about Shapley values and what they can do for you. In particular, Shapley values are basically irrelevant to problems related to coordination between a bunch of people who all have the same values. I want to talk about why.
So Shapley values are a sol... (read more)
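For concreteness, here's a minimal brute-force sketch of what Shapley values actually compute, for a toy two-funder game (the payoff numbers are made up purely for illustration):

```python
from itertools import permutations

def shapley_values(players, coalition_value):
    """Each player's marginal contribution, averaged over all orderings of arrival."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = []
        for player in order:
            before = coalition_value(frozenset(coalition))
            coalition.append(player)
            after = coalition_value(frozenset(coalition))
            values[player] += (after - before) / len(orderings)
    return values

# Toy two-funder game: each funder creates some value alone, and together they create more.
payoffs = {
    frozenset(): 0,
    frozenset({"A"}): 4,
    frozenset({"B"}): 6,
    frozenset({"A", "B"}): 10,
}
print(shapley_values(["A", "B"], lambda s: payoffs[s]))  # {'A': 4.0, 'B': 6.0}
```

Each player's Shapley value is just their marginal contribution averaged over every order in which the group could have assembled.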
I am not sure. I think it’s pretty likely I would want to fund after risk adjustment. I think that if you are considering trying to get funded this way, you should consider reaching out to me first.
I would personally be pretty down for funding reimbursements for past expenses.
This is indeed my belief about ex ante impact. Thanks for the clarification.
That might achieve the "these might be directly useful goal" and "produce interesting content" goals, if the reviewers knew about how to summarize the books from an EA perspective, how to do epistemic spot checks, and so on, which they probably don't. It wouldn't achieve any of the other goals, though.
Here's a crazy idea. I haven't run it by any EAIF people yet.
I want to have a program to fund people to write book reviews and post them to the EA Forum or LessWrong. (This idea came out of a conversation with a bunch of people at a retreat; I can’t remember exactly whose idea it was.)
I worry sometimes that EAs aren’t sufficiently interested in learning facts about the world that aren’t directly related to EA stuff.
I share this concern, and I think a culture with more book reviews is a great way to achieve that (I've been happy to see all of Michael Aird's book summaries for that reason).
CEA briefly considered paying for book reviews (I was asked to write this review as a test of that idea). IIRC, the goal at the time was more about getting more engagement from people on the periphery of EA by creating EA-related content they'd find int... (read more)
I think that "business as usual but with more total capital" leads to way less increased impact than 20%; I am taking into account the fact that we'd need to do crazy new types of spending.
Incidentally, you can't buy the New York Times on public markets; you'd have to do a private deal with the family who runs it.
Re 1: I think that the funds can maybe disburse more money (though I'm a little more bearish on this than Jonas and Max, I think). But I don't feel very excited about increasing the amount of stuff we fund by lowering our bar; as I've said elsewhere on the AMA the limiting factor on a grant to me usually feels more like "is this grant so bad that it would damage things (including perhaps EA culture) in some way for me to make it" than "is this grant good enough to be worth the money".
I think that the funds' RFMF is only slightly real--I think that giving t... (read more)
I think that if a new donor appeared and increased the amount of funding available to longtermism by $100B, this would maybe increase the total value of longtermist EA by 20%.
At first glance the 20% figure sounded about right to me. However, when thinking a bit more about it, I'm worried that (at least in my case) this is too anchored on imagining "business as usual, but with more total capital". I'm wondering if most of the expected value of an additional $100B - especially when controlled by a single donor who can flexibly deploy them - comes from 'crazy... (read more)
I am planning on checking in with grantees to see how well they've done, mostly so that I can learn more about grantmaking and to know if we ought to renew funding.
I normally didn't make specific forecasts about the outcomes of grants, because operationalization is hard and scary.
I feel vaguely guilty about not trying harder to write down these proxies ahead of time. But I empirically don't, and my intuitions apparently don't feel that optimistic about working on this. I am not sure why. I think it's maybe just that operationalization is super hard an... (read more)
Like Max, I don't know about such a policy. I'd be very excited to fund promising projects to support the rationality community, eg funding local LessWrong/Astral Codex Ten groups.
Re 1: I don't think I would have granted more.
Re 2: Mostly "good applicants with good proposals for implementing good project ideas" and "grantmaker capacity to solicit or generate new project ideas", where the main bottleneck on the second of those isn't really generating the basic idea but coming up with a more detailed proposal and figuring out who to pitch on it etc.
Re 3: I think I would be happy to evaluate more grant applications and have a correspondingly higher bar. I don't think that low quality applications make my life as a grantmaker much worse;... (read more)
Re your 19 interventions, here are my quick takes on all of them:
Creating, scaling, and/or improving EA-aligned research orgs
Yes I am in favor of this, and my day job is helping to run a new org that aspires to be a scalable EA-aligned research org.
Creating, scaling, and/or improving EA-aligned research training programs
I am in favor of this. I think one of the biggest bottlenecks here is finding people who are willing to mentor people in research. My current guess is that EAs who work as researchers should be more willing to mentor people in research... (read more)
I feel very unsure about this. I don't think my position on this question is very well thought through.
Most of the time, the reason I don't want to make a grant doesn't feel like "this isn't worth the money", it feels like "making this grant would be costly for some other reason". For example, when someone applies for a salary to spend some time researching some question which I don't think they'd be very good at researching, I usually don't want to fund them, but this is mostly because I think it's unhealthy in various ways for EA to fund people to flail ... (read more)
re 1: I expect to write similarly detailed writeups in future.
re 2: I think that would take a bunch more of my time and not clearly be worth it, so it seems unlikely that I'll do it by default. (Someone could try to pay me at a high rate to write longer grant reports, if they thought that this was much more valuable than I did.)
re 3: I agree with everyone that there are many pros of writing more detailed grant reports (and these pros are a lot of why I am fine with writing grant reports as long as the ones I wrote). By far the biggest con is that it takes ... (read more)
I don't think this has much of an advantage over other related things that I do, like
A question for the fund managers: When the EAIF funds a project, roughly how should credit be allocated between the different involved parties, where the involved parties are:
Presumably this differs a lot between grants; I'd be interested in some typical figures.
This question is important because you need a sense of these numbers in order to make decisions about which of these parties you sho... (read more)
(I'd be very interested in your answer if you have one btw.)
Making up some random numbers:
This is for a typical grant where someone applies to the fund with a reasonably promising project on their own and the EAIF gives them some quick advice and feedback. For a case of strong active grantmaking, I might say something more like 8% / 30% / 12% / 50%.
This is based on the reasoning that we're quite constrained by promising applications and have a lot of funding available.
Incidentally, I think that tracking work time is a kind of dangerous thing to do, because it makes it really tempting to make bad decisions that will cause you to work more. This is a lot of why I don't normally track it.
EDIT: however, it seems so helpful to track it some of the time that I overall strongly recommend doing it for at least a week a year.
I occasionally track my work time for a few weeks at a time; by coincidence I happen to be tracking it at the moment. I used to use Toggl; currently I just track my time in my notebook by noting the time whenever I start and stop working (where by "working" I mean "actively focusing on work stuff"). I am more careful about time tracking my work on my day job (working on longtermist technical research, as an individual contributor and manager) than working on the EAIF and other movement building stuff.
The first four days this week, I did 8h33m, 8h15m, 7h32m... (read more)
That seems correct, but doesn’t really defend Ben’s point, which is what I was criticizing.
I am glad to have you around, of course.
My claim is just that I doubt you thought that if the rate of posts like this was 50% lower, you would have been substantially more likely to get involved with EA; I'd be very interested to hear I was wrong about that.
I think that isn't the right counterfactual, since I got into EA circles despite having only minimal (and net negative) impressions of EA-related forums. So your claim is narrowly true, but if instead the counterfactual was that my first exposure to EA had been the EA Forum, then yes, I think the prominence of this kind of post would have made me substantially less likely to engage.
But fundamentally if we're running either of these counterfactuals I think we're already leaving a bunch of value on the table, as expressed by EricHerboso's post about false dilemmas.
I am not sure whether I think it's a net cost that some people will be put off from EA by posts like this, because I think that people who would bounce off EA because of posts like this aren't obviously net-positive to have in EA. (My main model here is that the behavior described in this post is pretty obviously bad, and the kind of SJ-sympathetic EAs who I expect to be net sources of value probably agree that this behavior is bad. Secondarily, I think that people who are really enthusiastic about EA are pretty likely to stick around even when they're inf... (read more)
I think that people who are really enthusiastic about EA are pretty likely to stick around even when they're infuriated by things EAs are saying.
If you know someone (eg yourself) who you think is a counterargument to this claim of mine, feel free to message me.
I would guess it depends quite a bit on these people's total exposure to EA at the time when they encounter something they find infuriating (or even just somewhat off / getting a vibe that this community probably is "not for them").
If we're imagining people who've already had 10 or even 100 hour... (read more)
I bounce off posts like this. Not sure if you'd consider me net positive or not. :)
More generally, I think our disagreement here probably comes down to something like this:
There's a tradeoff between having a culture where true and important things are easy to say, and a culture where group X feels maximally welcome. As you say, if we're skillful we can do both of these, by being careful about our language and always sounding charitable and not repeatedly making similar posts.
But this comes at a cost. I personally feel much less excited about writing about certain topics because I'd have to be super careful about them. And most of t... (read more)
I don't disagree with any of that. I acknowledge there is real cost in trying to make people feel welcome on top of the community service of speaking up about bad practice (leaving aside the issue of how bad what happened is exactly). I just think there is also some cost, that you are undervaluing and not acknowledging here, in the other side of that trade-off. Maybe we disagree on the exchange rate between these (welcomingness and unfiltered/candid communication)? I think that becoming more skillful at doing both well is an important skill for a community l... (read more)
(I'm writing these comments kind of quickly, sorry for sloppiness.)
With regard to
Where X is the bad thing ACE did, the situation is clearly far more nuanced as to how bad it is than something like sexual misconduct, which, by the time we have decided something deserves that label, is unequivocally bad.
In this particular case, Will seems to agree that X was bad and concerning, which is why my comment felt fair to me.
I would have no meta-level objection to a comment saying "I disagree that X is bad, I think it's actually fine".