Thanks for doing this! I think the most striking part of what you found is the donations to representatives who sit on the subcommittee that oversees the CFTC (i.e. the House Agriculture Subcommittee on Commodity Exchanges, Energy, and Credit), so I wanted to look into this more. From a bit of Googling:
Nitpick: doesn't the argument you made also assume that there'll be a big discontinuity right before AGI? That seems necessary for the premise about "extremely novel software" (rather than "incrementally novel software") to hold.
why they would want to suggest to this bunch of concerned EAs how to go about better pushing for the ideas that Buck disagrees with
My guess was that Buck was hopeful that, if the post authors focused their criticisms on the cruxes of disagreement, that would help reveal flaws in his and others' thinking ("inasmuch as I'm wrong it would be great if you proved me wrong"). In other words, I'd guess he was like, "I think you're probably mistaken, but in case you're right, it'd be in both of our interests for you to convince me of that, and you'll only...
I guess I'm a bit skeptical of this, given that Buck has said this to weeatquince: "I would prefer an EA Forum without your critical writing on it, because I think your critical writing has similar problems to this post (for similar reasons to the comment Rohin made here), and I think that posts like this/yours are fairly unhelpful, distracting, and unpleasant. In my opinion, it is fair game for me to make truthful comments that cause people to feel less incentivized to write posts like this one (or yours) in future".
I interpreted Buck's comment differently. His comment reads to me, not so much like "playing the man," and more like "telling the man that he might be better off playing a different game." If someone doesn't have the time to write out an in-depth response to a post that takes 84 minutes to read, but they take the time to (I'd guess largely correctly) suggest to the authors how they might better succeed at accomplishing their own goals, that seems to me like a helpful form of engagement.
This seems helpful, though I'd guess another team that's in more frequent contact with AI safety orgs could do this for significantly lower cost, since they'll be starting off with more of the needed info and contacts.
Thanks for sharing! The speakers on the podcast might not have had the time to make detailed arguments, but I find their arguments here pretty uncompelling. For example:
Thanks for writing this! I want to push back a bit. There's a big middle ground between (i) naive, unconstrained welfare maximization and (ii) putting little to no emphasis on how much good one does. I think "do good, using reasoning" is somewhat too quick to jump to (ii) while passing over intermediate options, like:
...Readers might be interested in the comments over here, especially Daniel K.'s comment:
...The only viable counterargument I've heard to this is that the government can be competent at X while being incompetent at Y, even if X is objectively harder than Y. The government is weird like that. It's big and diverse and crazy. Thus, the conclusion goes, we should still have some hope (10%?) that we can get the government to behave sanely on the topic of AGI risk, especially with warning shots, despite the evidence of it behaving incompetently on the topic of bio r
[Edit: I think the following no longer makes sense because the comment it's responding to was edited to add explanations, or maybe I had just missed those explanations in my first reading. See my other response instead.]
Thanks for this. I don't see how the new estimates incorporate the above information. (The medians for CSER, Leverhulme, and FLI seem to still be at 5 each.)
(Sorry for being a stickler here--I think it's important that readers get accurate info on how many people are working on these problems.)
Thanks for the updates!
I have it on good word that CSET has well under 10 safety-focused researchers, but fair enough if you don't want to take an internet stranger's word for things.
I'd encourage you to also re-estimate the counts for CSER, Leverhulme, and the Future of Life Institute.
Thanks for posting, seems good to know these things! I think some of the numbers for non-technical research should be substantially lower--enough that an estimate of ~55 non-technical safety researchers seems more accurate:
Thanks for posting! I'm sympathetic to the broad intuition that it's unlikely for any one person to be at the sweet spot where they make a decisive impact, but I'm not sold on most of the specific arguments given here.
Recall that there are decent reasons to think goal alignment is impossible - in other words, it's not a priori obvious that there's any way to declare a goal and have some other agent pursue that goal exactly as you mean it.
I don't see why this is the relevant standard. "Just" avoiding egregiously unintended behavior seems sufficient for av...
+1 on this being a relevant intuition. I'm not sure how limited these scenarios are - aren't information asymmetries and commitment problems really common?
Ah sorry, I had totally misunderstood your previous comment. (I had interpreted "multiply" very differently.) With that context, I retract my last response.
By "satisfaction" I meant high performance on its mesa-objective (insofar as it has one), though I suspect our different intuitions come from elsewhere.
it should robustly include "building copy of itself"
I think I'm still skeptical on two points:
getting the "multiply" part right is sufficient, AI will take care of the "satisfaction" part on its own
I'm struggling to articulate how confused this seems in the context of machine learning. (I think my first objection is something like: the way in which "multiply" could be specified and the way in which an AI system pursues satisfaction are very different; one could be an aspect of the AI's training process, while the other is an aspect of the AI's behavior. So even if these two concepts each describe aspects of the AI system's objectives/behavior, tha...
Maybe, but is "multiply" enough to capture the goal we're talking about? "Maximize total satisfaction" seems much harder to specify (and to be robustly learned) - at least I don't know what function would map states of the world to total satisfaction.
I think this gets a lot right, though
As I am not a preference utilitarian I strongly reject this identification.
While this does seem to be part of the confusion of the original question, I'm not sure (total) preference vs. hedonic utilitarianism is actually a crux here. An AI system pursuing a simple objective wouldn't want to maximize the number of satisfied AI systems; it would just pursue its objective (which might involve relatively few copies of itself with satisfied goals). So highly capable AI systems pursuing very simple or random goals aren't only bad by hedonic utilitarian lights; they're also bad by (total) preference utilitarian lights (not to mention "common sense ethics").
my point is that, within the FAW and altpro movements, A is mentioned
Oh interesting, I wasn't aware this point came up much. Taking your word for it, I agree then that (A) shouldn't get more weight than (B) (except insofar as we have separate, non-speculative reasons to be more bullish about economic interventions).
I think you kind of changed the "latter argument" a bit here from what we were discussing before.
Sorry for the confusion--I was trying to say that alt-pro advocates often have an argument that's different (and better-grounded) than (A) a...
Thanks for the thoughtful response!
I actually think this paragraph you created is worth presenting and considering. The thing is, it's pretty much been presented already. This is, for example, roughly the story of Bruce Friedrich (founder and CEO of GFI), and maybe pretty much GFI too. And that was my story too, and might be the story of a lot of EA animal/alt-pro advocates. So if this argument is presented, why not also consider its counterpart? (what I did)
I think this is subtly off. The story I've heard from alt-pro advocates is that we should focu...
I'm not sure how much of a pain this would be implementation-wise (or stylistically), but I'd be curious to see agree/disagree voting for posts (rather than just comments). After all, arguments for having this type of voting for comments seem to roughly generalize to posts, e.g. it seems useful for readers to be able to quickly distinguish between (i) critical posts that the community tends to appreciate and agree with, and (ii) critical posts that the community tends to appreciate but disagree with.
Thanks for writing! I'm skeptical that a non-morally-motivated ban would create bad value lock-in. Most of this post's arguments for that premise seem to be just the author's speculative intuitions, given with no evidence or argument (e.g. "I also worry that using laws to capture our abolition of moral catastrophes after they become economically inviable, can create a false sense of progress [...] Always waiting for technological changes might mislead us to think that we have less obligation to improve our moral values or actions when the technological/e...
But it seems like such a narrow notion of alignment that it glosses over almost all of the really hard problems in real AI safety -- which concern the very real conflicts between the humans who will be using AI.
I very much agree that these political questions matter, and that alignment to multiple humans is conceptually pretty shaky; thanks for bringing up these issues. Still, I think some important context is that many AI safety researchers think that it's a hard, unsolved problem to just keep future powerful AI systems from causing many deaths (or do...
Thanks for the comment! I agree these are important considerations and that there's plenty my post doesn't cover. (Part of that is because I assumed the target audience of this post--technical readers of this forum--would have limited interest in governance issues and would already be inclined to think about the impacts of their work. Though maybe I'm being too optimistic with the latter assumption.)
Were there any specific misuse risks involving the tools discussed in the post that stood out to you as being especially important to consider?
Thanks for writing this. I think there are actually some pretty compelling examples of people/movements being quite successful at helping future generations (while partly trying to do so):
Maybe, I'm not sure though. Future applications that do long-term, large-scale planning seem hard to constrain much while still letting them do what they're supposed to do. (Bounded goals--if they're bounded to small-scale objectives--seem like they'd break large-scale planning, time limits seem like they'd break long-term planning, and as you mention the "don't kill people" counter would be much trickier to implement.)
I also used to be pretty skeptical about the credibility of the field. I was surprised to learn about how much mainstream, credible support AI safety concerns have received:
To counter that, let me emphasize the aspects of AI risk that are not disproven here.
Adding to this list, much of the field thinks a core challenge is making highly capable, agentic AI systems safe. But (ignoring inner alignment issues) severe constraints create safe AI systems that aren't very capable agents. (For example, if you make an AI that only considers what will happen within a time limit of 1 minute, it probably won't be very good at long-term planning. Or if you make an AI system that only pursues very small-scale goals, it won't be able to s...
(I skimmed; apologies if I missed relevant things.)
no one can bring about a dystopian future unless their ability to accomplish their goals is significantly more advanced than everyone else’s
[...] the EA community [...] is itself a substantial existential risk
This post seems to rely on the assumption that, in the absence of extremely unusual self-limits, EA's ability to accomplish its goals will somehow become significantly more advanced than those of the rest of the world combined. That's quite a strong, unusual assumption to make about any social movement--I think it'd take lots more argument to make a convincing case for it.
I'm not sure how much I agree with this / its applicability, but one argument I've heard is that, for individual decision-making and social norm-setting,
total abstinence is easier than perfect moderation
(Kind of a stretch, but I enjoyed this speech on the cultural and coordinating power of simple norms, which can be seen as a case against nuanced norms. Maybe the simplicity of some standards as individuals' principles, advocacy goals, and social norms makes them more resilient to pressure, whereas more nuanced standards might more easily fall down slip...
Fair points!
I think preference-based views fit neatly into the asymmetry.
Here I'm moving on from the original topic, but if you're interested in following this tangent--I'm not quite getting how preference-based views (specifically, person-affecting preference utilitarianism) maintain the asymmetry while avoiding (a slightly/somewhat weaker version of) "killing happy people is good."
Under "pure" person-affecting preference utilitarianism (ignoring broader pluralistic views of which this view is just one component, and also ignoring instrumental justif...
Of the experience-based asymmetric views discussed in the OP, my posts on tranquilism and suffering-focused ethics mention value pluralism and the idea that things other than experiences (i.e., preferences mostly) could also be valuable. Given these explicit mentions it seems false to claim that "these views don't easily fit into a preference-focused framework." [...] I'm not sure why you think [a certain] argument would have to be translated into a preference-focused framework.
I think this misunderstands the point I was making. I meant to highlight how...
On deterrence:
Thanks for the thoughtful reply; I've replied to many of these points here.
In short, I think you're right that Magnus doesn't explicitly assume consequentialism or hedonism. I understood him to be implicitly assuming these things because of the post's focus on creating happiness and suffering, as well as the apparent prevalence of these assumptions in the suffering-focused ethics community (e.g. the fact that it's called "suffering-focused ethics" rather than "frustration-focused ethics"). But I should have more explicitly recognized those assumptions and ...
Thanks for the thoughtful reply; I've replied to many of these points here.
On a few other points:
Thanks for the thoughtful reply. You're right, you can avoid the implications I mentioned by adopting a preference/goal-focused framework. (I've edited my original comment to flag this; thanks for helping me recognize it.) That does resolve some problems, but I think it also breaks most of the original post's arguments, since they weren't made in (and don't easily fit into) a preference-focused framework. For example:
Thanks for writing. You're right that MacAskill doesn't address these non-obvious points, though I want to push back a bit. Several of your arguments are arguments for the view that "intrinsically positive lives do not exist," and more generally that intrinsically positive moments do not exist. Since we're talking about repugnant conclusions, readers should note that this view has some repugnant conclusions of its own.
[Edit: I stated the following criticism too generally; it only applies when one makes an additional assumption: that experiences matter, whi...
edit: I wrote this comment before I refreshed the page and I now see that these points have been raised!
Thanks for flagging that all ethical views have bullets to bite and for pointing at previous discussion of asymmetrical views!
However, I'm not really following your argument.
...Several of your arguments are arguments for the view that "intrinsically positive lives do not exist," [...] It implies that there wouldn't be anything wrong with immediately killing everyone reading this, their families, and everyone else, since this supposedly wouldn't be des
[the view that intrinsically positive lives do not exist] implies that there wouldn't be anything wrong with immediately killing everyone reading this, their families, and everyone else, since this supposedly wouldn't be destroying anything positive.
This is not true. The view that killing is bad and morally wrong can be, and has been, grounded in many ways besides reference to positive value.[1]
First, there are preference-based views according to which it would be bad and wrong to thwart preferences against being killed, even as the creation and satisfacti...
It implies that there wouldn't be anything wrong with immediately killing everyone reading this, their families, and everyone else, since this supposedly wouldn't be destroying anything positive.
That's not how many people with the views Magnus described would interpret their views.
For instance, let's take my article on tranquilism, which Magnus cites. It says this in the introduction:
...Tranquilism is not meant as a standalone moral theory, but as a way to think about well-being and the value of different experiences. Tranquilism can then serve as a buil
Thanks for the thoughtful post!
Some of the disconnect here might be semantic - my sense is people here often use "moral progress" to refer to "progress in people's moral views," while you seem to be using the term to mean both that and also other kinds of progress.
Other than that, I'd guess people might not yet be sold on how tractable and high-leverage these interventions are, especially in comparison to other interventions this community has identified. If you or others have more detailed cases to make on the tractability of any of these important proble...
Good points!
Some GiveWell charities largely benefit young children, too, but if I recall correctly, I think donations have been aimed at uses for the next year or two, so maybe only very young children would not benefit on such a person-affecting view, and this wouldn't make much difference.
Agreed that this wouldn't make much of a difference for donations, although maybe it matters a lot for some career decisions. E.g. if future people weren't ethically important, then there might be little value in starting a 4+ year academic degree to then donate to these charities.
(Tangentially, the time inconsistency of presentists' preferences seems pretty inconvenient for career planning.)
Thanks for writing - I skimmed so may have missed things, but I think these arguments have significant weaknesses, e.g.:
Thanks for posting! Tentative idea for tweaks: my intuition would be to modify the middle two branches into the following:
Rationale:
I'd consider tweaking (3) to something like, "Make sure you don't start a nuclear war based on a false alarm." The current version has imo some serious downsides:
I agree with a lot of this, although I'm not sure I see why standardized cost-benefit analysis would be necessary for legitimate epistemic progress to be made? There are many empirical questions that seem important from a wide range of ethical views, and people with a shared interest in these questions can work together to figure them out, while drawing their own normative conclusions. (This seems to line up with what most organizations affiliated with this community actually do--my impression is that lots more research goes into empirical questions than in...
I think you have good points around partisanship, comparative advantage, and weaknesses of some arguments on the forum. Two other thoughts:
I'm still figuring out how I want to engage on this forum; for now, I generally, tentatively prefer not to disclose personal information here. I'd encourage readers to conservatively assume I have conflicts of interest, and to assess my comments and posts based on their merits. (My vague sense is that this is a common approach to this forum--common enough that non-disclosure doesn't imply an absence of conflicts of interest--but maybe I've misread? I'm not confident about the approach I'm taking - feel free to message me on this forum if you'd like to d...
I'm mostly sympathetic - I'd add a few caveats:
Fair! Sorry for the slow reply, I missed the comment notification earlier.
I could have been clearer in what I was trying to point at with my comment. I didn't mean to fault you for not meeting an (unmade) challenge to list all your assumptions--I agree that would be unreasonable.
Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption--enough that the argument alone (along with other, largely uncontroversial assumptions) doesn't make it "quite easy to reach extremely...