All of Mau's Comments + Replies

Fair! Sorry for the slow reply, I missed the comment notification earlier.

I could have been clearer in what I was trying to point at with my comment. I didn't mean to fault you for not meeting an (unmade) challenge to list all your assumptions--I agree that would be unreasonable.

Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption--enough that the argument alone (along with other, largely uncontroversial assumptions) doesn't make it "quite easy to reach extremely... (read more)

Thanks for doing this! I think the most striking part of what you found is the donations to representatives who sit on the subcommittee that oversees the CFTC (i.e. the House Agriculture Subcommittee on Commodity Exchanges, Energy, and Credit), so I wanted to look into this more. From a bit of Googling:

  • It looks like you're right that Rep. Delgado sits on (and is even the Chair of) this subcommittee.
  • On the other hand, it looks like Rep. Spanberger doesn't actually sit on this subcommittee, and hasn't done so since 2021. In other words, she hasn't been on
... (read more)
3
Radical Empath Ismam
1y
Thank you for this. I think you're right. I'll issue a correction.

Nitpick: doesn't the argument you made also assume that there'll be a big discontinuity right before AGI? That seems necessary for the premise about "extremely novel software" (rather than "incrementally novel software") to hold.

5
RobBensinger
1y
I do think that AGI will be developed by methods that are relatively novel. Like, I'll be quite surprised if all of the core ideas are >6 years old when we first achieve AGI, and I'll be more surprised still if all of the core ideas are >12 years old. (Though at least some of the surprise does come from the fact that my median AGI timeline is short, and that I don't expect us to build AGI by just throwing more compute and data at GPT-n.) Separately and with more confidence, I'm expecting discontinuities in the cognitive abilities of AGI. If AGI is par-human at heart surgery and physics, I predict that this will be because of "click" moments where many things suddenly fall into place at once, and new approaches and heuristics (both on the part of humans and on the part of the AI systems we build), not just because of a completely smooth, incremental, and low-impact-at-each-step improvement to the knowledge and thought-habits of GPT-3. "Superhuman AI isn't just GPT-3 but thinking faster and remembering more things" (for example) matters for things like interpretability, since if we succeed shockingly well at finding ways to reasonably thoroughly understand what GPT-3's brain is doing moment-to-moment, this is less likely to be effective for understanding what the first AGI's brain is doing moment-to-moment insofar as the first AGI is working in very new sorts of ways and doing very new sorts of things. I'm happy to add more points like these to the stew so they can be talked about. "Your list of reasons for thinking AGI risk is high didn't explicitly mention X" is a process we can continue indefinitely long if we want to, since there are always more background assumptions someone can bring up that they disagree with. (E.g., I also didn't explicitly mention "intelligence is a property of matter rather than of souls imparted into particular animal species by God", "AGI isn't thousands of years in the future", "most random goals would produce bad outcomes if optimize
Mau
1y · 53
25
7

why they would want to suggest to these bunch of concerned EAs how to go about trying to push for the ideas that Buck disagrees with better

My guess was that Buck was hopeful that, if the post authors focused their criticisms on the cruxes of disagreement, that would help reveal flaws in his and others' thinking ("inasmuch as I'm wrong it would be great if you proved me wrong"). In other words, I'd guess he was like, "I think you're probably mistaken, but in case you're right, it'd be in both of our interests for you to convince me of that, and you'll only... (read more)

I guess I'm a bit skeptical of this, given that Buck has said this to weeatquince "I would prefer an EA Forum without your critical writing on it, because I think your critical writing has similar problems to this post (for similar reasons to the comment Rohin made here), and I think that posts like this/yours are fairly unhelpful, distracting, and unpleasant. In my opinion, it is fair game for me to make truthful comments that cause people to feel less incentivized to write posts like this one (or yours) in future". 

Mau
1y · 50
28
15

I interpreted Buck's comment differently. His comment reads to me, not so much like "playing the man," and more like "telling the man that he might be better off playing a different game." If someone doesn't have the time to write out an in-depth response to a post that takes 84 minutes to read, but they take the time to (I'd guess largely correctly) suggest to the authors how they might better succeed at accomplishing their own goals, that seems to me like a helpful form of engagement.

6
GideonF
1y
Maybe you're correct, and that's definitely how I interpreted it initially, but Buck's response to me gave a different impression. Maybe I'm wrong, but it just strikes me as a little strange if Buck feels they have considered these ideas and basically rejects them, why they would want to suggest to these bunch of concerned EAs how to go about trying to push for the ideas that Buck disagrees with better. Maybe I'm wrong or have misinterpreted something though, I wouldn't be surprised

This seems helpful, though I'd guess another team that's in more frequent contact with AI safety orgs could do this for significantly lower cost, since they'll be starting off with more of the needed info and contacts.

4
Linch
1y
Agreed! Other groups will be better placed. But I'm not categorically ruling this out: if nobody else appears to be on track for doing this when we're next in prioritization mode, we might revisit this issue and see whether it makes sense to prioritize it anyway.
Mau
1y · 12
9
1

Thanks for sharing! The speakers on the podcast might not have had the time to make detailed arguments, but I find their arguments here pretty uncompelling. For example:

  • They claim that "many belief systems they have a way of segregating and limiting the impact of the most hardcore believers." But (at least from skimming) their evidence for this seems to be just the example of monastic traditions.
  • A speaker claims that "the leaders who take ideas seriously don't necessarily have a great track record." But they just provide a few cherry-picked (and dubious
... (read more)
2
Habryka
1y
I share this impression of the actual data points being used feeling pretty flimsy
Mau
1y · 38
11
1

Thanks for writing this! I want to push back a bit. There's a big middle ground between (i) naive, unconstrained welfare maximization and (ii) putting little to no emphasis on how much good one does. I think "do good, using reasoning" is somewhat too quick to jump to (ii) while passing over intermediate options, like:

  • "Do lots of good, using reasoning" (roughly as in this post)
  • "be a good citizen, while ambitiously working towards a better world" (as in this post)
  • "maximize good under constraints or with constraints incorporated into the notion of goodnes
... (read more)
4
lincolnq
1y
Thanks for writing! To be clear, I don't think we as a community should be scope insensitive. But here's the FAQ I would write about this...
  • Q: does EA mean I should only work on the most important cause areas?
    • no! being in EA means you choose to do good with your life, and think about those choices. We hope that you'll choose to improve your life / career / donations in more-altruistic ways, and we might talk with you and discover ideas for making your altruistic life even better.
  • Q: does EA mean I should do or support [crazy thing X] to improve the world?
    • Probably not: if it sounds crazy to you, trust your reasoning! However, EA is a big umbrella and we're nurturing lots of weird ideas; some ideas that seem crazy to one person might make sense to another. We're committed to reasoning about ideas that might actually help the world even if they sound absurd at first. Contribute to this reasoning process and you might well make a big impact.
  • Q: does ea's "big umbrella" mean that I should avoid criticizing people for not reaching their potential or doing as much good as I think they could do?
    • This is very nuanced! You'll see lots of internal feedback and criticism in EA spaces. We do have a norm against loudly and unsolicitedly critiquing people's plans for not doing enough good, but this is overridden in cases where a) the person has asked for the feedback first or b) the person making the critique has a deep and nuanced understanding of the existing plan, as well as a strong relationship with the recipient of the feedback. Our advice, if you see something you want to critique, is to ask if they want feedback before offering.
  • Q: what about widely-recommended canonical public posts listing EA priorities, implicitly condemning anything that's not on the priority list?
    • ...yeah this feels like a big part of the problem to me. I think it makes sense to write up a standard disclaimer for such posts, saying "there's lots of good things not
5
Vael Gates
1y
Not directly relevant to the OP, but another post covering research taste: An Opinionated Guide to ML Research (also see Rohin Shah's advice about PhD programs (search "Q. What skills will I learn from a PhD?") for some commentary).

Readers might be interested in the comments over here, especially Daniel K.'s comment:

The only viable counterargument I've heard to this is that the government can be competent at X while being incompetent at Y, even if X is objectively harder than Y. The government is weird like that. It's big and diverse and crazy. Thus, the conclusion goes, we should still have some hope (10%?) that we can get the government to behave sanely on the topic of AGI risk, especially with warning shots, despite the evidence of it behaving incompetently on the topic of bio r

... (read more)

[Edit: I think the following no longer makes sense because the comment it's responding to was edited to add explanations, or maybe I had just missed those explanations in my first reading. See my other response instead.]

Thanks for this. I don't see how the new estimates incorporate the above information. (The medians for CSER, Leverhulme, and FLI seem to still be at 5 each.)

(Sorry for being a stickler here--I think it's important that readers get accurate info on how many people are working on these problems.)

3
Stephen McAleese
2y
New estimates:
  • CSER: 2-5-10 -> 2-3-7
  • FLI: 5-5-20 -> 3-4-6
  • Leverhulme: 2-5-15 -> 3-4-10

Thanks for the updates!

I have it on good word that CSET has well under 10 safety-focused researchers, but fair enough if you don't want to take an internet stranger's word for things.

I'd encourage you to also re-estimate the counts for CSER, Leverhulme, and the Future of Life Institute.

  • CSER's list of team members related to AI lists many affiliates, advisors, and co-founders but only ~3 research staff.
  • The Future of Life Institute seems more focused on policy and field-building than on research; they don't even have a research section on their website. T
... (read more)
1
Stephen McAleese
2y
I re-estimated the number of researchers in these organizations and the edits are shown in the 'EDITS' comment below. Copied from the EDITS comment:
  • CSER: 5-5-10 -> 2-5-15
  • FLI: 5-5-20 -> 3-5-15
  • Leverhulme Centre: 5-10-70 (Low confidence) -> 2-5-15 (Medium confidence)
My counts for CSER:
  • full-time researchers: 3
  • research affiliates: 4
FLI: counted 5 people working on AI policy and governance.
Leverhulme Centre:
  • 7 senior research fellows
  • 14 research fellows
Many of them work at other organizations. I think 5 is a good conservative estimate.
New footnote for the 'Other' row in the non-technical list of researchers (estimate is 10): "There are about 45 research profiles on Google Scholar with the 'AI governance' tag. I counted about 8 researchers who weren't at the other organizations listed."

Thanks for posting, seems good to know these things! I think some of the numbers for non-technical research should be substantially lower--enough that an estimate of ~55 non-technical safety researchers seems more accurate:

  • CSET isn't focused on AI safety; maybe you could count a few of their researchers (rather than 10).
  • I think SERI and BERI have 0 full-time non-technical research staff (rather than 10 and 5).
  • As far as I'm aware, the Leverhulme Centre for the Future of Intelligence + CSER only have at most a few non-technical researchers in total focu
... (read more)
1
Stephen McAleese
2y
I re-estimated counts for many of the non-technical organizations and here are my conclusions:
  • I didn't change the CSET estimate (10) because there seems to be a core group of about 5 researchers there and many others (20-30). Their productivity also seems to be high: I counted over 20 publications so far this year though it seems like only about half of them are related to AI governance (list of publications).
  • I deleted BERI and SERI from the list because they don't seem to have any full-time researchers.
  • Epoch: decreased estimate from 10 to 4.
  • Good AI seems to be more technical than non-technical (todo).
1
Stephen McAleese
2y
Thanks for the information! Your estimate seems more accurate than mine. In the case of Epoch, I would count every part-time employee as roughly half a full-time employee to avoid underestimating their productivity.
Mau
2y · 11
1
0

Thanks for posting! I'm sympathetic to the broad intuition that any one person being at the sweet spot where they make a decisive impact seems unlikely, but I'm not sold on most of the specific arguments given here.

Recall that there are decent reasons to think goal alignment is impossible - in other words, it's not a priori obvious that there's any way to declare a goal and have some other agent pursue that goal exactly as you mean it.

I don't see why this is the relevant standard. "Just" avoiding egregiously unintended behavior seems sufficient for av... (read more)

2
Justis
2y
Yeah, I share the view that the "Recalls" are the weakest part -- I mostly was trying to get my fuzzy, accumulated-over-many-years vague sense of "whoa no we're being way too confident about this" into a more postable form. Seeing your criticisms I think the main issue is a little bit of a Motte-and-Bailey sort of thing where I'm kind of responding to a Yudkowskian model, but smuggling in a more moderate perspective's odds (ie. Yudkowsky thinks we need to get it right on the first try, but Grace and MacAskill may be agnostic there). I may think more about this! I do think there's something there sort of between the parts you're quoting, by which I mean yes, we could get agreement to a narrower standard than solving ethics, but even just making ethical progress at all, or coming up with standards that go anywhere good/predictable politically seems hard. Like, the political dimension and the technical/problem specification dimensions both seem super hard in a way where we'd have to trust ourselves to be extremely competent across both dimensions, and our actual testable experiments against either outcome are mostly a wash (ie. we can't get a US congressperson elected yet, or get affordable lab-grown meat on grocery store shelves, so doing harder versions of both at once seems...I dunno, might hedge my portfolio far beyond that!).
Mau
2y10
4
0

+1 on this being a relevant intuition. I'm not sure how limited these scenarios are - aren't information asymmetries and commitment problems really common?

3
mako yass
8mo
Today, somewhat, but that's just because human brains can't prove the state of their beliefs or share specifications with each other (i.e., humans can lie about anything). There is no reason for artificial brains to have these limitations, and given any trend towards communal/social factors in intelligence, or towards self-reflection (which is required for recursive self-improvement), it's actively costly to be cognitively opaque.
2
Linch
2y
.
2
Linch
2y
I agree that they're really common in the current world. I was originally thinking that this might become substantially common in multipolar AGI scenarios (because future AIs may have better trust and commitment mechanisms than current humans do). Upon brief reflection, I think my original comment was overly concise and not very substantiated.

Ah sorry, I had totally misunderstood your previous comment. (I had interpreted "multiply" very differently.) With that context, I retract my last response.

By "satisfaction" I meant high performance on its mesa-objective (insofar as it has one), though I suspect our different intuitions come from elsewhere.

it should robustly include "building copy of itself"

I think I'm still skeptical on two points:

  • Whether this is significantly easier than other complex goals
    • (The "robustly" part seems hard.)
  • Whether this actually leads to a near-best outcome acco
... (read more)
1
Alex P
2y
>By "satisfaction" I meant high performance on its mesa-objective

Yeah, I'd agree with this definition. I don't necessarily agree with your two points of skepticism: for the first one I've already mentioned my reasons; for the second one, it's true in principle, but it seems almost anything an AI would learn semi-accidentally is going to be much simpler and more intrinsically consistent than human values. But low confidence on both, and in any case that's kind of beside the point; I was mostly trying to understand your perspective on what utility is.

getting the "multiply" part right is sufficient, AI will take care of the "satisfaction" part on its own

I'm struggling to articulate how confused this seems in the context of machine learning. (I think my first objection is something like: the way in which "multiply" could be specified and the way in which an AI system pursues satisfaction are very different; one could be an aspect of the AI's training process, while another is an aspect of the AI's behavior. So even if these two concepts each describe aspects of the AI system's objectives/behavior, tha... (read more)

[This comment is no longer endorsed by its author]
1
Alex P
2y
I am familiar with the basics of ML and the concept of mesa-optimizers. "Building copies of itself" (i.e. multiply) is an optimization goal you'd have to specifically train into the system, I don't argue with that, I just think it's a simple and "natural" (in the sense it aligns reasonably well with instrumental convergence) goal that you can robustly train it comparatively easily. "Satisfaction" however, is not a term that I've met in ML or mesa-optimizers context, and I think the confusion comes from us mapping this term differently onto these domains. In my view, "satisfaction" roughly corresponds to "loss function minimization" in the ML terminology - the lower an AIs loss function, the higher satisfaction it "experiences" (literally or metaphorically, depending on  the kind of AI). Since any AI [built under the modern paradigm] is already working to minimize its own loss function, whatever that happened to be, we wouldn't need to care much about the exact shape of the loss function it learns, except that it should robustly include "building copy of itself". And since we're presumably talking about a super-human AIs here, they would be very good at minimizing that loss function. So e.g. they can have some stupid goal like "maximize paperclips & build copies of self", they'll convert the universe to some mix of paperclips and AIs and experience extremely high satisfaction about it. But you seem to be meaning something very different when you say "satisfaction"? Do you mind stating explicitly what it is?

I can't, but I'm not sure I see your point?

1
Alex P
2y
My point is, getting the "multiply" part right is sufficient, AI will take care of the "satisfaction" part on its own, especially given that it's able to reprogram itself. This assumes "[perceived] goal achievement" == "satisfaction" (aka utility), which was my assumption all along, but apparently is only true under preference utilitarianism.

Maybe, but is "multiply" enough to capture the goal we're talking about? "Maximize total satisfaction" seems much harder to specify (and to be robustly learned) - at least I don't know what function would map states of the world to total satisfaction.

1
Alex P
2y
Can you, um, coherently imagine an agent that does not try to achieve its own goals (assuming it has no conflicting goals)?

I think this gets a lot right, though

As I am not a preference utilitarian I strongly reject this identification.

While this does seem to be part of the confusion of the original question, I'm not sure (total) preference vs. hedonic utilitarianism is actually a crux here. An AI system pursuing a simple objective wouldn't want to maximize the number of satisfied AI systems; it would just pursue its objective (which might involve relatively few copies of itself with satisfied goals). So highly capable AI systems pursuing very simple or random goals aren't only bad by hedonic utilitarian lights; they're also bad by (total) preference utilitarian lights (not to mention "common sense ethics").

1
Alex P
2y
That's true, but I think robustly embedding a goal of "multiply" is much easier than actual alignment. You can express it mathematically, you can use evolution, etc.   [To reiterate, I'm not advocating for any of this, I think any moral system that labels "humans replaced by AIs" as an acceptable outcome is a broken one]

my point is that, within the FAW and altpro movements, A is mentioned

Oh interesting, I wasn't aware this point came up much. Taking your word for it, I agree then that (A) shouldn't get more weight than (B) (except insofar as we have separate, non-speculative reasons to be more bullish about economic interventions).

I think you kind of changed the "latter argument" a bit here from what we were discussing before.

Sorry for the confusion--I was trying to say that alt-pro advocates often have an argument that's different (and better-grounded) than (A) a... (read more)

1
Fai
2y
Voted agree! I think we are gaining understanding, and maybe converging on our views a bit.  Also, I want to mention that I have shifted quite a bit from my worry I wrote in this post, so much that I actually updated some parts of it. My high level takeaway now is that we SHOULD keep up, probably speed up alt-pro (maybe particularly CM), but at roughly the point that alt-pro replaced 70-80% of factory farming, we should seriously consider putting much more effort (than now) on moral and legal advocacy.  Thank you everyone for the discussion!

Thanks for the thoughtful response!

I actually think this paragraph you created is worth presenting and considering. The thing is, it's pretty much been presented already. This is, for example, roughly the story of Bruce Friedrich (founder and CEO of GFI), and maybe pretty much GFI too. And that was my story too, and might be the story of a lot of EA animal/alt-pro advocates. So if this argument is presented, why not also consider its counterpart? (what I did)

I think this is subtly off. The story I've heard from alt-pro advocates is that we should focu... (read more)

1
Fai
2y
I strong-up-voted this for the effort to clarify things while the post is no longer on the frontpage.  I think I still have reservations. You tried to point out the story that points to historical evidence about what worked and what did not.  But meta discussions about what kind of work motivates (though I believe they don't talk about misleading) advocates being effective and sustainable (not burning out) is a constant topic that goes on in pretty much all annual meetings/retreats of FAW advocacy groups. Street advocacy/education advocacy, or just any moral advocacy that is not working, is being discussed as a reason to move away BOTH because of direct effectiveness and how much we can stay motivated. And the reverse is said for what's working. And even if those were mentioned, I don't think it is possible that each of us, as individuals, wasn't affected by such motivation/frustrations in choosing our career, for instance, I was. Also, as a side story, I heard that people left some FAW-related orgs for overrepresenting/overmotivating, and I tend to agree with them about their judgment. So maybe there is actually some active misleading. So in a sense, "speculative argument about how we might inspire or mislead future advocates" IS happening, at least the "inspire" part (and I suspect that the misleading part is also there, just not framed this way).   I think you kind of changed the "latter argument"  a bit here from what we were discussing before. Copying things over, it was A:"economic changes that drive moral progress will inspire and inform future advocates to take pragmatic approaches that actually work well rather than engaging in endless but ineffective moral grandstanding;" And B: "always waiting for moral progress might mislead us to think we have less obligation to improve economic incentives" And my point is that, within the FAW and altpro movements, A is mentioned, often as a point for advocacy sustainability and self-care. B is also mentioned, b
Mau
2y · 15
8
0

I'm not sure how much of a pain this would be implementation-wise (or stylistically), but I'd be curious to see agree/disagree voting for posts (rather than just comments). After all, arguments for having this type of voting for comments seem to roughly generalize to posts, e.g. it seems useful for readers to be able to quickly distinguish between (i) critical posts that the community tends to appreciate and agree with, and (ii) critical posts that the community tends to appreciate but disagree with.

1
Ben Stewart
2y
Strong agree - there's plenty of posts that I think are rigorous, well-written, interesting etc., but disagree with their conclusion or general stance. It might also offer a more useful (and maybe less spicy) 'sort by controversial' function, where you can see posts that are highly upvoted but torn on agreement.

Thanks for writing! I'm skeptical that a non-morally-motivated ban would create bad value lock-in. Most of this post's arguments for that premise seem to be just the author's speculative intuitions, given with no evidence or argument (e.g. " I also worry that using laws to capture our abolition of moral catastrophes after they become economically inviable, can create a false sense of progress [...] Always waiting for technological changes might mislead us to think that we have less obligation to improve our moral values or actions when the technological/e... (read more)

5
Fai
2y
Yes, not doing research, and instead throwing my ideas out and make it a crowdsourcing of ideas and argumentation, is how I am doing "research" on this topic, I guess. I am still pretty convinced by this particular point I raised though. This is not my intuition only. For example, some political thinkers and philosophers, such as Hobbes, the Legal School in China, and many more, believe that humans pretty much agreed to not kill each others all the time because of laws or authority. But people, not just now but also in the past, seemed to be very confident that not murdering other humans is a moral intuition or moral progress we had collectively. I tend to agree. And to a certain extent throwing these intuitions out feels bad. But I do have a push back. If there are too many complications I might miss in my worries (and btw I am more express doubts over the standard predictions farmed animal and alt-pro advocates are making, than making predictions myself), then the same doubt can be casted against farmed animal welfare and alt-pro advocates for not thinking about these messy complications. So yes, naively taking my worries to make predictions is unreliable, but not considering the worries I threw out  just because they are messy seems so too. I actually think this paragraph you created is worth presenting and considering. The thing is, it's pretty much been presented already. This is, for example, roughly the story of  Bruce Friedrich (founder and CEO of GFI), and maybe pretty much GFI too. And that was my story too, and might be the story of a lot of EA animal/alt-pro advocates. So if this argument is presented, why not also consider its counterpart? (what I did) Great to know! Thanks. Might update my views on the topic again.

But it seems like such a narrow notion of alignment that it glosses over almost all of the really hard problems in real AI safety -- which concern the very real conflicts between the humans who will be using AI.

I very much agree that these political questions matter, and that alignment to multiple humans is conceptually pretty shaky; thanks for bringing up these issues. Still, I think some important context is that many AI safety researchers think that it's a hard, unsolved problem to just keep future powerful AI systems from causing many deaths (or do... (read more)

Thanks for the comment! I agree these are important considerations and that there's plenty my post doesn't cover. (Part of that is because I assumed the target audience of this post--technical readers of this forum--would have limited interest in governance issues and would already be inclined to think about the impacts of their work. Though maybe I'm being too optimistic with the latter assumption.)

Were there any specific misuse risks involving the tools discussed in the post that stood out to you as being especially important to consider?

Mau
2y · 37
0
0

Thanks for writing this. I think there are actually some pretty compelling examples of people/movements being quite successful at helping future generations (while partly trying to do so):

  • Some sources suggest that Lincoln had long-term motivations for permanently abolishing slavery, saying, "The abolition of slavery by constitutional provision settles the fate, for all coming time, not only of the millions now in bondage, but of unborn millions to come--a measure of such importance that these two votes must be procured." Looking back now, abolition still
... (read more)
6
zdgroff
2y
And worth noting that Ben Franklin was involved in the constitution, so at least some of his longtermist time seems to have been well spent.
8
ColdButtonIssues
2y
Thanks for the counterexamples! I'm trying to think of a way to get a fair example: Coding party manifestos by attention to long-term future and trying to rate their success in office? I'm really unsure.

Maybe, I'm not sure though. Future applications that do long-term, large-scale planning seem hard to constrain much while still letting them do what they're supposed to do. (Bounded goals--if they're bounded to small-scale objectives--seem like they'd break large-scale planning, time limits seem like they'd break long-term planning, and as you mention the "don't kill people" counter would be much trickier to implement.)

1
titotal
2y
That's a fair perspective. One last thing I'll note is that even seemingly permissive constraints can make a huge difference from the perspective of the AI utility calculus. If I ask it to maximise paperclips, then the upper utility bound is defined by the amount of matter in the universe. Capping utility at a trillion paperclips doesn't affect us much (too many would flood the market anyway), but it reduces the expected utility of an AI takeover by like 50 orders of magnitude. Putting in a time limit, even if it's like 100 years, would have the same effect. Seems like a no-brainer. 
Mau
2y · 22
0
0

I also used to be pretty skeptical about the credibility of the field. I was surprised to learn about how much mainstream, credible support AI safety concerns have received:

  • Multiple leading AI labs have large (e.g. 30-person) teams of researchers dedicated to AI alignment.
    • They sometimes publish statements like, "Unaligned AGI could pose substantial risks to humanity and solving the AGI alignment problem could be so difficult that it will require all of humanity to work together."
  • Key findings that are central to concerns over AI risk have been accep
... (read more)
1
RationalHippy
5mo
I would be curious to know if your beliefs have been updated in light of the recent developments?
4
fergusq
2y
Thank you for these references, I'll take a close look on them. I'll write a new comment if I have any thoughts after going through them. Before having read them, I want to say that I'm interested in research about risk estimation and AI progress forecasting. General research about possible AI risks without assigning them any probabilities is not very useful in determining if a threat is relevant. If anyone has papers specifically on that topic, I'm very interested in reading them too.

To counter that, let me emphasize the aspects of AI risk that are not disproven here.

Adding to this list, much of the field thinks a core challenge is making highly capable, agentic AI systems safe. But (ignoring inner alignment issues) severe constraints create safe AI systems that aren't very capable agents. (For example, if you make an AI that only considers what will happen within a time limit of 1 minute, it probably won't be very good at long-term planning. Or if you make an AI system that only pursues very small-scale goals, it won't be able to s... (read more)

1
titotal
2y
It's clear that not every constraint will work for every application, but I reckon every application will have at least some constraints that will drastically drop risk.
I definitely agree that competitiveness is important, but remember that it's not just about competitiveness for a specific task, but competitiveness at pleasing AI developers. There's a large incentive for people not to build runaway murder machines! And even if a company doesn't believe in AI x-risk, it still has to worry about lawsuits, regulations etc for lesser accidents. I think the majority of developers can be persuaded or forced to put some constraints on, as long as they aren't excessively onerous.

(I skimmed; apologies if I missed relevant things.)

no one can bring about a dystopian future unless their ability to accomplish their goals is significantly more advanced than everyone else’s

[...] the EA community [...] is itself a substantial existential risk

This post seems to rely on the assumption that, in the absence of extremely unusual self-limits, EA's ability to accomplish its goals will somehow become significantly more advanced than those of the rest of the world combined. That's quite a strong, unusual assumption to make about any social movement--I think it'd take lots more argument to make a convincing case for it.

2
Obasi Shaw
2y
(No worries for skimming, it's long!) Yeah, I guess there are two primary points that I'm trying to make in this post:
  1. That EA wants to become powerful enough to reshape the world in a way significant enough to feel dystopian to most humans (e.g. the contraception/eugenics/nuclear war examples), and that's why people are intuitively averse to it.
  2. That there is a significant risk that EA will become powerful enough to reshape the world in a way significant enough to feel dystopian to most humans.
The second point is what you're pushing back on, though note that I reframed it away from "significantly more advanced than those of the rest of the world combined" to "powerful enough to reshape the world in a way significant enough to feel dystopian to most humans." I think this is a more accurate framing because it isn't at all likely that EA (or any other maximizer) will end up in a "me against the rest of the world combined" scenario before bringing about dystopian outcomes. If the world does manage to coordinate well enough to rally all of its resources against a maximizer, that would almost certainly only happen long after the maximizer has caused untold damage to humanity. Even now, climate change is a very clear existential risk causing tons of damage, and yet we still haven't really managed to rally the world's resources against it. (And if you're willing to entertain the idea that capitalism is emergent AI, it's easy to argue that climate change is just the embodiment of the capitalism AI's x-risk, and yet humanity is still incredibly resistant to efforts to fight it.)
Anyway, my point is that EA doesn't need to be more powerful than anything else in the world, just powerful enough to create dystopian outcomes, and powerful enough to overcome resistance when it tries to. I used the contraception/eugenics/nuclear-war examples because they demonstrate that it's relatively easy to start creating dystopian outcomes, and although EA doesn't have the type of

I'm not sure how much I agree with this / its applicability, but one argument I've heard is that, for individual decision-making and social norm-setting,

total abstinence is easier than perfect moderation

(Kind of a stretch, but I enjoyed this speech on the cultural and coordinating power of simple norms, which can be seen as a case against nuanced norms. Maybe the simplicity of some standards as individuals' principles, advocacy goals, and social norms makes them more resilient to pressure, whereas more nuanced standards might more easily fall down slip... (read more)

1
edwardmdruce
2y
Thanks for replying, Mauricio. Interesting idea to maximise persuasion/streamline argument. But I'm not sure I buy this, personally, in this context. I think something as complex as food systems requires a nuanced response. This actually brings up a further notable point around food resilience. From the book: 'it’s clear that the more diverse and complex an ecosystem is, the healthier and more resilient it is. 'When someone argues that all meat is “evil,” decentralized, regional, and regenerative food systems, which offer both better quality of life for animals and more sustainable food for humans, are excluded from the discussion. From a globalization perspective, this is also a challenging ethical position as the folks making the case to remove all animal inputs in the global food system are pushing for food policies that would destroy the food systems in developing countries. 'A truly resilient food system requires as much life as possible, and this means animals and plants... Encourage this system to be as diversified and resilient as possible. 'Different regions have area-specific ecosystems that can provide different foods. In some areas it may make more sense to raise camels or goats, depending on what the landscape can provide. When we see regions feeding themselves instead of relying on outside food, we generally see more resilience.'

Fair points!

I think preference-based views fit neatly into the asymmetry.

Here I'm moving on from the original topic, but if you're interested in following this tangent--I'm not quite getting how preference-based views (specifically, person-affecting preference utilitarianism) maintain the asymmetry while avoiding (a slightly/somewhat weaker version of) "killing happy people is good."

Under "pure" person-affecting preference utilitarianism (ignoring broader pluralistic views of which this view is just one component, and also ignoring instrumental justif... (read more)

2
Lukas_Gloor
2y
I would answer "No." The preference against being killed is as strong as the happy person wants it to be. If they have a strong preference against being killed then the preference frustration from being killed would be lot worse than the preference frustration from an unhappy decade or two – it depends how the person herself would want to make these choices. I haven't worked this out as a formal theory but here are some thoughts on how I'd think about "preferences." (The post I linked to primarily focuses on cases where people have well-specified preferences/goals. Many people will have under-defined preferences and preference utilitarians would also want to have a way to deal with these cases. One way to deal with under-defined preferences could be "fill in the gaps with what's good on our experience-focused account of what matters.")

Of the experience-based asymmetric views discussed in the OP, my posts on tranquilism and suffering-focused ethics mention value pluralism and the idea that things other than experiences (i.e., preferences mostly) could also be valuable. Given these explicit mentions it seems false to claim that "these views don't easily fit into a preference-focused framework." [...] I'm not sure why you think [a certain] argument would have to be translated into a preference-focused framework.

I think this misunderstands the point I was making. I meant to highlight how... (read more)

4
Lukas_Gloor
2y
Thanks for elaborating! I agree I misunderstood your point here. (I think preference-based views fit neatly into the asymmetry. For instance, Peter Singer initially weakly defended an asymmetric view in Practical Ethics, as arguably the most popular exponent of preference utilitarianism at the time. He only changed his view on population ethics once he became a hedonist. I don't think I'm even aware of a text that explicitly defends preference-based totalism. By contrast, there are several texts defending asymmetric preference-based views: Benatar, Fehige, Frick, younger version of Singer.) Or that “(intrinsically) good things” don’t have to be a fixed component in our “ontology” (in how we conceptualize the philosophical option space). Or, relatedly, that the formula “maximize goods minus bads” isn’t the only way to approach (population) ethics. Not because it's conceptually obvious that specific states of the world aren't worthy of taking serious effort (and even risks, if necessary) to bring about. Instead, because it's questionable to assume that "good states" are intrinsically good, that we should bring them about regardless of circumstances, independently of people’s interests/goals. I agree that we’re mainly in agreement. To summarize the thread, I think we’ve kept discussing because we both felt like the other party was presenting a slightly unfair summary of how many views a specific criticism applies or doesn’t apply to (or applies “easily” vs. “applies only with some additional, non-obvious assumptions”). I still feel a bit like that now, so I want to flag that out of all the citations from the OP, the NU FAQ is really the only one where it’s straightforward to say that one of the two views within the text – NHU but not NIPU – implies that it would (on some level, before other caveats) be good to kill people against their will (as you claimed in your original comment). From further discussion, I then gathered that you probably meant that specific arg
Answer by Mau · Aug 26, 2022 · 14
0
0

On deterrence:

  • In the context of massive nuclear attacks, why isn't the danger of nuclear winter widely seen as making nuclear retaliation redundant (as Ellsberg suggested on another 80K podcast episode)?
  • The US threatens to retaliate (with nukes) against anyone who nukes certain US allies--how credible is this threat, and why?
    • Part of how the US tries to make this threat more credible is by sharing nukes with some of its allies. How does this sharing work? Does the US share nukes in such a way that, in a crisis, a non-nuclear host country could easily s
... (read more)
9
Linch
2y
I think something that is not at all obvious from the outside is that in my estimate, at any given time, there is usually <2 FTE EA researchers who are thinking seriously about nuclear risk strategy from a longtermist angle (not counting advocacy, very junior trainee SERI/CERI researchers, people trying to get into policy position, scattershot work by grantmakers opportunistically evaluating nuclear grants, neartermist work, etc).

Thanks for the thoughtful reply; I've replied to many of these points here.

In short, I think you're right that Magnus doesn't explicitly assume consequentialism or hedonism. I understood him to be implicitly assuming these things because of the post's focus on creating happiness and suffering, as well as the apparent prevalence of these assumptions in the suffering-focused ethics community (e.g. the fact that it's called "suffering-focused ethics" rather than "frustration-focused ethics"). But I should have more explicitly recognized those assumptions and ... (read more)

Thanks for the thoughtful reply; I've replied to many of these points here.

On a few other ends:

  • I agree that strong negative utilitarian views can be highly purposeful and compassionate. By "semi-nihilistic" I was referring to how some of these views also devalue much (by some counts, half) of what others value. [Edit: Admittedly, many pluralists could say the same to pure classical utilitarians.]
  • I agree classical utilitarianism also has bullets to bite (though many of these look like they're appealing to our intuitions in scenarios where we should expec
... (read more)

Thanks for the thoughtful reply. You're right, you can avoid the implications I mentioned by adopting a preference/goal-focused framework. (I've edited my original comment to flag this; thanks for helping me recognize it.) That does resolve some problems, but I think it also breaks most of the original post's arguments, since they weren't made in (and don't easily fit into) a preference-focused framework. For example:

  • The post argues that making happy people isn't good and making miserable people is bad, because creating happiness isn't good and creating
... (read more)
8
Lukas_Gloor
2y
My impression of the OP's primary point was that asymmetric views are under-discussed. Many asymmetric views are preference-based and this is mentioned in the OP (e.g., the link to Anti-frustrationism or mention of Benatar). Of the experience-based asymmetric views discussed in the OP, my posts on tranquilism and suffering-focused ethics mention value pluralism and the idea that things other than experiences (i.e., preferences mostly) could also be valuable. Given these explicit mentions it seems false to claim that "these views don't easily fit into a preference-focused framework." Probably similarly, the OP links to posts by Teo Ajantaival which I've only skimmed but there's a lengthy and nuanced-seeming discussion on why minimalist axiologies, properly construed, don't have the implications you ascribed to them. The NU FAQ is a bit more single-minded in its style/approach, but on the question "Does negative utilitarianism solve ethics" it says "ethics is nothing that can be 'solved.'" This at least tones down the fanaticism a bit and opens up options to incorporate other principles or other perspectives. (Also, it contains an entire section on NIPU – negative idealized preference utilitarianism. So, that may count as another preference-based view alluded in the OP, since the NU FAQ doesn't say whether it finds N(H)U or NIPU "more convincing.") I'm not sure why you think the argument would have to be translated into a preference-focused framework. In my previous comment I wanted to say the following: (1) The OP mentions that asymmetric positions are underappreciated and cites some examples, including Anti-Frustrationism, which is (already) a preference-based view. (2) While the OP does discuss experience-focused views that say nothing is of intrinsic value, those views are compatible with a pluralistic conception of "ethics/morality" where preferences could matter too. Therefore, killing people against their will to reduce suffering isn't a clear implication o

Thanks for writing. You're right that MacAskill doesn't address these non-obvious points, though I want to push back a bit. Several of your arguments are arguments for the view that "intrinsically positive lives do not exist," and more generally that intrinsically positive moments do not exist. Since we're talking about repugnant conclusions, readers should note that this view has some repugnant conclusions of its own.

[Edit: I stated the following criticism too generally; it only applies when one makes an additional assumption: that experiences matter, whi... (read more)

edit: I wrote this comment before I refreshed the page and I now see that these points have been raised!

Thanks for flagging that all ethical views have bullets to bite and for pointing at previous discussion of asymmetrical views!

However, I'm not really following your argument.

Several of your arguments are arguments for the view that "intrinsically positive lives do not exist,"  [...] It implies that there wouldn't be anything wrong with immediately killing everyone reading this, their families, and everyone else, since this supposedly wouldn't be des

... (read more)

[the view that intrinsically positive lives do not exist] implies that there wouldn't be anything wrong with immediately killing everyone reading this, their families, and everyone else, since this supposedly wouldn't be destroying anything positive.

This is not true. The view that killing is bad and morally wrong can be, and has been, grounded in many ways besides reference to positive value.[1]

First, there are preference-based views according to which it would be bad and wrong to thwart preferences against being killed, even as the creation and satisfacti... (read more)

It implies that there wouldn't be anything wrong with immediately killing everyone reading this, their families, and everyone else, since this supposedly wouldn't be destroying anything positive.

That's not how many people with the views Magnus described would interpret their views.

For instance, let's take my article on tranquilism, which Magnus cites. It says this in the introduction:

Tranquilism is not meant as a standalone moral theory, but as a way to think about well-being and the value of different experiences. Tranquilism can then serve as a buil

... (read more)

Thanks for the thoughtful post!

Some of the disconnect here might be semantic - my sense is people here often use "moral progress" to refer to "progress in people's moral views," while you seem to be using the term to mean both that and also other kinds of progress.

Other than that, I'd guess people might not yet be sold on how tractable and high-leverage these interventions are, especially in comparison to other interventions this community has identified. If you or others have more detailed cases to make on the tractability of any of these important proble... (read more)

8
jasoncrawford
2y
I can understand not prioritizing these issues for grant-making, because of tractability. But if something is highly important, and no one is making progress on it, shouldn't there at least be a lot of discussion about it, even if we don't yet see tractable approaches? Like, shouldn't there be energy in trying to find tractability? That seems missing, which makes me think that the issues are underrated in terms of importance.

Good points!

Some GiveWell charities largely benefit young children, too, but if I recall correctly, I think donations have been aimed at uses for the next year or two, so maybe only very young children would not benefit on such a person-affecting view, and this wouldn't make much difference.

Agreed that this wouldn't make much of a difference for donations, although maybe it matters a lot for some career decisions. E.g. if future people weren't ethically important, then there might be little value in starting a 4+ year academic degree to then donate to these charities.

(Tangentially, the time inconsistency of presentists' preferences seems pretty inconvenient for career planning.)

4
MichaelStJules
2y
I think this could make a difference for careers and education, but I'd guess not 10x in terms of cost-effectiveness of the first donations post-graduation. Most EAs will probably have already started undergraduate degrees by the time they first get into EA. There are also still benefits for the parents of children who would die, in preventing their grief and possibly economic benefits. People over 5 years old still have deaths prevented by AMF, just a few times less per dollar spent, iirc. I'd guess few people would actually endorse presentism in particular, though.
Mau
2y · 12
0
0

Thanks for writing - I skimmed so may have missed things, but I think these arguments have significant weaknesses, e.g.:

  • They draw a strong conclusion about major historical patterns just based on guesswork about ~12 examples (including 3 that are explicitly taken from the author's imagination).
  • They do not consider examples which suggest long-term thinking has been very beneficial.
    • E.g. some sources suggest that Lincoln had long-term motivations for permanently abolishing slavery, saying, "The abolition of slavery by constitutional provision settles the
... (read more)
4
BrownHairedEevee
2y
Another example of long-term thinking working well is Ben Franklin's bequests to the cities of Boston and Philadelphia, which grew for 200 years before being cashed out. (Also one of the inspirations for the Patient Philanthropy Fund.)
1
Brian Lui
2y
I think the slavery example is a strong example of longtermism having good outcomes, and it probably increased the amount of urgency to reduce slavery. My base rate for "this time it's different" arguments is low, except for ones that focus on extinction risk. Like if you mess up and everyone dies, that's unrecoverable. But for other things I am skeptical.
3
Guy Raveh
2y
To your Lincoln example I'd add good governance attempts in general - the US constitution appears to have been written with the express aim of providing long term democratic and stable government.

Thanks for posting! Tentative idea for tweaks: my intuition would be to modify the middle two branches into the following:

  • Long-term AI misuse
    • Stable authoritarianism
    • Value erosion from competition
    • "Lame" future
  • AI exacerbates other x-risk factors
    • (Great power / nuclear) conflict
    • Degraded epistemics
    • Other dangerous tech

Rationale:

  • "Other (AI-enabled) dangerous tech" feels to me like it clearly falls under "exacerbating other x-risk factors"
  • "AI-enabled dystopia" sounds broad enough to cover ~everything on the chart; "long-term AI misuse" might mor
... (read more)
3
Sam Clarke
2y
Thanks, I agree with most of these suggestions. I was trying to stipulate that the dangerous tech was a source of x-risk in itself, not just a risk factor (admittedly the boundary is fuzzy). The wording was "AI leads to deployment of technology that causes extinction or unrecoverable collapse" and the examples (which could have been clearer) were intended to be "a pathogen kills everyone" or "full scale nuclear war leads to unrecoverable collapse"
Mau
2y · 46
0
0

I'd consider tweaking (3) to something like, "Make sure you don't start a nuclear war based on a false alarm." The current version has imo some serious downsides:

  • Association with planned disloyalty could hurt the efforts of everyone else in this community who's trying to contribute to policy through dedicated service.
  • A (US) community's reputation for planned non-retaliation might increase risk of nuclear war, because nuclear peace depends partly on countries' perceptions that other countries are committed to retaliating against nuclear attack.
2
Linch
2y
(I strongly upvoted this; I think some subset of EAs are overly naive about the channels through which nuclear policy could be used to reduce catastrophic risks)
-14
Midtermist12
2y

I don't have a great sense of how long what you're describing would take, but here is a collection of relevant resources.

3
Jordan Arel
2y
Thanks! Already taking this course
  • Is the idea that there isn't already opposition to EA stances, so creating it is extra bad?
  • On the flip side of one point, obscurity of these cause areas also reduces cause-area-motivated opponents.
1
iamasockpuppet
2y
Bringing into existence new opposition is bad. Not sure to what extent there's currently no opposition; but there's no opposition of the sort that EAs would face. (I'm pretty sure there are no professional oppo researchers targeting EA or individual EAs right now, for example. Similarly, existing politicians have no reason to dislike EA. Similarly, I'm pretty sure that there's never been a publicly running advertisement attacking effective altruism.)
(I think it's pretty likely that there would be attack ads if EA keeps running candidates. Maybe you're skeptical now, but that's based on a positive view of EA from the inside; not based on what a motivated opposition researcher who's being paid to find reasons to criticize an EA would either find or twist to criticize. The term "political hatchet job" exists for a reason.)
I'm not really sure what you mean by that? Is the idea that nobody would oppose EA candidates because of EA ideology? That's true, they would oppose EA candidates for non-EA ideological reasons, and also because electoral seats are a scarce resource.
Mau
2y · 10
0
0

I agree with a lot of this, although I'm not sure I see why standardized cost benefit analysis would be necessary for legitimate epistemic progress to be made? There are many empirical questions that seem important from a wide range of ethical views, and people with shared interest in these questions can work together to figure these out, while drawing their own normative conclusions. (This seems to line up with what most organizations affiliated with this community actually do--my impression is that lots more research goes into empirical questions than in... (read more)

3
Charlie_Guthmann
2y
I think I overstated my case somewhat or used the wrong wording. I don’t think standardized cbas are completely necessary for epistemic progress. In fact as long as the cba is done with outputs per dollar rather than outcomes per dollar or includes the former in the analysis it shouldn’t be much of a problem because as you said people can overlay their normative concerns. I do think that most posts here aren’t prefaced with normative frameworks, and this is sometimes completely unimportant(in the case of empirical stuff), or in other cases more important(how do we approach funding research, how should we act as a community and individuals as a part of the community). I think a big part of the reason that it isn’t more confusing is that as the other commenter said, almost everyone here is a utilitarian. I agree that there is a reason to have the ea umbrella outside of epistemic reasons. So again I used overly strongly wording or was maybe just plainly incorrect. A lot of what was going on in my head with respect to cost benefit analyses when I wrote this comment was about grantmaking. For instance, If a grantmaker says it’s funding based on projects that will help the long term of humanity, I feel like that leaves a lot on the table. Do you care about pain or pleasure? Humans or everyone? Inevitably they will use some sort of rubric. If they haven’t thought through what normative considerations the rubric is based on, the rubric may be somewhat incoherent to any specific value system or even worse completely aligned with a specific one by accident. I could imagine this creating non Bayesian value drift, since while research cbas allow us to overlay our own normative frameworks, grants are real world decisions. I can’t overlay my own framework over someone else’s decision to give a grant. Also I do feel a bit bad about my original comment because I meant the comment to really just be a jumping off point for other anti-realists to express confusion about how to ta

I think you have good points around partisanship, comparative advantage, and weaknesses of some arguments on the forum. Two other thoughts:

  • Your analysis of the limited influence of individual elected officials focuses on the House of Representatives. These arguments are grounded in the House's unique features (e.g., influence of leadership, limits on members' ability to propose amendments), so it doesn't make sense to generalize their conclusions to the Senate.
  • Concluding one of your sections, you write, "they inevitably will end up campaigning on, and d
... (read more)
3
iamasockpuppet
2y
Yeah. It's kinda difficult for me to present a knockdown argument against explicitly EA politicians, when everybody agrees that the benefits would be illegible to the public and difficult to predict. This is true, and I think my essay suffered somewhat from being both an argument specifically about Carrick Flynn and an argument about politics more generally. I think that the focus on the House, beyond just Flynn, was somewhat reasonable though; I think that EA is just incapable of getting an EA elected to the Senate in the near term, so there's not much point in considering the benefits. I can elaborate on this if you disagree.
Two reasons this is bad:
  • As I discussed in the next section, creating opposition to EA could be harmful to EA itself.
  • It means that most of the EA effort in elections would go, not to arguing for EA causes specifically, but to arguing for and fighting over other issues. For example, in OR-06, significant sums of EA money were spent attacking Andrea Salinas as a drug lobbyist, which is not an EA focus. It also means that EA might lose elections even where nobody really disagrees or opposes the EA cause areas; OR-06 is again an example.
3
Guy Raveh
2y
Why would that be? Transparency and keeping everything important public would in my view be the solution to attacks on the movement by political opponents. Secrecy is what can make us be perceived as a sinister organisation.

I'm still figuring out how I want to engage on this forum; for now, I generally, tentatively prefer to not disclose personal information on here. I'd encourage readers to conservatively assume I have conflicts of interest, and to assess my comments and posts based on their merits. (My vague sense is that this is a common approach to this forum--common enough that non-disclosure doesn't imply an absence of conflicts of interest--but maybe I've misread? I'm not confident about the approach I'm taking - feel free to message me on this forum if you'd like to d... (read more)

I'm mostly sympathetic - I'd add a few caveats:

  • Research has to slow down enough for an AI developer to fall behind; an AI developer that has some lead over their competition would have some slack, potentially enabling safety-concerned people to contribute. (That doesn't necessarily mean companies should try to get a lead though.)
  • It seems plausible for some useful regulation to take the form of industry self-regulation (which safety-concerned people at these companies could help advance).
5
Ofer
2y
Generally, I think self-regulation is usually promoted by industry actors in order to prevent actual regulation. Based on your username and a bit of internet research, you seem to be an AI Governance Research Contractor at a major AGI company. Is this correct? If so, I suggest that you disclose that affiliation on your profile bio (considering that you engage in the topic of AI regulation on this forum). (To be clear, your comments here seem consistent with you acting in good faith and having the best intentions.)