All of Zach Stein-Perlman's Comments + Replies

Two hours before you posted this, MacAskill posted a brief explanation of viatopianism.

This essay is the first in a series that discusses what a good north star [for post-superintelligence society] might be. I begin by describing a concept that I find helpful in this regard:

Viatopia: an intermediate state of society that is on track for a near-best future, whatever that might look like.

Viatopia is a waystation rather than a final destination; etymologically, it means “by way of this place”. We can often describe good waystations even if we have little idea

... (read more)
1
JordanStone
Yeah, agreed on that point. Folks at Forethought aren't necessarily thinking about what a near-optimal future should look like; they're thinking about how to get civilisation to a point where we can make the best possible decisions about what to do with the long-term future. 
3
peterbarnett
I should read that piece. In general, I am very into the Long Reflection and I guess also the Viatopia stuff.

Bores, Wiener, and other AI safety in US politics stuff. $129K total, >100% of my income.

0
Simon Newstead 🔸
Thanks for digging deep! 👏

Quick take on longtermist donations for giving tuesday.

My favorite donation opportunity is Alex Bores's congressional campaign. I also like Scott Wiener's congressional campaign.

If you have to donate to a normal longtermist 501c3, I think Forethought, METR, and The Midas Project—and LTFF/ARM and Longview's Frontier AI Fund—are good and can use more money (and can't take Good Ventures money). But I focus on evaluating stuff other than normal longtermist c3s, because other stuff seems better and has been investigated much less; I don't feel very strongly abo... (read more)

3
Charlie_Guthmann
Curious why METR. This is less METR-specific and more about capabilities benchmarks: doesn't frontier capabilities benchmarking help accelerate AI development? I know they also do pure safety work, but in practice I feel like they have done more to push forward the agentic automation race. 
4
Yarrow Bouchard 🔸
Why can’t they take Good Ventures’ money? Also, are these all longtermist organizations, or AGI near-termist organizations?

As one of Zach's collaborators, I endorse these recommendations. If I had to choose among the 501c3s listed above, I'd choose Forethought first and the Midas Project second, but these are quite weakly held opinions.

I do recommend reaching out about nonpublic recommendations if you're likely to give over $20k!

  1. +1
  2. Random take: people underrate optionality / information value. Even within EA, few opportunities are within 5x of the best opportunities (even on the margin), due to inefficiencies in the process by which people get informed about donation opportunities. Waiting to donate is great if it increases your chances of donating very well. Almost all of my friends regret their past donations; they wish they'd saved money until they were better-informed.
  3. Random take: there are still some great c3 opportunities, but hopefully after the Anthropic people eventually g
... (read more)
2
Yarrow Bouchard 🔸
Are you abbreviating 501(c)(3) to c3?

I'm not sure what we should be doing now! But I expect that people can make progress if they backchain from the von Neumann probes, whereas my impression is that most people entering the "digital sentience" space never think about the von Neumann probes.

Oh, clarification: it's very possible that there aren't great grant opportunities by my lights. It's not like I'm aware of great opportunities that the other Zach isn't funding. I should have focused more on expected grants than Zach's process.

Thanks. I'm somewhat glad to hear this.

One crux is that I'm worried that broad field-building mostly recruits people to work on stuff like "are AIs conscious" and "how can we improve short-term AI welfare" rather than "how can we do digital-minds stuff to improve what the von Neumann probes tile the universe with." So the field-building feels approximately zero-value to me — I doubt you'll be able to steer people toward the important stuff in the future.

A smaller crux is that I'm worried about lab-facing work similarly being poorly aimed.

4
Derek Shiller
I find this distinction kind of odd. If we care about what digital minds we produce in the future, what should we be doing now? I expect that what minds we build in large numbers in the future will largely depend on how we answer a political question. The best way to prepare now for influencing how we as a society answer that question (in a positive way) is to build up a community with a reputation for good research, figure out the most important cruxes and what we should say about them, create a better understanding of what we should actually be aiming for, initiate valuable relationships with potential stakeholders based on mutual respect and trust, create basic norms about human-AI relationships, and so on. To me, that looks like engaging with whether near-future AIs are conscious (or have other morally important traits) and working with stakeholders to figure out what policies make sense at what times. Though I would have thought the posts you highlighted as work you're more optimistic about fit squarely within that project, so maybe I'm misunderstanding you.
6
Zach Stein-Perlman
Oh, clarification: it's very possible that there aren't great grant opportunities by my lights. It's not like I'm aware of great opportunities that the other Zach isn't funding. I should have focused more on expected grants than Zach's process.

I endorse Longview's Frontier AI Fund; I think it'll give to high-marginal-EV AI safety c3s.

I do not endorse Longview's Digital Sentience Fund. (This view is weakly held. I haven't really engaged.) I expect it'll fund misc empirical and philosophical "digital sentience" work plus unfocused field-building — not backchaining from averting AI takeover or making the long-term future go well conditional on no AI takeover. I feel only barely positive about that. (I feel excited about theoretical work like this.)

I'm a grantmaker at Longview and manage the Digital Sentience Fund—thought I'd share my thinking here: “backchaining from… making the long-term future go well conditional on no AI takeover” is my goal with the fund (with the restriction of being related to the wellbeing of AIs in a somewhat direct way), though we might disagree on how that’s best achieved through funding. Specifically, the things you’re excited about would probably be toward the top of the list of things I’m excited about, but I also think broader empirical and philosophical work and field... (read more)

cb

I pulled the 500M figure from the job posting, and it includes grants we expect to make before the end of the year— I think it’s a more accurate estimate of our spending. Also, like this page says, we don’t publish all our grants (and when we do publish, there’s a delay between making the grant and publishing the page, so the website is a little behind).

I have a decent understanding of some of the space. I feel good about marginal c4 money for AIPN and SAIP. (I believe AIPN now has funding for most of 2026, but I still feel good about marginal funding.)

There are opportunities to donate to politicians and PACs which seem 5x as impactful as the best c4s. These are (1) more complicated and (2) public. If you're interested in donating ≥$20K to these, DM me. This is only for US permanent residents.

I'm confident the timing was a coincidence. I agree that (novel, thoughtful, careful) posting can make things happen.

5
Charlie G 🔹
I agree that the timing is to some extent a coincidence, especially considering that the TIME piece followed an Anthropic board appointment which would have to have been months in the making, but I'm also fairly confident that your piece shaped at least part of the TIME article. As far as I can tell, you were the first person to raise the concern that large shareholders (potentially Amazon and Google in particular) could end up overruling the LTBT and annulling it. The TIME piece quite directly addressed that concern, saying [...]. To me, it would be surprising if that section was added without your post in mind. Again, your post is the only place prior to the publication of this article (AFAICT) where this concern was raised.

I mostly agree with the core claim. Here's how I'd put related points:

  • Impact is related to productivity, not doing-your-best.
  • Praiseworthiness is related to doing-your-best, not productivity.
  • But doing-your-best involves maximizing productivity.
  • Increasing hours-worked doesn't necessarily increase long-run productivity. (But it's somewhat suspiciously convenient to claim that it doesn't, and for many people it would.)
7
lynettebye
I agree with all those points. One additional point I was trying to make is that it's okay to trade off some impact for your own happiness.  I think the EA community would benefit on the margin from moving more in the personal happiness direction. (Probably the general population would benefit from moving more in the impact direction, but my audience is heavily EA-skewed.) 

I haven't read all of the relevant stuff in a long time but my impression is Bio/Chem High is about uplifting novices and Critical is about uplifting experts. See PF below. Also note OpenAI said Deep Research was safe; it's ChatGPT Agent and GPT-5 which it said required safeguards.

3
aog
That’s the new PF. The old (December 2023) version defined a medium risk threshold which Deep Research surpassed.  https://cdn.openai.com/openai-preparedness-framework-beta.pdf

I haven't really thought about it and I'm not going to. If I wanted to be more precise, I'd assume that a $20 subscription is equivalent (to a company) to finding a $20 bill on the ground, assume that an ε% increase in spending on safety cancels out an ε% increase in spending on capabilities (or think about it and pick a different ratio), and look at money currently spent on safety vs capabilities. I don't think P(doom) or company-evilness is a big crux.
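A rough sketch of that kind of estimate, with made-up spending figures (the capabilities and safety totals below are illustrative assumptions, not researched numbers):

```python
# Offset heuristic described above: treat the subscription as marginal revenue to
# the company, and assume an x% increase in safety spending cancels an x% increase
# in capabilities spending. Spending totals are placeholder assumptions.

subscription_per_year = 20 * 12       # $20/month subscription

capabilities_spending = 100e9         # assumed industry capabilities spending, $/year
safety_spending = 1e9                 # assumed safety spending, $/year

# Fractional boost to capabilities from your subscription revenue
fractional_boost = subscription_per_year / capabilities_spending

# Donation that boosts safety spending by the same fraction
offset_donation = fractional_boost * safety_spending

print(f"~${offset_donation:.2f}/year")   # ~$2.40/year under these assumptions
```

Under these assumed numbers, a $10/year donation more than covers the offset; the answer scales linearly with whatever safety-to-capabilities spending ratio you believe.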

fwiw I think you shouldn't worry about paying $20/month to an evil company to improve your productivity, and if you want to offset it I think a $10/year donation to LTFF would more than suffice.

3
harfe
Can you say more on why you think a 1:24 ratio is the right one (as opposed to lower or higher ratios)? And how might this ratio differ for people who have different beliefs than you, for example about xrisk, LTFF, or the evilness of these companies?

The thresholds are pretty meaningless without at least a high-level standard, no?

One problem is that donors would rather support their favorite research than a mixture that includes non-favorite research.

1
tylermjohn
Most major donors don't have time or expertise to vet research opportunities, so they'd rather outsource to someone else who can source and vet them.  

I'm optimistic about the very best value-increasing research/interventions. But in terms of what would actually be done at the margin, most work that people would do for "value-increasing" reasons would be confused/doomed, I expect (and this is less true for AI safety).

I think for many people, positive comments would be much less meaningful if they were rewarded/quantified, because you would doubt that they're genuine. (Especially if you excessively feel like an imposter and easily seize onto reasons to dismiss praise.)

I disagree with your recommendations despite agreeing that positive comments are undersupplied.

4
Ozzie Gooen
I'd quickly flag: 1. Any decent intervention should be done experimentally. It's not like there would be "one system, hastily put-together, in place forever." More like, early work would try out some things and see what the response is like in practice. I imagine that many original ideas would be mediocre, but with the right modifications and adjustments to feedback, it's possible to make something decent.  2. I think that positive comments are often already rewarded - and that's a major reason people give them. But I don't think this is necessarily a bad thing. My quick guess is that this is a situation of adjusting incentives - certain incentive structures would encourage certain classes of good and bad behaviors, so it's important to continue to tune these. Right now we have some basic incentives that were arrived at by default, and in my opinion are quite unsophisticated (people are incentivized to be extra nice to people who are powerful and who will respond, and mean to people in the outgroup). I think semi-intentional work can improve this, but I realize it would need to be done well.  On my side it feels a bit like, "We currently have an ecosystem of very mediocre incentives, that produce the current results. It's possible to set up infrastructure to adjust those incentives and experiment with what those results would be. I'm optimistic that this problem is both important enough and tractable enough for some good efforts to work on."

Given 3, a key question is what can we do to increase P(optimonium | ¬ AI doom)?

For example:

  • Averting AI-enabled human-power-grabs might increase P(optimonium | ¬ AI doom)
  • Averting premature lock-in and ensuring the von Neumann probes are launched deliberately would increase P(optimonium | ¬ AI doom), but what can we do about that?
  • Some people seem to think that having norms of being nice to LLMs is valuable for increasing P(optimonium | ¬ AI doom), but I'm skeptical and I haven't seen this written up.

(More precisely we should talk about expected fraction of ... (read more)

One key question for the debate is: what can we do / what are the best ways to "increas[e] the value of futures where we survive"?

My guess is it's better to spend most effort on identifying possible best ways to "increas[e] the value of futures where we survive" and arguing about how valuable they are, rather than arguing about "reducing the chance of our extinction [vs] increasing the value of futures where we survive" in the abstract.

2
Toby Tremlett🔹
I agree- this is what I mean by my clarification of the tractability point above. One of the biggest considerations for me personally in this debate is whether there are any interventions in the 'increasing the value of the future' field which are as robust in their value as extinction risk reduction. 

I want to make salient these propositions, which I consider very likely:

  1. In expectation, almost all of the resources our successors will use/affect come via von Neumann probes (or maybe acausal trade or affecting the simulators).
  2. If 1, the key question for evaluating a possible future from scope-sensitive perspectives is: will the von Neumann probes be launched, and what will they tile the universe with? (modulo acausal trade and simulation stuff)
  3. [controversial] The best possible thing to tile the universe with (maybe call it "optimonium") is wild
... (read more)
4
Zach Stein-Perlman
Given 3, a key question is what can we do to increase P(optimonium | ¬ AI doom)? For example: * Averting AI-enabled human-power-grabs might increase P(optimonium | ¬ AI doom) * Averting premature lock-in and ensuring the von Neumann probes are launched deliberately would increase P(optimonium | ¬ AI doom), but what can we do about that? * Some people seem to think that having norms of being nice to LLMs is valuable for increasing P(optimonium | ¬ AI doom), but I'm skeptical and I haven't seen this written up. (More precisely we should talk about expected fraction of resources that are optimonium rather than probability of optimonium but probability might be a fine approximation.)

Agree. Nice (truth-tracking) comments seem high-leverage for boosting morale + reducing excessive aversion to forum-posting + countering the phenomenon where commenters are more critical than the average reader (which warps what authors think about their readers).

This is circular. The principle is only compromised if (OP believes) the change decreases EV — but obviously OP doesn't believe that; OP is acting in accordance with the do-what-you-believe-maximizes-EV-after-accounting-for-second-order-effects principle.

Maybe you think people should put zero weight on avoiding looking weird/slimy (beyond what you actually are) to low-context observers (e.g. college students learning about the EA club). You haven't argued that here. (And if that's true then OP made a normal mistake; it's not compromising principles.)

Just flagging this for readers' context: I think Habryka's position/reading makes more sense if you view it in the context of an ongoing Cold War between Good Ventures and Lightcone.[1]

Some evidence on the GV side:

... (read more)

I agree that things tend to get tricky and loopy around these kinds of reputation-considerations, but I think at least the approach I see you arguing for here is proving too much, and has a risk of collapsing into meaninglessness.

I think in the limit, if you treat all speech acts this way, you just end up having no grounding for communication. "Yes, it might be the case that the real principles of EA are X, but if I tell you instead they are X', then you will take better actions, so I am just going to claim they are X', as long as both X and X' include cos... (read more)

My impression is that CLTR mostly adds value via its private AI policy work. I agree its AI publications seem not super impressive but maybe that's OK.

Probably same for The Future Society and some others.

My top candidates:

  1. AI Safety and Governance Fund
  2. PauseAI US
  3. Center for AI Policy
  4. Palisade
  5. MIRI

A classification of every other org I reviewed:

Good but not funding-constrained: Center for AI Safety, Future of Life Institute

Would fund if I had more money: Control AI, Existential Risk Observatory, Lightcone Infrastructure, PauseAI Global, Sentinel

Would fund if I had a lot more money, but might fund orgs in other cause areas first: AI Policy Institute, CEEALAR, Center for Human-Compatible AI, Manifund

Might fund if I had a lot more money: AI Standards Lab, Centre for

... (read more)

Here's my longtermist, AI focused list. I really haven't done my research, e.g. I read zero marginal funding posts. MATS is probably the most popular of these, so this is basically a vote for MATS.

I would have ranked The Midas Project around 5 but it wasn't an option.

2
Toby Tremlett🔹
Hey Zach- Midas Project should be visible to you now!

"Improve US AI policy 5 percentage points" was defined as

Instead of buying think tanks, this option lets you improve AI policy directly. The distribution of possible US AI policies will go from being centered on the 50th-percentile-good outcome to being centered on the 55th-percentile-good outcome, as per your personal definition of good outcomes. The variance will stay the same.

(This is still poorly defined.)

Hmm, yeah, that is better-defined. I don't have a huge amount of variance within those percentiles, so I think I would probably take the 32.5 video explainers, but I really haven't thought much about it.

A few DC and EU people tell me that in private, Anthropic (and others) are more unequivocally antiregulation than their public statements would suggest.

I've tried to get this on the record—person X says that Anthropic said Y at meeting Z, or just Y and Z—but my sources have declined.

I've heard similar things, as well as Anthropic throwing their weight as a "safety" company to try to unduly influence other safety-concerned actors. 

I believe that Anthropic's policy advocacy is (1) bad and (2) worse in private than in public.

But Dario and Jack Clark do publicly oppose strong regulation. See https://ailabwatch.org/resources/company-advocacy/#dario-on-in-good-company-podcast and https://ailabwatch.org/resources/company-advocacy/#jack-clark. So this letter isn't surprising or a new betrayal — the issue is the preexisting antiregulation position, insofar as it's unreasonable.

2
Raemon
Can you say a bit more about: ?

Thanks.

I notice they have few publications.

Setting aside whether Neil's work is useful, presumably almost all of the grant is for his lab. I failed to find info on his lab.

5
peterbarnett
I would guess grants made to Neil's lab are referring to the MIT FutureTech group, which he's the director of. FutureTech says on its website that it has received grants from OpenPhil and the OpenPhil website doesn't seem to mention a grant to FutureTech anywhere, so I assume the OpenPhil FutureTech grant was the grant made to Neil's lab. 
5
defun 🔸
It's MIT FutureTech: https://futuretech.mit.edu/

...huh, I usually disagree with posts like this, but I'm quite surprised by the 2022 and 2023 grants.

1
defun 🔸
Agree. OP's hits-based giving approach might justify the 2020 grant, but not the 2022 and 2023 grants.

Actually, this is a poor description of my reaction to this post. Oops. I should have said:

Digital mind takeoff is maybe-plausibly crucial to how the long-term future goes. But this post seems to focus on short-term stuff such that the considerations it discusses miss the point (according to my normative and empirical beliefs). Like, the y-axis in the graphs is what matters short-term (and it's at most weakly associated with what matters long-term: affecting the von Neumann probe values or similar). And the post is just generally concerned with short-term ... (read more)

3
Bradford Saad
I'd be excited for AI welfare as an area to include a significant amount of explicitly longtermist work. Also, I find it plausible that heuristics like the one you mention will connect a lot of AI welfare work to the cosmic endowment. But I'm not convinced that it'd generally be a good idea to explicitly apply such heuristics in AI welfare work even for people who (unlike me) are fully convinced of longtermism. I expect a lot of work in this area to be valuable as building blocks that can be picked up from a variety of (longtermist and neartermist) perspectives and for such work's value to often not be enhanced by the work explicitly discussing how to build with it from different perspectives or how the authors would build on it given their perspective. I also worry that if AI welfare work were generally framed in longtermist terms whenever applicable (even when robust to longtermism vs. neartermism), that could severely limit the impact of the area.

The considerations in this post (and most "AI welfare" posts) are not directly important to digital mind value stuff, I think, if digital mind value stuff is dominated by possible superbeneficiaries created by von Neumann probes in the long-term future. (Note: this is a mix of normative and empirical claims.)

8
Zach Stein-Perlman
Actually, this is a poor description of my reaction to this post. Oops. I should have said: Digital mind takeoff is maybe-plausibly crucial to how the long-term future goes. But this post seems to focus on short-term stuff such that the considerations it discusses miss the point (according to my normative and empirical beliefs). Like, the y-axis in the graphs is what matters short-term (and it's at most weakly associated with what matters long-term: affecting the von Neumann probe values or similar). And the post is just generally concerned with short-term stuff, e.g. being particularly concerned about "High Maximum Altitude Scenarios": aggregate welfare capacity "at least that of 100 billion humans" "within 50 years of launch." Even ignoring these particular numbers, the post is ultimately concerned with stuff that's a rounding error relative to the cosmic endowment. I'm much more excited about "AI welfare" work that's about what happens with the cosmic endowment, or at least (1) about stuff directly relevant to that (like the long reflection) or (2) connected to it via explicit heuristics like the cosmic endowment will be used better in expectation if "AI welfare" is more salient when we're reflecting or choosing values or whatever.
1
Bradford Saad
If digital minds takeoff goes well (rather than badly) for digital minds and with respect to existential risk, would we expect a better far-future for digital minds? If so, then I'm inclined to think some considerations in the post are at least indirectly important to digital mind value stuff. If not, then I'm inclined to think digital mind value stuff we have a clue about how to positively affect is not in the far future.

(Minor point: in an unstable multipolar world, it's not clear how things get locked in, and for the von Neumann probes in particular, note that if you can launch slightly faster probes a few years later, you can beat rushed-out probes.)

2
Will Aldred
Yeah, I agree that it’s unclear how things get locked in in this scenario. However, my best guess is that solving the technological problem of designing and building probes that travel as fast as allowed by physics—i.e., just shy of light speed[1]—takes less time than solving the philosophical problem of what to do with the cosmos. If one is in a race, then one is forced into launching probes as soon as one has solved the technological problem of fast-as-physically-possible probes (because delaying means losing the race),[2] and so in my best guess the probes launched will be loaded with values that one likely wouldn’t endorse if one had more time to reflect.[3] Additionally, if one is in a race to build fast-as-physically-possible probes, then one is presumably putting most of one’s compute toward winning that race, leaving one with little compute for solving the problem of what values to load the probes with.[4] Overall, I feel pretty pessimistic about a multipolar scenario going well,[5] but I’m not confident. 1. ^ assuming that new physics permitting faster-than-light travel is ruled out (or otherwise not discovered) 2. ^ There’s some nuance here: maybe one has a lead and can afford some delay. Also, the prize is continuous rather than discrete—that is, one still gets some of the cosmos if one launches late (although on account of how the probes reproduce exponentially, one does lose out big time by being second)*. *From Carl Shulman’s recent 80k interview: 3. ^ It’s very unclear to me how large this discrepancy is likely to be. Are the loaded values totally wrong according to one’s idealized self? Or are they basically right, such that the future is almost ideal? 4. ^ There’s again some nuance here, like maybe one believes that the set of world-states/matter-configurations that would score well according to one’s idealized values is very narrow. In this case, the EV calculation could indicate that it’s bet

When telling stories like your first paragraph, I wish people either said "almost all of the galaxies we reach are tiled with some flavor of computronium and here's how AI welfare work affected the flavor" or "it is not the case that almost all of the galaxies we reach are tiled with some flavor of computronium and here's why."

"The universe will very likely be tiled with some flavor of computronium" is a crucial consideration, I think.

3
trammell
To my mind, the first point applies to whatever resources are used throughout the future, whether it’s just the earth or some larger part of the universe. I agree that the number/importance of welfare subjects in the future is a crucial consideration for how much to do longtermist as opposed to other work. But when comparing longtermist interventions—say, splitting a budget between lowering the risk of the world ending and proportionally increasing the fraction of resources devoted to creating happy artificial minds—it would seem to me that the “size of the future” typically multiplies the value of both interventions equally, and so doesn’t matter.

Briefly + roughly (not precise):

At some point we'll send out lightspeed probes to tile the universe with some flavor of computronium. The key question (for scope-sensitive altruists) is what that computronium will compute. Will an unwise agent or incoherent egregore answer that question thoughtlessly? I intuit no.

I can't easily make this intuition legible. (So I likely won't reply to messages about this.)

I agree this is possible, and I think a decent fraction of the value of "AI welfare" work comes from stuff like this.

Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn't considered by the decision makers.

This would be very weird: it requires that either the value-setters are very rushed or that they have lots of time to consult with superintelligent advisors but still make the wrong choice. Both paths seem unlikely.

9
Will Aldred
As an intuition pump: if the Trump administration,[1] or a coalition of governments led by the U.S., is faced all of a sudden—on account of intelligence explosion[2] plus alignment going well—with deciding what to do with the cosmos, will they proceed thoughtfully or kind of in a rush? I very much hope the answer is “thoughtfully,” but I would not bet[3] that way. What about if we end up in a multipolar scenario, as forecasters think is about 50% likely? In this case, I think rushing is the default? Pausing for a long reflection may be the obvious path to you or me or EAs in general if suddenly in charge of an aligned ASI singleton, but the way we think is very strange compared to most people in the world.[4] I expect that without a good deal of nudging/convincing, the folks calling the shots will not opt for such reflection.[5] (Note that I don’t consider this a knockdown argument for putting resources towards AI welfare in particular: I only voted slightly in the direction of “agree” for this debate week. I do, however, think that many more EA resources should be going towards ASI governance / setting up a long reflection, as I have written before.) One thread here that feels relevant: I don’t think it’s at all obvious that superintelligent advisors will be philosophically competent.[6] Wei Dai has written a series of posts on this topic (which I collected here); this is an open area of inquiry that serious thinkers in our sphere are funding. In my model, this thread links up with AI welfare since welfare is in part an empirical problem, which superintelligent advisors will be great at helping with, but also in part a problem of values and philosophy.[7] 1. ^ the likely U.S. presidential administration for the next four years 2. ^ in this world, TAI has been nationalized 3. ^ I apologize to Nuño, who will receive an alert, for not using “bet” in the strictly correct way. 4. ^ All recent U.S. presidents have been reli
1
Isaac Dunn
Interesting! I think my worry is people who don't think they need advice about what the future should look like. When I imagine them making the bad decision despite having lots of time to consult superintelligent AIs, I imagine them just not being that interested in making the "right" decision? And therefore their advisors not being proactive in telling them things that are only relevant for making the "right" decision. That is, assuming the AIs are intent aligned, they'll only help you in the ways you want to be helped: * Thoughtful people might realise the importance of getting the decision right, and might ask "please help me to get this decision right" in a way that ends up with the advisors pointing out that AI welfare matters and the decision makers will want to take that into account. * But unthoughtful or hubristic people might not ask for help in that way. They might just ask for help in implementing their existing ideas, and not be interested in making the "right" decision or in what they would endorse on reflection. I do hope that people won't be so thoughtless as to impose their vision of the future without seeking advice, but I'm not confident.

Among your friends, I agree; among EA Forum users, I disagree.

1
Ryan Greenblatt
Yes, I meant central to me personally, edited the comment to clarify.

Caveats:

  1. I endorse the argument we should figure out how to use LLM-based systems without accidentally torturing them because they're more likely to take catastrophic actions if we're torturing them.
  2. I haven't tried to understand the argument we should try to pay AIs to [not betray us / tell on traitors / etc.] and working on AI-welfare stuff would help us offer AIs payment better; there might be something there.
  3. I don't understand the decision theory mumble mumble argument; there might be something there.

(Other than that, it seems hard to tell a story about ... (read more)

5
Ryan Greenblatt
FWIW, these motivations seem reasonably central to me personally, though not my only motivations.

My position on "AI welfare"

  1. If we achieve existential security and launch the von Neumann probes successfully, we will be able to do >>10^80 operations in expectation. We could tile the universe with hedonium or do acausal trade or something and it's worth >>10^60 happy human lives in expectation. Digital minds are super important.
  2. Short-term AI suffering will be small-scale—less than 10^40 FLOP and far from optimized for suffering, even if suffering is incidental—and worth <<10^20 happy human lives (very likely <10^10).
  3. 10^20 isn't even
... (read more)

Why does "lock-in" seem so unlikely to you?

One story:

  • Assume AI welfare matters
  • Aligned AI concentrates power in a small group of humans
  • AI technology allows them to dictate aspects of the future / cause some "lock in" if they want. That's because:
    • These humans control the AI systems that have all the hard power in the world
    • Those AI systems will retain all the hard power indefinitely; their wishes cannot be subverted
    • Those AI systems will continue to obey whatever instructions they are given indefinitely
  • Those humans decide to dictate some or all of what the fut
... (read more)

I basically agree with this with some caveats. (Despite writing a post discussing AI welfare interventions.)

I discuss related topics here and what fraction of resources should go to AI welfare. (A section in the same post I link above.)

The main caveats to my agreement are:

  • From a deontology-style perspective, I think there is a pretty good case for trying to do something reasonable on AI welfare. Minimally, we should try to make sure that AIs consent to their current overall situation insofar as they are capable of consenting. I don't put a huge amount of w
... (read more)
4
Zach Stein-Perlman
Caveats: 1. I endorse the argument we should figure out how to use LLM-based systems without accidentally torturing them because they're more likely to take catastrophic actions if we're torturing them. 2. I haven't tried to understand the argument we should try to pay AIs to [not betray us / tell on traitors / etc.] and working on AI-welfare stuff would help us offer AIs payment better; there might be something there. 3. I don't understand the decision theory mumble mumble argument; there might be something there. (Other than that, it seems hard to tell a story about how "AI welfare" research/interventions now could substantially improve the value of the long-term future.) (My impression is these arguments are important to very few AI-welfare-prioritizers / most AI-welfare-prioritizers have the wrong reasons.)
2
Ozzie Gooen
This is very similar to my current stance. 

I am surprised that you don't understand Eliezer's comments in this thread. I claim you'd do better to donate $X to PauseAI now than lock up $2X which you will never see again (plus lock up more for overcollateralization) in order to get $X to PauseAI now.

Did you follow the thread(s) all the way to the end? I do see the overcollateralized part of the lock-up again. And I'm planning on holding ~50% long term (i.e. past 5 years) because I'm on ~50% that we make it past then.

Alas, Eliezer did not answer my question at the end.

For anyone who wants to bet on doom:

  • I claim it can’t possibly be good for you
    • Unless you plan to spend all of your money before you would owe money back
      • People seem to think what matters is ∫bankroll when what actually matters is ∫consumption?
    • Or unless you're betting on high rates of returns to capital, not really on doom
  • Good news: you can probably borrow cheaply. E.g. if you have $2X in investments, you can sell them, invest $X at 2x leverage, and effectively borrow the other $X. (A numeric sketch follows this list.)
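A minimal numeric sketch of the borrowing trick in the last bullet, with an arbitrary X = $10,000 (note Greg's caveat below about margin-call risk and borrowing fees for the leverage):

```python
# Illustrative numbers only: selling $2X of unlevered holdings and re-buying
# $X at 2x leverage keeps total market exposure the same while freeing $X of cash.

X = 10_000                                      # hypothetical amount to free up

holdings_before = 2 * X                         # $20,000 held unlevered
exposure_before = holdings_before               # $20,000 of market exposure

invested_after = X                              # $10,000 in a 2x-leveraged position
exposure_after = 2 * invested_after             # still $20,000 of market exposure
cash_freed = holdings_before - invested_after   # $10,000 now available to donate/spend

print(exposure_before, exposure_after, cash_freed)  # 20000 20000 10000
```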
8
Greg_Colbourn ⏸️
This would not be good for you unless you were an immoral sociopath with no concern for the social opprobrium that results from not honouring the bet.  There is some element of this for me (I hope to more than 2x my capital in worlds where we survive). But it's not the main reason. The main reason it's good for me is that it helps reduce the likelihood of doom. That is my main goal for the next few years. If the interest this is getting gets even one more person to take near-term AI doom as seriously as I do, then that's a win. Also the $x to PauseAI now is worth >>$2x to PauseAI in 2028. This is not without risk (of being margin called in a 50% drawdown)[1]. Else why wouldn't people be doing this as standard? I've not really heard of anyone doing it. 1. ^ And it could also be costly in borrowing fees for the leverage.

Greg made a bad bet. He could do strictly better, by his lights, by borrowing 10K, giving it to PauseAI, and paying back ~15K (10K + high interest) in 4 years. (Or he could just donate 10K to PauseAI. If he's unable to do this, Vasco should worry about Greg's liquidity in 4 years.) Or he could have gotten a better deal by betting with someone else; if there was a market for this bet, I claim the market price would be substantially more favorable to Greg than paying back 200% (plus inflation) over <4 years.

[Edit: the market for this bet is, like, the market for 4-year personal loans.]
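A rough comparison of the two routes in the no-doom worlds where repayment happens, using the numbers above (the ~11% APR is just what's implied by "10K + high interest ≈ 15K over 4 years", not an actual loan quote):

```python
# Compare the bet (pay back 200%) with an ordinary 4-year personal loan.
# The interest rate is inferred from the "~15K on 10K over 4 years" figure above.

principal = 10_000                                # delivered to PauseAI now
years = 4

bet_repayment = 2 * principal                     # $20,000 owed under the bet
apr = 0.11                                        # assumed high-ish personal-loan rate
loan_repayment = principal * (1 + apr) ** years   # ~= $15,181

print(f"Bet:  ${bet_repayment:,.0f}")
print(f"Loan: ${loan_repayment:,.0f}")
```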

7
NickLaing
Want to chip in that the interest for this post and huge comment thread lends evidence towards the bet being good for Greg's reason below, even if he could have done far better on the odds/financial front. "Also, the signalling value of the wager is pretty important too imo. I want people to put their money where their mouth is if they are so sure that AI x-risk isn't a near term problem. And I want to put my money where my mouth is too, to show how serious I am about this."

Thanks for the comment, Zach! Jason suggested something similar to Greg:

As much as I may appreciate a good wager, I would feel remiss not to ask if you could get a better result for amount of home equity at risk by getting a HELOC and having a bank be the counterparty? Maybe not at lower dollar amounts due to fixed costs/fees, but likely so nearer the $250K point [Greg was willing to bet up to this] -- especially with the expectation that interest rates will go down later in the year.

Greg replied the following on April 17:

I don't have a stable income so I

... (read more)
5
OscarD🔸
I would be interested in @Greg_Colbourn's thoughts here! Possibly part of the value is in generating discussion and publicly defending a radical idea, rather than just the monetary EV. But if so maybe a smaller bet would have made sense.

This isn't really against Zach’s point, but presumably, a lot of Greg's motivation here is signalling that he is serious about this belief and having a bunch of people in this community see it for advocacy reasons. I think that taking out a loan wouldn't do nearly as well on the advocacy front as making a public bet like this.

This isn’t true if Greg values animal welfare donations above most non-AI things by a sufficient amount. He could have tried to shop around for more favorable conditions with someone else in EA circles but it seems pretty likely that he’d end up going with this one. There’s no market for these bets.

Yes, I've previously made some folks at Anthropic aware of these concerns, e.g. associated with this post.

In response to this post, Zac Hatfield-Dodds told me he expects Anthropic will publish more information about its governance in the future.

I claim that public information is very consistent with "the investors hold an axe over the Trust; maybe the Trust will cause the Board to be slightly better, or the investors will abrogate the Trust, or the Trustees will loudly resign at some point; regardless, the Trust is very subordinate to the investors and won't be able to do much."

And if so, I think it's reasonable to describe the Trust as "maybe powerless."

Maybe. Note that they sometimes brag about how independent the Trust is and how some investors dislike it, e.g. Dario:

Every traditional investor who invests in Anthropic looks at this. Some of them are just like, whatever, you run your company how you want. Some of them are like, oh my god, this body of random people could move Anthropic in a direction that's totally contrary to shareholder value.

And I've never heard someone from Anthropic suggest this.

It seems weird that none of the labs would have said that when asked for comment?

4
aog
Interesting. That seems possible, and if so, then the companies did not violate that agreement.  I've updated the first paragraph of the article to more clearly describe the evidence we have about these commitments. I'd love to see more information about exactly what happened here. 