My position on "AI welfare"
[Edit: the questionable part of this is #4.]
I basically agree with this with some caveats. (Despite writing a post discussing AI welfare interventions.)
I discuss related topics here, including what fraction of resources should go to AI welfare. (A section in the same post I link above.)
The main caveats to my agreement are:
Why does "lock-in" seem so unlikely to you?
One story:
You could imagine AI welfare work now improving things by putting AI welfare on the radar of those people, so they're more likely to take AI welfare into account when making decisions.
I'd be interested in which step of this story seems implausible to you - is it about AI technology making "lock-in" possible?
I agree this is possible, and I think a decent fraction of the value of "AI welfare" work comes from stuff like this.
Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn't considered by the decision makers.
This would be very weird: it requires that either the value-setters are very rushed or that they have lots of time to consult with superintelligent advisors but still make the wrong choice. Both paths seem unlikely.
This would be very weird: it requires that either the value-setters are very rushed or [...]
As an intuition pump: if the Trump administration,[1] or a coalition of governments led by the U.S., is faced all of a sudden—on account of intelligence explosion[2] plus alignment going well—with deciding what to do with the cosmos, will they proceed thoughtfully or kind of in a rush? I very much hope the answer is “thoughtfully,” but I would not bet[3] that way.
What about if we end up in a multipolar scenario, as forecasters think is about 50% likely? In this case, I think rushing is the default?
Pausing for a long reflection may be the obvious path to you or me or EAs in general if suddenly in charge of an aligned ASI singleton, but the way we think is very strange compared to most people in the world.[4] I expect that without a good deal of nudging/convincing, the folks calling the shots will not opt for such reflection.[5]
(Note that I don’t consider this a knockdown argument for putting resources towards AI welfare in particular: I only voted slightly in the direction of “agree” for this debate week. I do, however, think that many more EA resources should be going towards ASI governance / setting up a long reflection, as I have written before.)
This would be very weird: it requires that either the value-setters [...] or that they have lots of time to consult with superintelligent advisors but still make the wrong choice.
One thread here that feels relevant: I don’t think it’s at all obvious that superintelligent advisors will be philosophically competent.[6] Wei Dai has written a series of posts on this topic (which I collected here); this is an open area of inquiry that serious thinkers in our sphere are funding. In my model, this thread links up with AI welfare since welfare is in part an empirical problem, which superintelligent advisors will be great at helping with, but also in part a problem of values and philosophy.[7]
the likely U.S. presidential administration for the next four years
in this world, TAI has been nationalized
I apologize to Nuño, who will receive an alert, for not using “bet” in the strictly correct way.
All recent U.S. presidents have been religious, for instance.
My mainline prediction is that decision makers will put some thought towards things like AI welfare—in fact, by normal standards they’ll put quite a lot of thought towards these things—but they will fall short of the extreme thoughtfulness that a scope-sensitive assessment of the stakes calls for. (This prediction is partly informed by someone I know who’s close to national security, and who has been testing the waters there to gauge the level of openness towards something like a long reflection.)
One might argue that this is a contradictory statement, since the most common definition of superintelligence is an AI system (or set of systems) that’s better than the best human experts in all domains. So, really, what I’m saying is that I believe it’s very possible we end up in a situation in which we think we have superintelligence—and the AI we have sure is superhuman at many/most/almost-all things—but, importantly, philosophy is its Achilles heel.
(To be clear, I don’t believe there’s anything special about biological human brains that makes us uniquely suited to philosophy; I don’t believe that philosophically competent AIs are precluded from the space of all possible AIs. Nonetheless, I do think there’s a substantial chance that the “aligned” “superintelligence” we build in practice lacks philosophical competence, to catastrophic effect. (For more, see Wei Dai’s posts.))
Relatedly, if illusionism is true, then welfare is a fully subjective problem.
(Minor point: in an unstable multipolar world, it's not clear how things get locked in, and for the von Neumann probes in particular, note that if you can launch slightly faster probes a few years later, you can beat rushed-out probes.)
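The overtaking claim in the parenthetical can be sketched numerically. This is a rough illustration with hypothetical numbers (speeds as fractions of c, distances in light-years), ignoring acceleration and relativistic bookkeeping:

```python
# A faster probe launched later overtakes a slower probe launched earlier.
def overtake_distance(v_slow, v_fast, head_start_years):
    # Positions: slow probe at v_slow * t; fast probe at v_fast * (t - h).
    # Setting them equal and solving for t gives the catch-up time.
    t_catch = v_fast * head_start_years / (v_fast - v_slow)
    return v_slow * t_catch

# A 0.99c probe launched 5 years after a 0.90c probe catches it
# roughly 49.5 light-years out; everything beyond that still goes
# to the later, faster launcher.
print(overtake_distance(0.90, 0.99, 5))
```

So under these (made-up) numbers, rushing out slightly slower probes only secures a ~50-light-year bubble.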
Yeah, I agree that it’s unclear how things get locked in in this scenario. However, my best guess is that solving the technological problem of designing and building probes that travel as fast as allowed by physics—i.e., just shy of light speed[1]—takes less time than solving the philosophical problem of what to do with the cosmos.
If one is in a race, then one is forced into launching probes as soon as one has solved the technological problem of fast-as-physically-possible probes (because delaying means losing the race),[2] and so in my best guess the probes launched will be loaded with values that one likely wouldn’t endorse if one had more time to reflect.[3]
Additionally, if one is in a race to build fast-as-physically-possible probes, then one is presumably putting most of one’s compute toward winning that race, leaving one with little compute for solving the problem of what values to load the probes with.[4]
Overall, I feel pretty pessimistic about a multipolar scenario going well,[5] but I’m not confident.
assuming that new physics permitting faster-than-light travel is ruled out (or otherwise not discovered)
There’s some nuance here: maybe one has a lead and can afford some delay. Also, the prize is continuous rather than discrete—that is, one still gets some of the cosmos if one launches late (although on account of how the probes reproduce exponentially, one does lose out big time by being second)*.
*From Carl Shulman’s recent 80k interview:
you could imagine a state letting loose this robotic machinery that replicates at a very rapid rate. If it doubles 12 times in a year, you have 4,096 times as much. By the time other powers catch up to that robotic technology, if they were, say, a year or so behind, it could be that there are robots loyal to the first mover that are already on all the asteroids, on the Moon, and whatnot. And unless one tried to forcibly dislodge them, which wouldn’t really work because of the disparity of industrial equipment, then there could be an indefinite and permanent gap in industrial and military equipment.
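The quote's numbers are just compounded doublings; a one-line sketch of the arithmetic:

```python
# Replication that doubles 12 times in a year leaves a power that is
# one year behind facing a 2**12 = 4096x gap in industrial equipment.
doublings_per_year = 12
years_behind = 1
gap = 2 ** (doublings_per_year * years_behind)
print(gap)  # 4096
```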
It’s very unclear to me how large this discrepancy is likely to be. Are the loaded values totally wrong according to one’s idealized self? Or are they basically right, such that the future is almost ideal?
There’s again some nuance here, like maybe one believes that the set of world-states/matter-configurations that would score well according to one’s idealized values is very narrow. In this case, the EV calculation could indicate that it’s better to take one’s time even if this means losing almost all of the cosmos, since a single probe loaded with one’s idealized values is worth more to one than a trillion probes loaded with the values one would land on through a rushed reflective process.
There are also decision theory considerations/wildcards, like maybe the parties racing are mostly AI-led rather than human-led (in a way in which the humans are still empowered, somehow), and the AIs—being very advanced, at this point—coordinate in an FDT-ish fashion and don’t in fact race.
On top of race dynamics resulting in suboptimal values being locked in, as I’ve focused on above, I’m worried about very bad, s-risky stuff like threats and conflict, as discussed in this research agenda from CLR.
Interesting!
I think my worry is people who don't think they need advice about what the future should look like. When I imagine them making the bad decision despite having lots of time to consult superintelligent AIs, I imagine them just not being that interested in making the "right" decision? And therefore their advisors not being proactive in telling them things that are only relevant for making the "right" decision.
That is, assuming the AIs are intent aligned, they'll only help you in the ways you want to be helped.
I do hope that people won't be so thoughtless as to impose their vision of the future without seeking advice, but I'm not confident.
Briefly + roughly (not precise):
At some point we'll send out lightspeed probes to tile the universe with some flavor of computronium. The key question (for scope-sensitive altruists) is what that computronium will compute. Will an unwise agent or incoherent egregore answer that question thoughtlessly? I intuit no.
I can't easily make this intuition legible. (So I likely won't reply to messages about this.)
Caveats:
(Other than that, it seems hard to tell a story about how "AI welfare" research/interventions now could substantially improve the value of the long-term future.)
(My impression is that these arguments are important to very few AI-welfare-prioritizers; most AI-welfare-prioritizers have the wrong reasons.)
My impression is these arguments are important to very few AI-welfare-prioritizers
FWIW, these motivations seem reasonably central to me personally, though not my only motivations.
I appreciate it; I'm pretty sure I have better options than finishing my Bachelor's; details are out-of-scope here but happy to chat sometime.
Common beliefs/attitudes/dispositions among [highly engaged EAs/rationalists + my friends] which seem super wrong to me:
Meta-uncertainty:
Ethics:
Cause prioritization:
Misc:
Possibly I'm wrong about which attitudes are common.
For now I'm just starting a list, not trying to be legible, much less change minds. I know I haven't explained my views.
Edit: I'm sharing controversial beliefs, without justification and with some framed provocatively. If one of these views makes you think worse of me to a nontrivial degree, please ask for elaboration; maybe there's miscommunication or it's more reasonable than it seems. Edit 2: there are so many comments; I may not respond to requests-for-elaboration but will at least notice them as a bid-for-elaboration-at-some-point.
(meta musing) The conjunction of the negations of a bunch of statements seems a bit doomed to get a lot of disagreement karma, sadly. Esp. if the statements being negated are "common beliefs" of people like the ones on this forum.
I agreed with some of these and disagreed with others, so I felt unable to agreevote. But I strongly appreciated the post overall so I strong-upvoted.
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
This is just straightforwardly correct statistics. For example, ask a true Bayesian to estimate the outcome of flipping a coin of unknown bias, and they will construct a probability distribution over coin-flip probabilities, only reducing it to a single probability when forced to make a bet. But when not taking a bet, they should be doing updates on the distribution, not the final estimate. (I'm pretty sure this is in fact the only coherent way to do a Bayesian update for the problem.)
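The coin example can be sketched concretely. This assumes a uniform Beta(1, 1) prior over the coin's bias (the conjugate prior for coin flips); the helper names are mine:

```python
from fractions import Fraction

def beta_posterior(heads, tails, a=1, b=1):
    """Parameters of the Beta posterior after the observed flips."""
    return a + heads, b + tails

def betting_probability(a, b):
    """Mean of Beta(a, b): the single number you'd quote when forced to bet."""
    return Fraction(a, a + b)

# Observe 7 heads, 3 tails: the full posterior is Beta(8, 4);
# only when betting does it collapse to its mean.
a, b = beta_posterior(heads=7, tails=3)
print(betting_probability(a, b))  # 2/3
```

Updates operate on the (a, b) pair; the point estimate is derived from it on demand, not updated directly.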
And why are we stating probabilities anyway? The main reason seems to be to quantify and communicate our beliefs. But if my "25% probability" comes from a different distribution than your "25% probability", we may appear to be in agreement when in fact our worldviews differ wildly. I think giving credence intervals over probabilities is strictly better than this.
Thanks. I agree! (Except with your last sentence.) Sorry for failing to communicate clearly; we were thinking about different contexts.
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
When I do this, it's because I'm unable or unwilling to assign a probability distribution over the probabilities, so it won't reduce to simple (precise) probabilities. Actually, in general, I think precise probabilities are epistemically unjustified (e.g. Schoenfield, 2012, section 3), but I'm willing to use more or less precise probabilities depending on the circumstances.
Unstable beliefs about stuff like AI timelines in the sense of I'd be pretty likely to say something pretty different if you asked tomorrow
I'm not sure if I'd claim to have such unstable beliefs myself, but if you're trying to be very precise with very speculative, subjective and hard-to-specifically-defend probabilities, then I'd imagine they could be very unstable, and influenced by things like your mood, e.g. optimism and pessimism bias. That is, unless you commit to your credences even if you'd had formed different ones if you had started from scratch or you make arbitrary choices in forming them that could easily have gone differently. You might weigh the same evidence or arguments differently from one day to the next.
I'd guess most people would also have had at least slightly different credences on AI timelines if they had seen the same evidence or arguments in a different order, or were in a different mood when they were forming their credences or building models, or for many other different reasons. Some number or parameter choices will come down to intuition, and intuition can be unstable.
fluctuating predictably (dutch-book-ably) is not
I don't think people are fluctuating predictably (dutch-book-ably). How exactly they'd change their minds or even the direction is not known to them ahead of time.
(But maybe you could Dutch book people by predicting their moods and so optimism and pessimism bias?)
Thanks.
Some people say things like "my doom-credence fluctuates between 10% and 25% day to day"; this is dutch-book-able and they'd make better predictions if they reported what they feel like on average rather than what they feel like today, except insofar as they have new information.
This is dutch-book-able only if there is no bid-ask spread. A rational choice in this case would be to have a very wide bid-ask spread. E.g. when Holden Karnofsky writes that his P(doom) is between 10% and 90%, I assume he would bet for doom at 9% or less, bet against doom at 91% or more, and not bet for 0.11<p<0.89. This seems a very rational choice in a high-volatility situation where information changes extremely quickly. (As an example, IIRC the bid-ask spread in financial markets increases right before earnings are released).
(I agree it is reasonable to have a bid-ask spread when betting against capable adversaries. I think the statements-I-object-to are asserting something else, and the analogy to financial markets is mostly irrelevant. I don't really want to get into this now.)
Hmm, okay. So, for example, when they’re below 15%, you bet that it will happen at odds matching 15% against them, and when they’re above 20%, you bet that it won't happen at 20% against them. And just make sure to size the bets right so that if you lose one bet, your payoff is higher in the other, which you'd win. They "give up" the 15-20% range for free to you.
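The arbitrage described above can be written out in integer cents of a $1-if-doom contract (the 15%/20% trigger prices and the trade sizes are hypothetical):

```python
# Buy the contract when their credence dips below 15 cents, sell an
# identical contract back when it rises above 20 cents; the two
# positions cancel, locking in the spread regardless of outcome.
def dutch_book_profit(buy_cents, sell_cents):
    profit_if_doom = (100 - buy_cents) + (sell_cents - 100)  # contract pays out
    profit_if_no_doom = sell_cents - buy_cents               # contract expires worthless
    return profit_if_doom, profit_if_no_doom

print(dutch_book_profit(15, 20))  # (5, 5): +5 cents whether or not doom occurs
```

Someone whose stated credence predictably crosses both thresholds "gives up" that spread for free.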
Still, maybe they just mean to report the historical range or volatility of their estimates? This would be like reporting the historical volatility of a stock. They may not intend to imply, say, that they'll definitely fall below 15% at some point and above 20% at another.
Plus, picking one way to average may seem unjustifiably precise to them. The average over time is one way, but another is the average over relatively unique (clusters) of states of mind, e.g. splitting weight equally between good, ~neutral and bad moods, averages over possible sets of value assignments for various parameters. There are many different reasonable choices they can make, all pretty arbitrary.
Thank you for writing this. I share many of these, but I'm very uncertain about them.
Here it is:
Giving a range of probabilities when you should give a probability + giving confidence intervals over probabilities + failing to realize that probabilities of probabilities just reduce to simple probabilities
I think this is rational, I think of probabilities in terms of bets and order books. I think this is close to my view, and the analogy of financial markets is not irrelevant.
Unstable beliefs about stuff like AI timelines in the sense of I'd be pretty likely to say something pretty different if you asked tomorrow
Changing literally day-to-day seems extreme, but month-to-month seems very reasonable given the speed of everything that's happening, and it matches e.g. the volatility of NVIDIA stock price.
Axiologies besides ~utilitarianism
To me, "utilitarianism" seems pretty general, as long as you can arbitrarily define utility and you can arbitrarily choose between Negative/Rule/Act/Two-level/Total/Average/Preference/Classical utilitarianism. I really liked this section of a recent talk by Toby Ord (Starting from "It starts by observing that the three main traditions in Western philosophy each emphasize a different focal point:"). (I also don't know if axiology is the right word for what we want to express here, we might be talking past each other)
Veg(etari)anism for terminal reasons; veg(etari)anism as ethical rather than as a costly indulgence
I mostly agree with you, but second order effects seem hard to evaluate and both costs and benefits are so minuscule (and potentially negative) that I find it hard to do a cost-benefit-analysis.
Thinking personal flourishing (or something else agent-relative) is a terminal goal worth comparable weight to the impartial-optimization project
I agree with you, but for some it might be an instrumentally useful intentional framing. I think some use phrases like "[Personal flourishing] for its own sake, for the sake of existential risk." (see also this comment for a fun thought experiment for average utilitarians, but I don't think many believe it)
Cause prioritization that doesn't take seriously the cosmic endowment is astronomical, likely worth >10^60 happy human lives and we can nontrivially reduce x-risk
Some think the probability of extinction per century is only going up with humanity's increasing capabilities, and are not convinced by arguments that we'll soon reach close-to-speed-of-light travel, which would make extinction risk go down. See also e.g. Why I am probably not a longtermist (except point 1). I find this very reasonable.
Deciding in advance to boost a certain set of causes [what determines that set??], or a "portfolio approach" without justifying the portfolio-items
I agree, I think this makes a ton of sense for people in community building that need to work with many cause areas (e.g. CEA staff, Peter Singer), but I fear that it makes less sense for private individuals maximizing their impact.
Not noticing big obvious problems with impact certificates/markets
I think many people notice big obvious problems with impact certificates/markets, but think that the current system is even worse, or that they are at least worth trying and improving, to see if at their best they can in some cases be better than the alternatives we have. The current funding systems also have big obvious problems. What big obvious problems do you think they are missing?
Naively using calibration as a proxy for forecasting ability
I agree with this, just want to mention that it seems better than a common alternative that I see: using LessWrong-sounding-ness/reputation as a proxy for forecasting ability
Thinking you can (good-faith) bet on the end of the world by borrowing money ... I think many people miss that utility is about ∫consumption not ∫bankroll (note the bettor typically isn't liquidity-constrained)
I somewhat agree with you, but I think that many people model it a bit like this: "I normally consume 100k/year, you give me 10k now so I will consume 110k this year, and if I lose the bet I will consume only 80k/year X years in the future". But I agree that in practice the amounts are small and it doesn't work for many reasons.
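The model in that comment can be checked under an assumed utility function. This sketch uses log utility of yearly consumption and the comment's hypothetical numbers, plus an assumed p(doom) of 30% (all of these are my assumptions, not the commenter's):

```python
import math

p_doom = 0.3   # assumed credence that the world ends before repayment
u = math.log   # assumed utility of yearly consumption

gain_now = u(110_000) - u(100_000)    # extra consumption this year from the stake
loss_later = u(100_000) - u(80_000)   # repayment-year shortfall, paid only if alive
ev_change = gain_now - (1 - p_doom) * loss_later

print(ev_change)  # negative here: the bet hurts in expectation
```

Under these numbers the later utility hit outweighs the near-term boost, consistent with the point that in practice the bet doesn't work.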
Thanks for the engagement. Sorry for not really engaging back. Hopefully someday I'll elaborate on all this in a top-level post.
Briefly: by axiological utilitarianism, I mean classical (total, act) utilitarianism, as a theory of the good, not as a decision procedure for humans to implement.
veg(etari)anism as ethical rather than as a costly indulgence
Are you convinced the costs outweigh the benefits? It may be good for important instrumental reasons, e.g. reducing cognitive dissonance about sentience and moral weights, increasing the day-to-day salience of moral patients with limited agency or power (which could be an important share of those in the future), personal integrity or virtue, easing cooperation with animal advocates (including non-consequentialist ones), maybe health reasons.
Thanks. I agree that the benefits could outweigh the costs, certainly at least for some humans. There are sophisticated reasons to be veg(etari)an. I think those benefits aren't cruxy for many EA veg(etari)ans, or many veg(etari)ans I know.
Or me. I'm veg(etari)an for selfish reasons — eating animal corpses or feeling involved in the animal-farming-and-killing process makes me feel guilty and dirty.
I certainly haven't done the cost-benefit analysis on veg(etari)anism, on the straightforward animal-welfare consideration or the considerations you mention. For example, if I was veg(etari)an for the straightforward reason (for agent-neutral consequentialist reasons), I'd do the cost-benefit analysis, and do things like:
I think my veg(etari)an friends are mostly like me — veg(etari)an for selfish reasons. And they don't notice this.
Written quickly, maybe hard-to-parse and imprecise.
Strong upvoted and couldn't decide whether to disagreevote or not. I agree with the points you list under meta-uncertainty and your point on naively using calibration as a proxy for forecasting ability + thinking you can bet on the end of the world by borrowing money. I disagree with your thoughts on ethics (I'm sympathetic to Zvi's writing on EAs confusing the map for the territory).
What's the best thing to read on "Zvi's writing on EAs confusing the map for the territory"? Or at least something good?
I'm not sure what would be the best thing since I don't remember there being a particular post about this. However, he talks about it in his book review for Going Infinite and I also like his post on Altruism is Incomplete. Lots of people I know find his writing confusing though and it's not like he's rigorously arguing for something. When I agree with Zvi, it's usually because I have had that belief in the back of my mind for a while and him pointing it out makes it more salient, rather than because I got convinced by a particular argument he was making.
I don't want to try to explain now, sorry.
(This shortform was intended more as starting-a-personal-list than as a manifesto.)
Deciding in advance to boost a certain set of causes [what determines that set??], or a "portfolio approach" without justifying the portfolio-items
(Not totally sure what you mean here.) I think the portfolio items are justified on the basis of distinct worldviews, which differ in part based on their normative commitments (e.g. theories of welfare like hedonism or preference views, moral weights, axiology, decision theory, epistemic standards, non-consequentialist commitments) across which there is no uniquely justified universal common scale. People might be doing this pretty informally or deferring, though.
Intra-cause offsetting: if you do harm in area X, you should fix your harm in that area, even if you could do more good in another area
I think this can make sense if you have imprecise credences or normative uncertainty (for which there isn't a uniquely justified universal common scale across views). Specifically, if you're unable to decide whether action A does net good or net harm (in expectation), because it does good for cause X and harm for cause Y, and the two causes are too hard to compare, it might make sense to offset. Portfolios can be (more) robustly positive than the individual acts. EDIT: But maybe you find this too difference-making?
It takes like 20 hours of focused reading to get basic context on AI risk and threat models. Once you have that, I feel like you can read everything important in x-risk-focused AI policy in 100 hours. Same for x-risk-focused AI corporate governance, AI forecasting, and macrostrategy.
[Edit: read everything important doesn't mean you have nothing left to learn; it means something like you have context to appreciate ~all papers, and you can follow ~all conversations in the field except between sub-specialists, and you have the generators of good overviews like 12 tentative ideas for US AI policy.]
Related: Research debt.
I disagree-voted because I feel like I've done much more than 100-hours of reading on AI Policy (including finishing the AI Safety Fundamentals Governance course) and still have a strong sense there's a lot I don't know, and regularly come across new work that I find insightful. Very possibly I'm prioritising reading the wrong things (and would really value a reading list!) but thought I'd share my experience as a data point.
Here are some of the curricula that HAIST uses:
The HAIST website also has a resources tab with lists of technical and policy papers.
I sometimes post (narrow) reading lists on the forum. Are those actually helpful to anyone?
For what it's worth, I found your "AI policy ideas: Reading list" and "Ideas for AI labs: Reading list" helpful,[1] and I've recommended the former to three or four people. My guess would be that these reading lists have been very helpful to a couple or a few people rather than quite helpful to lots of people, but I'd also guess that's the right thing to be aiming for given the overall landscape.
Why don't there exist better reading lists / syllabi, especially beyond introductory stuff?
I expect there's no good reason for this, and that it's simply because it's nobody's job to make such reading lists (as far as I'm aware), and the few(?) people who could make good intermediate-to-advanced level readings lists either haven't thought to do so or are too busy doing object-level work?
Helpful in the sense of: I read or skimmed the readings in those lists that I hadn't already seen, which was maybe half of them, and I think this was probably a better use of my time than the counterfactual.
+1 to the interest in these reading lists.
Because my job is very time-consuming, I haven’t spent much time trying to understand the state of the art in AI risk. If there was a ready-made reading list I could devote 2-3 hours per week to, such that it’d take me a few months to learn the basic context of AI risk, that’d be great.
An undignified way for everyone to die: an AI lab produces clear, decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world. A less cautious lab ends the world a year later.
A possible central goal of AI governance: ensure that when an AI lab produces decisive evidence of AI risk/scheming/uncontrollability, freaks out, and tells the world, this quickly results in rules that stop all labs from ending the world.
I don't know how we can pursue that goal.