All of Wei Dai's Comments + Replies

  • Cultivating people's ability and motivation to reflect on their values.
  • Structuring collective deliberations so that better arguments and ideas win out over time.

Problems with this:

  1. People disagree strongly about which arguments and ideas are better than others.
  2. The vast majority of people seem to lack both the hardware (raw cognitive capacity) and software (a reasonable philosophical tradition) to reflect on their values in a positive way.
  3. AI seems likely to make things worse or not sufficiently better, for these reasons.

Solving these problems see... (read more)

See also this post, which occurred to me after writing my previous reply to you.

@Wei Dai, I understand that your plan A is an AI pause (+ human intelligence enhancement). And I agree with you that this is the best course of action. Nonetheless, I'm interested in what you see as plan B: If we don't get an AI pause, is there any version of 'hand off these problems to AIs' / 'let 'er rip' that you feel optimistic about? Or one that you at least think will result in lower p(catastrophe) than other versions? If you have $1B to spend on AI labour during crunch time, what do you get the AIs to work on?

The answer would depend a lot on what the... (read more)

I wish you titled the post something like "The option value argument for preventing extinction doesn't work". Your current title ("The option value argument doesn't work when it's most needed") has the unfortunate side effects of:

  1. People being more likely to misinterpret or misremember your post as claiming that trying to increase option value doesn't work in general.
  2. Reducing extinction risk becoming the most salient example of an idea for increasing option value.
  3. People using "the option value argument" to mean the option value argument for preventin
... (read more)

The argument tree (arguments, counterarguments, counter-counterarguments, and so on) is exponentially sized and we don't know how deep or wide we need to expand it, before some problem can be solved. We do know that different humans looking at the same partial tree (i.e., philosophers who have read the same literature on some problem) can have very different judgments as to what the correct conclusion is. There's also a huge amount of intuition/judgment involved in choosing which part of the tree to focus on or expand further. With AIs helping to expand th... (read more)
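To make "exponentially sized" concrete, here is a rough back-of-the-envelope illustration (the branching factor $b$ and depth $d$ are assumed, purely illustrative parameters, not figures from the comment): if each claim attracts about $b$ counterarguments and the tree is expanded to depth $d$, the number of nodes is

\[
\sum_{k=0}^{d} b^{k} \;=\; \frac{b^{d+1}-1}{b-1} \;\approx\; b^{d},
\qquad \text{e.g. } b = 5,\ d = 10 \;\Rightarrow\; \approx 1.2 \times 10^{7} \text{ nodes},
\]

which is already far more than unaided human philosophers can evaluate in a lifetime, and which grows by another factor of $b$ with every additional level of depth.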

3
Elliott Thornley (EJT)
I said a little in another thread. If we get aligned AI, I think it'll likely be a corrigible assistant that doesn't have its own philosophical views that it wants to act on. And then we can use these assistants to help us solve philosophical problems. I'm imagining in particular that these AIs could be very good at mapping logical space, tracing all the implications of various views, etc. So you could ask a question and receive a response like: 'Here are the different views on this question. Here's why they're mutually exclusive and jointly exhaustive. Here are all the most serious objections to each view. Here are all the responses to those objections. Here are all the objections to those responses,' and so on. That would be a huge boost to philosophical progress. Progress has been slow so far because human philosophers take entire lifetimes just to fill in one small part of this enormous map, and because humans make errors so later philosophers can't even trust that small filled-in part, and because verification in philosophy isn't much quicker than generation.

Thanks! I hope this means you'll spend some more time on this type of work, and/or tell other philosophers about this argument. It seems apparent that we need more philosophers to work on philosophical problems related to AI x-safety (many of which do not seem to be legible to most non-philosophers). Not necessarily by attacking them directly (this is very hard and probably not the best use of time, as we previously discussed) but instead by making them more legible to AI researchers, decisionmakers, and the general public.

In my view, there isn't much desire for work like this from people in the field and they probably wouldn't use it to inform deployment unless a lot of effort is also added from the author to meet the right people, convince them to spend the time to take it seriously, etc.

Any thoughts on Legible vs. Illegible AI Safety Problems, which is in part a response to this?

I wrote a post that I think was partly inspired by this discussion. The implication of it here is that I don't necessarily want philosophers to directly try to solve the many hard philosophical problems relevant to AI alignment/safety (especially given how few of them are in this space or concerned about x-safety), but initially just to try to make them "more legible" to others, including AI researchers, key decision makers, and the public. Hopefully you agree that this is a more sensible position.

4
Elliott Thornley (EJT)
Yes, I agree this is valuable, though I think it's valuable mainly because it increases the probability that people use future AIs to solve these problems, rather than because it will make people slow down AI development or try very hard to solve them pre-TAI.

I agree that many of the problems on my list are very hard and probably not the highest marginal value work to be doing from an individual perspective. Keep in mind that the list was written 6 years ago, when it was less clear when the AI takeoff would start in earnest, or how many philosophers would become motivated to work on AI safety when AGI became visibly closer. I still had some hope that when the time came, a significant fraction of all philosophers would become self-motivated or would be "called to arms" by a civilization-wide AI safety effort, and... (read more)

4
Elliott Thornley (EJT)
I don't think philosophical difficulty is that much of an increase to the difficulty of alignment, mainly because I think that AI developers should (and likely will) aim to make AIs corrigible assistants rather than agents with their own philosophical views that they try to impose on the world. And I think it's fairly likely that we can use these assistants (if we succeed in getting them and aren't disempowered by a misaligned AI instead) to help a lot with these hard philosophical questions.

Right, I know about Will MacAskill, Joe Carlsmith, and your work in this area, but none of you are working on alignment per se full time or even close to full time AFAIK, and the total effort is clearly far from adequate to the task at hand.

I think some have given up philosophy to work on other things such as AI alignment.

Any other names you can cite?

In my view, there isn't much desire for work like this from people in the field and they probably wouldn't use it to inform deployment unless a lot of effort is also added from the author to meet the rig

... (read more)

I'm a philosopher who's switched to working on AI safety full-time. I also know there are at least a few philosophers at Anthropic working on alignment.

With regards to your Problems in AI Alignment that philosophers could potentially contribute to:

  • I agree that many of these questions are important and that more people should work on them.
  • But a fair amount of them are discussed in conventional academic philosophy, e.g.:
    • How to resolve standard debates in decision theory?
    • Infinite/multiversal/astronomical ethics
    • Fair distribution of benefits
    • What is the nature o
... (read more)

Do you have any insights into why there are so few philosophers working in AI alignment, or closely with alignment researchers? (Amanda Askell is the only one I know.) Do you think this is actually a reasonable state of affairs (i.e., it's right or fine that almost no professional philosophers work directly as or with alignment researchers), or is this wrong/suboptimal, caused by some kind of cultural or structural problem? It's been 6 years since I wrote Problems in AI Alignment that philosophers could potentially contribute to and I've gotten a few comme... (read more)

5
Toby_Ord
Re 99% of academic philosophers, they are doing their own thing and have not heard of these possibilities and wouldn't be likely to move away from their existing areas if they had. Getting someone to change their life's work is not easy and usually requires hours of engagement to have a chance. It is especially hard to change what people work on in a field when you are outside that field. A different question is about the much smaller number of philosophers who engage with EA and/or AI safety (there are maybe 50 of these). Some of these are working on some of those topics you mention. e.g. Will MacAskill and Joe Carlsmith have worked on several of these. I think some have given up philosophy to work on other things such as AI alignment. I've done occasional bits of work related to a few of these (e.g. here on dealing with infinities arising in decision theory and ethics without discounting) and also to other key philosophical questions that aren't on your list. For such philosophers, I think it is a mixture of not having seen your list and not being convinced these are the best things that they each could be working on.

Interesting re belief in hell being a key factor, I wasn't thinking about that.

It seems like the whole AI x-risk community has latched onto "align AI with human values/intent" as the solution, with few people thinking even a few steps ahead to "what if we succeeded"? I have a post related to this if you're interested.

possibly the future economy will be so much more complicated that it will still make sense to have some distributed information processing in the market rather than have all optimisation centrally planned

I think there will be distribute... (read more)

Why do you think this work has less value than solving philosophical problems in AI safety?

From the perspective of comparative advantage and counterfactual impact, this work does not seem to require philosophical training. It seems to be straightforward empirical research that many people could do, besides the very few professionally trained, AI-risk-concerned philosophers that humanity has.

To put it another way, I'm not sure that Toby was wrong to work on this, but if he was, it's because if he hadn't, then someone else with more comparative advantage for working on this problem (due to lacking training or talent for philosophy) would have done so shortly afterwards.

4
Yarrow Bouchard 🔸
How shortly? We're discussing this in October 2025. What's the newest piece of data that Toby's analysis is dependent on? Maybe the Grok 4 chart from July 2025? Or possibly qualitative impressions from the GPT-5 launch in August 2025? Who else is doing high-quality analysis of this kind and publishing it, even using older data?  I guess I don't automatically buy the idea that even in a few months we'll see someone else independently go through the same reasoning steps as this post and independently come to the same conclusion. But there are plenty of people who could, in theory, do it and who are, in theory, motivated to do this kind of analysis and who also will probably not see this post (e.g. equity research analysts, journalists covering AI, AI researchers and engineers independent of LLM companies).  I certainly don't buy the idea that if Toby hadn't done this analysis, then someone else in effective altruism would have done it. I don't see anybody else in effective altruism doing similar analysis. (I chalk that up largely to confirmation bias.)

While I appreciate this work being done, it seems a very bad sign for our world/timeline that the very few people with both philosophy training and an interest in AI x-safety are using their time/talent to do forecasting (or other) work instead of solving philosophical problems in AI x-safety, with Daniel Kokotajlo being another prominent example.

This implies one of two things: Either they are miscalculating the best way to spend their time, which indicates bad reasoning or intuitions even among humanity's top philosophers (i.e., those who have at least re... (read more)

4
david_reinstein
If this post is indeed cutting-edge and prominent, I would be more surprised by the fact that there are not more 'quant' people reporting on this than by the fact that more philosophers are not working on AI x-risk related issues.

Why do you think this work has less value than solving philosophical problems in AI safety? If LLM scaling is sputtering out, isn't that important to know? In fact, isn't it a strong contender for the most important fact about AI that could be known right now? 

I suppose you could ask why this work hasn't been done by somebody else already and that's a really good question. For instance, why didn't anyone doing equity research or AI journalism notice this already? 

Among people who are highly concerned about near-term AGI, I don't really expect suc... (read more)

The ethical schools of thought I'm most aligned with—longtermism, sentientism, effective altruism, and utilitarianism—are far more prominent in the West (though still very niche).

I want to point out that the ethical schools of thought that you're (probably) most anti-aligned with (e.g., that certain behaviors and even thoughts are deserving of eternal divine punishment) are also far more prominent in the West, proportionately even more so than the ones you're aligned with.

Also the Western model of governance may not last into the post-AGI era regardless... (read more)

6
David_Althaus
For what it's worth, we recently ran a cross-cultural survey (n > 1,000 after extensive filtering) on endorsement of eternal extreme punishment, with questions like "If I could create a system that makes deserving people feel unbearable pain forever, I would" and "If hell didn't exist, or if it stopped existing, we should create it [...]". ~16–19% of Chinese respondents consistently endorsed such statements, compared to ~10–14% of US respondents—despite China being majority atheist/agnostic.[1] Of course, online surveys are notoriously unreliable, especially on such abstract questions. But if these results hold up, concerns about eternal punishment would actually count against a China-dominated future, not in favor of one.

[1] On individual questions, agreement rates were usually much higher, especially in China and other non-Western countries. The above numbers reflect a conservative conjunctive measure filtering for consistency across multiple questions.
5
OscarD🔸
I agree that both possibilities are very risky. Interesting re belief in hell being a key factor, I wasn't thinking about that. Even if a future ASI would be able to very efficiently manage today's economy in a fully centralised way, possibly the future economy will be so much more complicated that it will still make sense to have some distributed information processing in the market rather than have all optimisation centrally planned? Seems unclear to me one way or the other, and I assume we won't be able to know with high confidence in advance what economic model will be most efficient post-ASI. But maybe that just reflects my economic ignorance and others are justifiably confident.

Whereas it seems like maybe you think it's convex, such that smaller pauses or slowdowns do very little?

I think my point in the opening comment does not logically depend on whether the risk vs time (in pause/slowdown) curve is convex or concave[1], but it may be a major difference in how we're thinking about the situation, so thanks for surfacing this. In particular I see 3 large sources of convexity:

  1. The disjunctive nature of risk / conjunctive nature of success. If there are N problems that all have to be solved correctly to get a near-optimal future, withou
... (read more)
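To spell out the conjunctive-success point in item 1 with a toy model (the independence assumption and the roughly linear form of each $p_i$ below are illustrative assumptions, not claims from the comment): if a pause of length $t$ gives problem $i$ probability $p_i(t)$ of being solved correctly, and the problems are independent, then

\[
P_{\text{near-optimal}}(t) \;=\; \prod_{i=1}^{N} p_i(t),
\qquad
\text{Risk}(t) \;=\; 1 - \prod_{i=1}^{N} p_i(t).
\]

If each $p_i(t)$ starts well below 1 and rises roughly linearly with $t$, the product is a degree-$N$ polynomial whose gains are concentrated at larger $t$: a short pause that nudges only a few of the factors barely moves the product, while a long pause that raises all of them helps a lot, which is one way the benefit-of-pause curve can end up convex.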
4
Will Aldred
[sorry I'm late to this thread] @William_MacAskill, I'm curious which (if any) of the following is your position?

  1. "I agree with Wei that an approach of 'point AI towards these problems' and 'listen to the AI-results that are being produced' has a real (>10%? >50%?) chance of ending in moral catastrophe (because 'aligned' AIs will end up (unintentionally) corrupting human values or otherwise leading us into incorrect conclusions). And if we were living in a sane world, then we'd pause AI development for decades, alongside probably engaging in human intelligence enhancement, in order to solve the deep metaethical and metaphilosophical problems at play here. However, our world isn't sane, and an AI pause isn't in the cards: the best we can do is to differentially advance AIs' philosophical competence,[1] and hope that that's enough to avoid said catastrophe."
  2. "I don't buy the argument that aligned AIs can unintentionally corrupt human values. Furthermore, I'm decently confident that my preferred metaethical theory (e.g., idealising subjectivism) is correct. If intent alignment goes well, then I expect a fairly simple procedure like 'give everyone a slice of the light cone, within which they can do anything they want (modulo some obvious caveats), and facilitate moral trade' will result in a near-best future."
  3. "Maybe aligned AIs can unintentionally corrupt human values, but I don't particularly think this matters since it won't be average humans making the important decisions. My proposal is that we fully hand off questions re. what to do with the light cone to AIs (rather than have these AIs boost/amplify humans). And I don't buy that there is a metaphilosophical problem here: If we can train AIs to be at least as good as the best human philosophers at the currently in-distribution ethical+philosophical problems, then I see no reason to think that these AIs will misgeneralise out of distribution any more than humans would. (There's nothing special abou

A couple more thoughts on this.

  1. Maybe I should write something about cultivating self-skepticism for an EA audience, in the meantime here's my old LW post How To Be More Confident... That You're Wrong. (On reflection I'm pretty doubtful these suggestions actually work well enough. I think my own self-skepticism mostly came from working in cryptography research in my early career, where relatively short feedback cycles, e.g. someone finding a clear flaw in an idea you thought secure or your own attempts to pre-empt this, repeatedly bludgeon overconfidence ou
... (read more)
8
William_MacAskill
I don't think EA should be THE hub. In an ideal world, loads of people and different groups would be working on these issues.  But at the moment, really almost no one is. So the question is whether it's better if, given that, EA does work on it, and at least some work gets done. I think yes. (Analogy: was it good or bad that in the earlier days, there was some work on AI alignment, even though that work was almost exclusively done by EA/rationalist types?)

Throwing in my 2c on this:

  1. I think EA often comes with a certain kind of ontology (consequentialism, utilitarianism, generally thinking in terms of individuals) which is kind of reflected in the top-level problems given here (from the first list: persuasion, human power concentration, AI character and welfare) - not just the focus but the framing of what the problem even is.
  2. I think there are nearby problems which are best understood from a slightly different ontology - how AI will affect cultural development, the shifting of power from individuals to emerge
... (read more)

I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level. To be more specific:

  1. Solving AI welfare may depend on having a good understanding of consciousness, which is a notoriously hard philosophical problem.
  2. Concentration of power may be structurally favored by the nature of AGI or post-AGI economics, and defy any good solutions.
  3. Defending against AI-powered persuasion/manipulation may require solving metaphilosophy, which judging from other compar
... (read more)

I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level.

 

I don't understand why you're framing the goal as "solving or reducing to an acceptable level", rather than thinking about how much expected impact we can have. I'm in favour of slowing the intelligence explosion (and in particular of "Pause at human-level"). But here's how I'd think about the conversion of slowdown/pause into additional value:

Let's say the software-only intelligenc... (read more)

Perhaps the most important question is whether you support a restriction on space colonization (completely or to a few nearby planets) during the Long Reflection. Unrestricted colonization seems good from a pure pro-natalist perspective, but bad from an optionalist perspective, as it makes it much more likely that, if anti-natalism (or adjacent positions, like the view that there should be strict care or controls over which lives can be brought into existence) is right, some of the colonies will fail to reach the correct conclusion and go on to colonize the universe in an un... (read more)

I think both natalism and anti-natalism risk committing moral atrocities, if their opposite position turns out to be correct. Natalism if either people are often mistaken about their lives being worth living (cf Deluded Gladness Argument), or bringing people into existence requires much more due diligence about understanding/predicting their specific well-informed preferences (perhaps more understanding than current science and philosophy allow). Anti-natalism if human extinction implies losing an astronomically large amount of potential value (cf Astronom... (read more)

4
Richard Y Chappell🔸
Isn't the point of the Long Reflection to avoid "locking in" irreversible mistakes? Extinction, for example, is irreversible. But large population isn't. So I don't actually see any sense in which present "min-natalism" maintains more future "optionality" (or better minimizes moral risks) than pro-natalism. Both leave entirely open what future generations choose to do. They just differ in our present population target. And presently aiming for a "minimal population" strikes me as much the worse and riskier of the two options, for both intrinsic moral reasons and instrumental ones like misjudging / undershooting the minimally sustainable level.

I'm generally a fan of John Cochrane. I would agree that government regulation of AI isn't likely to work out well, which is why I favor an international pause on AI development instead (less need for government competence on detailed technical matters).

His stance on unemployment seems less understandable. I guess he either hasn't considered the possibility that AGI could drive wages below human subsistence levels, or thinks that's fine (humans just work for the same low wages as AIs and governments make up the difference with a "broad safety net that cushi... (read more)

Vitalik Buterin: Right. Well, one thing is one domain being offence-dominant by itself isn't a failure condition, right? Because defence-dominant domains can compensate for offence-dominant domains. And that has totally happened in the past, many times. If you even just compare now to 1,000 years ago: cannons are very offence-dominant, and castles stopped working. But if you compare physical warfare now to before, is it more offence-dominant on the whole? It's not clear, right?

  1. How do defense-dominant domains compensate for offense-dominant domain
... (read more)

I wish there was discussion about a longer pause (e.g. multi-decade), to allow time for human genetic enhancement to take effect. Does @CarlShulman support that, and why or why not?

Also I'm having trouble making sense of the following. What kind of AI disaster is Carl worried about, that's only a disaster for him personally, but not for society?

But also, I'm worried about disaster at a personal level. If AI was going to happen 20 years later, that would be better for me. But that's not the way to think about it for society at large.

Thanks for letting me know! I have been wondering for a while why AI philosophical competence is so neglected, even compared to other subareas of what I call "ensuring a good outcome for the AI transition" (which are all terribly neglected in my view), and I appreciate your data point. Would be interested to hear your conclusions after you've thought about it.

I liked your "Choose your (preference) utilitarianism carefully" series and think you should finish part 3 (unless I just couldn't find it) and repost it on this forum.

2
Arepo
Thanks! I wrote a first draft a few years ago, but I wanted an approach that leaned on intuition as little as possible if at all, and ended up thinking my original idea was untenable. I do have some plans on how to revisit it and would love to do so once I have the bandwidth.

(I understand you are very busy this week, so please feel free to respond later.)

Re desires, the main upshot of non-dualist views of consciousness I think is responding to arguments that invoke special properties of conscious states to say they matter but not other concerns of people.

I would say that consciousness seems very plausibly special in that it seems very different from other types of things/entities/stuff we can think or talk or have concerns about. I don't know if it's special in a "magical" way or some other way (or maybe not special at all... (read more)

Have you considered working on metaphilosophy / AI philosophical competence instead? Conditional on correct philosophy about AI welfare being important, most of future philosophical work will probably be done by AIs (to help humans / at our request, or for their own purposes). If AIs do that work badly and arrive at wrong conclusions, then all the object-level philosophical work we do now might only have short-term effects and count for little in the long run. (Conversely if we have wrong views now but AIs correct them later, that seems less disastrous.)

3
rileyharris
I hadn't, that's an interesting idea, thanks!

The 2017 Report on Consciousness and Moral Patienthood by Muehlhauser assumes illusionism about human consciousness to be true.

Reading that, it appears Muehlhauser's illusionism (perhaps unlike Carl's, although I don't have details on Carl's views) is a form that neither implies that consciousness does not exist nor strongly motivates desire satisfactionism:

There is "something it is like" to be us, and I doubt there is "something it is like" to be a chess-playing computer, and I think the difference is morally important. I just think our intuitions m

... (read more)
6
CarlShulman
Physicalists and illusionists mostly don't agree with the identification of 'consciousness' with magical stuff or properties bolted onto the psychological or cognitive science picture of minds. All the real feelings and psychology that drive our thinking, speech and action exist. I care about people's welfare, including experiences they like, but also other concerns they have (the welfare of their children, being remembered after they die), and that doesn't hinge on magical consciousness that we, the physical organisms having this conversation, would have no access to. The illusion is of the magical part. Re desires, the main upshot of non-dualist views of consciousness I think is responding to arguments that invoke special properties of conscious states to say they matter but not other concerns of people. It's still possible to be a physicalist and think that only selfish preferences focused on your own sense impressions or introspection matter, it just looks more arbitrary. I think this is important because it's plausible that many AI minds will have concerns mainly focused on the external world rather than their own internal states, and running roughshod over those values because they aren't narrowly mentally-self-focused seems bad to me.

The you that chooses is more fundamental than the you that experiences, because if you remove experience you get a blindmind you that will presumably want it back. Even if it can't be gotten back, presumably **you** will still pursue your values whatever they were. On the other hand, if you remove your entire algorithm but leave the qualia, you get an empty observer that might not be completely lacking in value, but wouldn't be you, and if you then replace the algorithm you get a sentient someone else.

Thus I submit that moral patients are straightforwardl

... (read more)
3
Chase Carter
I see where you're coming from. Regarding paperclippers: in addition to what Pumo said in their reply concerning mutual alignment (and what will be said in Part 2), I'd say that stupid goals are stupid goals independent of sentience. I wouldn't altruistically devote resources to helping a non-sentient AI make a billion paperclips for the exact same reason that I wouldn't altruistically devote resources to helping some human autist make a billion paperclips. Maybe I'm misunderstanding your objection; possibly your objection is something more like "if we unmoor moral value from qualia, there's nothing left to ground it in and the result is absurdity". For now I'll just say that we are definitely not asserting "all agents have equal moral status", or "all goals are equal/interchangeable" (indeed, Part 2 asserts the opposite).

Regarding 'why should I care about the blindmind for-its-own-sake?', here's another way to get there: My understanding is that there are two primary competing views which assign non-sentient agents zero moral status:

  1. Choice/preference (even of sentient beings) of a moral patient isn't inherently relevant, qualia valence is everything (e.g. we should tile the universe in hedonium), e.g. hedonic utilitarianism. We don't address this view much in the essay, except to point out that a) most people don't believe this, and b) to whatever extent this view results in totalizing hegemonic behavior, it sets the stage for mass conflict between believers of it and opponents of it (presumably including non-sentient agents). If we accept a priori something like 'positive valenced qualia is the only fundamental good', this view might at least be internally consistent (at which point I can only argue against the arbitrariness of accepting the a priori premise about qualia or argue against any system of values that appears to lead to its own defeat timelessly, though I realize the latter is a whole object-level debate to be had in and of i
3
Pumo
Your interpretation isn't exactly wrong, I'm proposing an ontological shift in the understanding of what's more central to the self, the thing to care about (i.e. is the moral patient fundamentally a qualia that has or can have an agent, or an agent that has or can have qualia?). The intuition is that if qualia, on its own, is generic and completely interchangeable among moral patients, it might not be what makes them such, even if it's an important value. A blindmind upload has ultimately far more in common with the sentient person they are based on than said person has with a phenomenal experience devoid of all the content that makes up their agency. Thus the agent would be the thing that primarily values the qualia (and everything else), rather than the reverse. This decenters qualia even if it is exceptionally valuable, being valuable not a priori (and thus, the agent being valued instrumentally in order to ensure its existence) but because it was chosen (and the thing that would have intrinsic value would be that which can make such choices). A blindmind that doesn't want qualia would be valuable then in this capacity to value things about the world in general, of which qualia is just a particular type (even if very valuable for sentient agents). The appropriate type to compare, rather than a Paperclip Maximizer (who, in Part 2 I argue, represents a type of agent whose values are inherently an aggression against the possibility of universal cooperation), would be aliens with strange and hard to comprehend values but no more intrinsically tied to the destruction of everything else than human values. If the moral patiency in them is only their qualia, then the best thing we could do for them is to just give them positive feelings, routing around whatever they valued in particular as means to that (and thus ultimately not really being about changing the outer world). Respecting their agency would mean at least trying to understand what they are trying to do,

Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of **agency:** preferences, desires, goals, interests, and the like.

The articles you cite, and Carl himself (via private discussion), all cite the possibility that there is no such thing as consciousness (illusionism, "physicalist/zombie world") as the main motivation for this moral stance (named "Desire Satisfactionism" by one of the pa... (read more)

4
Michael St Jules 🔸
Illusionism doesn't deny consciousness, but instead denies that consciousness is phenomenal. Whatever consciousness turns out to be could still play the same role in ethics. This wouldn't specifically require a move towards desire satisfactionism. However, one way to motivate desire satisfactionism is that desires — if understood broadly enough to mean any appearance that something matters, is good, bad, better or worse, etc., including pleasure, unpleasantness, more narrowly understood desires, moral views, goals, etc. — capture all the ways anything can "care" about or be motivated by anything. I discuss this a bit more here. They could also ground a form of morally relevant consciousness, at least minimally, if it's all gradualist under illusionism anyway (see also my comment here). So, then they could capture all morally relevant consciousness, i.e. all the ways anything can consciously care about anything. I don't really see why we should care about more narrowly defined desires to the exclusion of hedonic states, say (or vice versa). It seems to me that both matter. But I don't know if Carl or others intend to exclude hedonic states.

Just to back this up, since Wei has mentioned it, it does seem like a lot of the Open-Phil-cluster is to varying extents bought into illusionism. I think this is a highly controversial view, especially for those outside of Analytical Philosophy of Mind (and even within the field many people argue against it, I basically agree with Galen Strawson's negative take on it as an entire approach to consciousness).

... (read more)

Therefore, it seems clear to us that we need to immediately prioritize and fund serious, non-magical research that helps us better understand what features predict whether a given system is conscious

Can you talk a bit about how such research might work? The main problem I see is that we do not have "ground truth labels" about which systems are or are not conscious, aside from perhaps humans and inanimate objects. So this seemingly has to be mostly philosophical as opposed to scientific research, which tends to progress very slowly (perhaps for good reas... (read more)

Wei Dai

Another podcast linked below with some details about Will and Toby's early interactions with the Rationality community. Also Holden Karnofsky has an account on LW, and interacted with the Rationality community via e.g. this extensively discussed 2011 post.

https://80000hours.org/podcast/episodes/will-macaskill-what-we-owe-the-future/

Will MacAskill: But then the biggest thing was just looking at what are the options I have available to me in terms of what do I focus my time on? Where one is building up this idea of Giving What We Can, kind of a moral movemen... (read more)

https://80000hours.org/podcast/episodes/will-macaskill-moral-philosophy/

Robert Wiblin: We’re going to dive into your philosophical views, how you’d like to see effective altruism change, life as an academic, and what you’re researching now. First, how did effective altruism get started in the first place?

Will MacAskill: Effective altruism as a community is really the confluence of 3 different movements. One was Give Well, co-founded by Elie Hassenfeld and Holden Karnofsky. Second was Less Wrong, primarily based in the bay area. The third is the co-founding... (read more)

Wei Dai

As I've said elsewhere, I have more complicated feelings about genetic enhancement. I think it is potentially beneficial, but also tends to be correlated with bad politics, and it could be that the negative social effects of allowing it outweigh the benefits.

I appreciate you keeping on open mind on genetic enhancement (i.e., not grouping it with racism and fascism, or immediately calling for it to be banned). Nevertheless, it fills me with a sense of hopelessness to consider that one of the most thoughtful groups of people on Earth (i.e., EAs) might still re... (read more)

I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.

I'm pretty skeptical of this. (Found a longer explanation of the proposal here.)

An AI facing such a deal would be very concerned that we're merely trying to trick it into revealing its own misalignment (which we'd then try to patch out). It seems to me that it would probably be a lot easier for us to trick an AI into believing that we're honestly presenting it such a deal (including by directly manip... (read more)

A couple of further considerations, or "stops on the crazy train", that you may be interested in:

(These were written in an x-risk framing, but implications for s-risk are fairly straightforward.)

As far as actionable points, I've been advocating working on metaphilosophy or AI philosophical competence, as a way of speeding up philosophical progress in general (so that it doesn't fall behind other kinds of intellectual progress, such as scientific and te... (read more)

3
Rafael Ruiz
Thanks a lot for the links, I will give them a read and get back to you! Regarding the "Lower than 1%? A lot more uncertainty due to important unsolved questions in philosophy of mind." part, it was a mistake because I was thinking of current AI systems. I will delete the % credence since I have so much uncertainty that any theory or argument that I find compelling (for the substrate-dependence or substrate-independence of sentience) would change my credence substantially.

The main alternative to truth-seeking is influence-seeking. EA has had some success at influence-seeking, but as AI becomes the locus of increasingly intense power struggles, retaining that influence will become more difficult, and it will tend to accrue to those who are most skilled at power struggles.

Thanks for the clarification. Why doesn't this imply that EA should get better at power struggles (e.g. by putting more resources into learning/practicing/analyzing corporate politics, PR, lobbying, protests, and the like)? I feel like maybe you're adopti... (read more)

Why doesn't this imply that EA should get better at power struggles (e.g. by putting more resources into learning/practicing/analyzing corporate politics, PR, lobbying, protests, and the like)?

Of course this is all a spectrum, but I don't believe this implication in part because I expect that impact is often heavy-tailed. You do something really well first and foremost by finding the people who are naturally inclined towards being some of the best in the world at it. If a community that was really good at power struggles tried to get much better at truth-seeki... (read more)

I've also updated over the last few years that having a truth-seeking community is more important than I previously thought - basically because the power dynamics around AI will become very complicated and messy, in a way that requires more skill to navigate successfully than the EA community has. Therefore our comparative advantage will need to be truth-seeking.

I'm actually not sure about this logic. Can you expand on why EA having insufficient skill to "navigate power dynamics around AI" implies "our comparative advantage will need to be truth-seeking... (read more)

9
Radical Empath Ismam
I mean, why not? Less-wrong "rationality" isn't foundational to EA, it's not even the accepted school of critical thinking. For example, I personally come from the "scientific skepticism" tradition (think Skeptics Guide to the Universe, Steven Novella, James Randi, etc...), and in my opinion, since EA is simply scientific skepticism applied to charity, scientific skepticism is the much more natural basis for critical thinking in the EA movement than LW.

The main alternative to truth-seeking is influence-seeking. EA has had some success at influence-seeking, but as AI becomes the locus of increasingly intense power struggles, retaining that influence will become more difficult, and it will tend to accrue to those who are most skilled at power struggles.

I agree that extreme truth-seeking can be counterproductive. But in most worlds I don't think that EA's impact comes from arguing for highly controversial ideas; and I'm not advocating for extreme truth-seeking like, say, hosting public debates on the most c... (read more)

You probably didn't have someone like me in mind when you wrote this, but it seems a good opportunity to write down some of my thoughts about EA.

On 1, I think despite paying lip service to moral uncertainty, EA encourages too much certainty in the normative correctness of altruism (and more specific ideas like utilitarianism), perhaps attracting people like SBF with too much philosophical certainty in general (such as about how much risk aversion is normative), or even causing such general overconfidence (by implying that philosophical questions in gener... (read more)

The problem of motivated reasoning is in some ways much deeper than the trolley problem.

The motivation behind motivated reasoning is often to make ourselves look good (in order to gain status/power/prestige). Much of the problem seems to come from not consciously acknowledging this motivation, and therefore not being able to apply system 2 to check for errors in the subconscious optimization.

My approach has been to acknowledge that wanting to make myself look good may be a part of my real or normative values (something like what I would conclude my valu... (read more)

The CCP's current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk), than excited about the opportunity such a race provides.

Yeah, I also tried to point this out to Leopold on LW and via Twitter DM, but no response so far. It confuses me that he seems to completely ignore the possibility of international coordination, as that's the obvious alternative to what he proposes, that others must have also brought up to him... (read more)

I think his answer is here:

Some hope for some sort of international treaty on safety. This seems fanciful to me. The world where both the CCP and USG are AGI-pilled enough to take safety risk seriously is also the world in which both realize that international economic and military predominance is at stake, that being months behind on AGI could mean being permanently left behind. If the race is tight, any arms control equilibrium, at least in the early phase around superintelligence, seems extremely unstable. In short, "breakout" is too easy: the incentive

... (read more)
2
[anonymous]
It seems Alignment folk have a libertarian bent.  "Liberty: Prioritizes individual freedom and autonomy, resisting excessive governmental control and supporting the right to personal wealth. Lower scores may be more accepting of government intervention, while higher scores champion personal freedom and autonomy..." "alignment researchers are found to score significantly higher in liberty (U=16035, p≈0)" https://forum.effectivealtruism.org/posts/eToqPAyB4GxDBrrrf/key-takeaways-from-our-ea-and-alignment-research-surveys?commentId=HYpqRTzrz2G6CH5Xx

But we’re so far away from having that alternative that pining after it is a distraction from the real world.

For one thing, we could try to make OpenAI/SamA toxic to invest in or do business with, and hope that other AI labs either already have better governance / safety cultures, or are greatly incentivized to improve on those fronts. If we (EA as well as the public in general) give him a pass (treat him as a typical/acceptable businessman), what lesson does that convey to others?

6
Greg_Colbourn ⏸️
Yeah, I also don't think we are that far away. OpenAI seems like it's just a few more scandals-similar-to-the-past-week's away from implosion. Or at least, Sam's position as CEO seems to be on shaky ground again, and this time he won't have unanimous support from the rank-and-file employees.

Answer by Wei Dai

I'm also concerned about many projects having negative impact, but think there are some with robustly positive impact:

  1. Making governments and the public better informed about AI risk, including e.g. what x-safety cultures at AI labs are like, and the true state of alignment progress. Geoffrey Irving is doing this at UK AISI and recruiting, for example.
  2. Try to think of important new arguments/considerations, for example a new form of AI risk that nobody has considered, or new arguments for some alignment approach being likely or unlikely to succeed. (But t
... (read more)

Agreed with the general thrust of this post. I'm trying to do my part, despite a feeling of "PR/social/political skills is so far from what I think of as my comparative advantage. What kind of a world am I living in, that I'm compelled to do these things?"

I should add that there may be a risk of over-correcting (focusing too much on OpenAI and Sam Altman), and we shouldn't forget about other major AI labs, how to improve their transparency, governance, safety cultures, etc. This project (Zach Stein-Perlman's AI Lab Watch) seems a good start, if anyone is interested in a project to support or contribute ideas to.

Those low on the spectrum tend to shape the incentives around them proactively to create a culture that rewards what they don’t want to lose about their good qualities.

What percent of people do you think fall into this category? Any examples? Why are we so bad at distinguishing such people ahead of time and often handing power to the easily corrupted instead?

7
Lukas_Gloor
Off the cuff answers that may change as I reflect more:

  • Maybe around 25% of people in leadership positions in the EA ecosystem qualify? Somewhat lower for positions at orgs that are unusually "ambitious;" somewhat higher for positions that are more like "iterate on a proven system" or "have a slow-paced research org that doesn't involve itself too much in politics."
  • For the ambitious leaders, I unfortunately have no examples where I feel particularly confident, but can think of a few examples where I'm like "from a distance, it looks like they might be good leaders." I would count Holden in that category, even though I'd say the last couple of years seem suboptimal in terms of track record (and also want to flag that this is just a "from a distance" impression, so don't put much weight on it).
  • Why we're bad at identifying: This probably isn't the only reason, but the task is just hard. If you look at people who have ambitious visions and are willing to try hard to make them happen, they tend to be above-average on dark triad traits. You probably want someone who is very much not high on psychopathic traits, but still low enough on neuroticism that they won't be anxious all the time. Similarly, you want someone who isn't too high on narcissism, but they still need to have that ambitious vision and belief in being exceptional. You want someone who is humble and has inner warmth so they will uplift others along the way, so high on honesty-humility factor, but that correlates with agreeableness and neuroticism – which is a potential problem because you probably can't be too agreeable in the startup world or when running an ambitious org generally, and you can't be particularly neurotic.
  • (Edit) Another reason is, I think people often aren't "put into leadership positions" by others/some committee; instead, they put themselves there. Like, usually there isn't some committee with a great startup idea looking for a leader; instead, the leader comes with the v
1
Closed Limelike Curves
If I had to guess, the EA community is probably a bit worse at this than most communities because A) bad social skills and B) high trust. This seems like a good tradeoff in general. I don't think we should be putting more emphasis on smooth-talking CEOs—which is what got us into the OpenAI mess in the first place.  But at some point, defending Sam Altman is just charlie_brown_football.jpg

#5 seems off to me. I don’t know whether OpenAI uses nondisparagement agreements;

Details about OpenAI's nondisparagement agreements have come out.

Unlike FTX, OpenAI has now had a second wave of resignations in protest of insufficient safety focus.

Personally, I think fascism should be more upsetting than woke debate!

I'm not very familiar with Reactionary philosophy myself, but was suspicious of your use of "fascism" here. Asked Copilot (based on GPT-4) and it answered:

As an AI, I don’t form personal opinions. However, I can share that Reactionary philosophy and Fascism are distinct ideologies, even though they might share some common elements such as a critique of modernity and a preference for traditional social structures.

Fascism is typically characterized by dictatorial power, forcible suppr

... (read more)
1
Concerned EA Forum User
Neo-reactionary ideology seems like a close match for fascism. The Wikipedia article on it discusses whether it is or isn’t fascism: https://en.wikipedia.org/wiki/Dark_Enlightenment Two major themes of neo-reactionary ideology seem to be authoritarianism and white supremacy. There is definitely some overlap between people who identify with neo-reactionary ideas and people who identify with explicitly neo-Nazi/neo-fascist ideas.