@Wei Dai, I understand that your plan A is an AI pause (+ human intelligence enhancement). And I agree with you that this is the best course of action. Nonetheless, I’m interested in what you see as plan B: If we don’t get an AI pause, is there any version of ‘hand off these problems to AIs’/‘let ‘er rip’ that you feel optimistic about? Or one that you at least think will result in a lower p(catastrophe) than other versions? If you have $1B to spend on AI labour during crunch time, what do you get the AIs to work on?
The answer would depend a lot on what the...
I wish you titled the post something like "The option value argument for preventing extinction doesn't work". Your current title ("The option value argument doesn't work when it's most needed") has the unfortunate side effects of:
The argument tree (arguments, counterarguments, counter-counterarguments, and so on) is exponentially sized and we don't know how deep or wide we need to expand it, before some problem can be solved. We do know that different humans looking at the same partial tree (i.e., philosophers who have read the same literature on some problem) can have very different judgments as to what the correct conclusion is. There's also a huge amount of intuition/judgment involved in choosing which part of the tree to focus on or expand further. With AIs helping to expand th...
Do you want to talk about why you're relatively optimistic? I've tried to explain my own concerns/pessimism at https://www.lesswrong.com/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy and https://forum.effectivealtruism.org/posts/axSfJXriBWEixsHGR/ai-doing-philosophy-ai-generating-hands.
Thanks! I hope this means you'll spend some more time on this type of work, and/or tell other philosophers about this argument. It seems apparent that we need more philosophers to work on philosophical problems related to AI x-safety (many of which do not seem to be legible to most non-philosophers). Not necessarily by attacking them directly (this is very hard and probably not the best use of time, as we previously discussed) but instead by making them more legible to AI researchers, decisionmakers, and the general public.
In my view, there isn't much desire for work like this from people in the field, and they probably wouldn't use it to inform deployment unless a lot of effort is also added from the author to meet the right people, convince them to spend the time to take it seriously, etc.
Any thoughts on Legible vs. Illegible AI Safety Problems, which is in part a response to this?
I wrote a post that I think was partly inspired by this discussion. The implication of it here is that I don't necessarily want philosophers to directly try to solve the many hard philosophical problems relevant to AI alignment/safety (especially given how few of them are in this space or concerned about x-safety), but initially just to try to make them "more legible" to others, including AI researchers, key decision makers, and the public. Hopefully you agree that this is a more sensible position.
I agree that many of the problems on my list are very hard and probably not the highest marginal value work to be doing from an individual perspective. Keep in mind that the list was written 6 years ago, when it was less clear when the AI takeoff would start in earnest, or how many philosophers would become motivated to work on AI safety once AGI became visibly closer. I still had some hope that when the time came, a significant fraction of all philosophers would become self-motivated or would be "called to arms" by a civilization-wide AI safety effort, and...
Right, I know about Will MacAskill, Joe Carlsmith, and your work in this area, but none of you are working on alignment per se full-time, or even close to full-time, AFAIK, and the total effort is clearly far from adequate to the task at hand.
I think some have given up philosophy to work on other things such as AI alignment.
Any other names you can cite?
...In my view, there isn't much desire for work like this from people in the field and they probably wouldn't use it to inform deployment unless a lot of effort is also added from the author to meet the rig
I'm a philosopher who's switched to working on AI safety full-time. I also know there are at least a few philosophers at Anthropic working on alignment.
With regards to your Problems in AI Alignment that philosophers could potentially contribute to:
Do you have any insights into why there are so few philosophers working in AI alignment, or closely with alignment researchers? (Amanda Askell is the only one I know.) Do you think this is actually a reasonable state of affairs (i.e., it's right or fine that almost no professional philosophers work directly as or with alignment researchers), or is this wrong/suboptimal, caused by some kind of cultural or structural problem? It's been 6 years since I wrote Problems in AI Alignment that philosophers could potentially contribute to and I've gotten a few comme...
Interesting re belief in hell being a key factor, I wasn't thinking about that.
It seems like the whole AI x-risk community has latched onto "align AI with human values/intent" as the solution, with few people thinking even a few steps ahead to "what if we succeeded"? I have a post related to this if you're interested.
possibly the future economy will be so much more complicated that it will still make sense to have some distributed information processing in the market rather than have all optimisation centrally planned
I think there will be distribute...
Why do you think this work has less value than solving philosophical problems in AI safety?
From the perspective of comparative advantage and counterfactual impact, this work does not seem to require philosophical training. It seems to be straightforward empirical research that many people could do, besides the very few professionally trained AI-risk-concerned philosophers that humanity has.
To put it another way, I'm not sure that Toby was wrong to work on this, but if he was, it's because if he hadn't, then someone else with more comparative advantage for working on this problem (due to lacking training or talent for philosophy) would have done so shortly afterwards.
While I appreciate this work being done, it seems a very bad sign for our world/timeline that the very few people with both philosophy training and an interest in AI x-safety are using their time/talent to do forecasting (or other) work instead of solving philosophical problems in AI x-safety, with Daniel Kokotajlo being another prominent example.
This implies one of two things: Either they are miscalculating the best way to spend their time, which indicates bad reasoning or intuitions even among humanity's top philosophers (i.e., those who have at least re...
Why do you think this work has less value than solving philosophical problems in AI safety? If LLM scaling is sputtering out, isn't that important to know? In fact, isn't it a strong contender for the most important fact about AI that could be known right now?
I suppose you could ask why this work hasn't been done by somebody else already and that's a really good question. For instance, why didn't anyone doing equity research or AI journalism notice this already?
Among people who are highly concerned about near-term AGI, I don't really expect suc...
The ethical schools of thought I'm most aligned with—longtermism, sentientism, effective altruism, and utilitarianism—are far more prominent in the West (though still very niche).
I want to point out that the ethical schools of thought that you're (probably) most anti-aligned with (e.g., that certain behaviors and even thoughts are deserving of eternal divine punishment) are also far more prominent in the West, proportionately even more so than the ones you're aligned with.
Also the Western model of governance may not last into the post-AGI era regardless...
Whereas it seems like maybe you think it's convex, such that smaller pauses or slowdowns do very little?
I think my point in the opening comment does not logically depend on whether the risk vs time (in pause/slowdown) curve is convex or concave[1], but it may be a major difference in how we're thinking about the situation, so thanks for surfacing this. In particular I see 3 large sources of convexity:
A couple more thoughts on this.
Throwing in my 2c on this:
I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level. To be more specific:
I think it's likely that without a long (e.g. multi-decade) AI pause, one or more of these "non-takeover AI risks" can't be solved or reduced to an acceptable level.
I don't understand why you're framing the goal as "solving or reducing to an acceptable level", rather than thinking about how much expected impact we can have. I'm in favour of slowing the intelligence explosion (and in particular of "Pause at human-level".) But here's how I'd think about the conversion of slowdown/pause into additional value:
Let's say the software-only intelligenc...
A couple more thoughts on this.
Perhaps the most important question is whether you support a restriction on space colonization (completely or to a few nearby planets) during the Long Reflection. Unrestricted colonization seems good from a pure pro-natalist perspective, but bad from an optionalist perspective, as it makes much more likely that if anti-natalism (or adjacent positions like there should be strict care or controls over what lives can be brought into existence) is right, some of the colonies will fail to reach the correct conclusion and go on to colonize the universe in an un...
I think both natalism and anti-natalism risk committing moral atrocities, if their opposite position turns out to be correct. Natalism if either people are often mistaken about their lives being worth living (cf Deluded Gladness Argument), or bringing people into existence requires much more due diligence about understanding/predicting their specific well-informed preferences (perhaps more understanding than current science and philosophy allow). Anti-natalism if human extinction implies losing an astronomically large amount of potential value (cf Astronom...
I'm generally a fan of John Cochrane. I would agree that government regulation of AI isn't likely to work out well, which is why I favor an international pause on AI development instead (less need for government competence on detailed technical matters).
His stance on unemployment seems less understandable. I guess he either hasn't considered the possibility that AGI could drive wages below human subsistence levels, or thinks that's fine (humans just work for the same low wages as AIs and governments make up the difference with a "broad safety net that cushi...
Vitalik Buterin: Right. Well, one thing is one domain being offence-dominant by itself isn’t a failure condition, right? Because defence-dominant domains can compensate for offence-dominant domains. And that has totally happened in the past, many times. If you even just compare now to 1,000 years ago: cannons are very offence-dominant, and castles stopped working. But if you compare physical warfare now to before, is it more offence-dominant on the whole? It’s not clear, right?
I wish there was discussion about a longer pause (e.g. multi-decade), to allow time for human genetic enhancement to take effect. Does @CarlShulman support that, and why or why not?
Also I'm having trouble making sense of the following. What kind of AI disaster is Carl worried about, that's only a disaster for him personally, but not for society?
But also, I’m worried about disaster at a personal level. If AI was going to happen 20 years later, that would be better for me. But that’s not the way to think about it for society at large.
Thanks for letting me know! I have been wondering for a while why AI philosophical competence is so neglected, even compared to other subareas of what I call "ensuring a good outcome for the AI transition" (which are all terribly neglected in my view), and I appreciate your data point. Would be interested to hear your conclusions after you've thought about it.
(I understand you are very busy this week, so please feel free to respond later.)
Re desires, the main upshot of non-dualist views of consciousness, I think, is in responding to arguments that invoke special properties of conscious states to say that they matter but other concerns of people do not.
I would say that consciousness seems very plausibly special in that it seems very different from other types of things/entities/stuff we can think or talk or have concerns about. I don't know if it's special in a "magical" way or some other way (or maybe not special at all...
Have you considered working on metaphilosophy / AI philosophical competence instead? Conditional on correct philosophy about AI welfare being important, most of future philosophical work will probably be done by AIs (to help humans / at our request, or for their own purposes). If AIs do that work badly and arrive at wrong conclusions, then all the object-level philosophical work we do now might only have short-term effects and count for little in the long run. (Conversely if we have wrong views now but AIs correct them later, that seems less disastrous.)
The 2017 Report on Consciousness and Moral Patienthood by Muehlhauser assumes illusionism about human consciousness to be true.
Reading that, it appears Muehlhauser's illusionism (perhaps unlike Carl's, although I don't have details on Carl's views) is a form that neither implies that consciousness does not exist nor strongly motivates desire satisfactionism:
...There is “something it is like” to be us, and I doubt there is “something it is like” to be a chess-playing computer, and I think the difference is morally important. I just think our intuitions m
...The you that chooses is more fundamental than the you that experiences, because if you remove experience you get a blindmind you that will presumably want it back. Even if it can’t be gotten back, presumably **you** will still pursue your values whatever they were. On the other hand, if you remove your entire algorithm but leave the qualia, you get an empty observer that might not be completely lacking in value, but wouldn’t be you, and if you then replace the algorithm you get a sentient someone else.
Thus I submit that moral patients are straightforwardl
Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of **agency:** preferences, desires, goals, interests, and the like.
The articles you cite, and Carl himself (via private discussion) all cite the possibility that there is no such thing as consciousness (illusionism, "physicalist/zombie world") as the main motivation for this moral stance (named "Desire Satisfactionism" by one of the pa...
Just to back this up, since Wei has mentioned it: it does seem like a lot of the Open-Phil-cluster is to varying extents bought into illusionism. I think this is a highly controversial view, especially for those outside of Analytical Philosophy of Mind (and even within the field many people argue against it; I basically agree with Galen Strawson's negative take on it as an entire approach to consciousness).
Therefore, it seems clear to us that we need to immediately prioritize and fund serious, non-magical research that helps us better understand what features predict whether a given system is conscious
Can you talk a bit about how such research might work? The main problem I see is that we do not have "ground truth labels" about which systems are or are not conscious, aside from perhaps humans and inanimate objects. So this seemingly has to be mostly philosophical as opposed to scientific research, which tends to progress very slowly (perhaps for good reas...
Another podcast linked below with some details about Will and Toby's early interactions with the Rationality community. Also Holden Karnofsky has an account on LW, and interacted with the Rationality community via e.g. this extensively discussed 2011 post.
https://80000hours.org/podcast/episodes/will-macaskill-what-we-owe-the-future/
Will MacAskill: But then the biggest thing was just looking at what are the options I have available to me in terms of what do I focus my time on? Where one is building up this idea of Giving What We Can, kind of a moral movemen...
https://80000hours.org/podcast/episodes/will-macaskill-moral-philosophy/
Robert Wiblin: We’re going to dive into your philosophical views, how you’d like to see effective altruism change, life as an academic, and what you’re researching now. First, how did effective altruism get started in the first place?
Will MacAskill: Effective altruism as a community is really the confluence of 3 different movements. One was Give Well, co-founded by Elie Hassenfeld and Holden Karnofsky. Second was Less Wrong, primarily based in the bay area. The third is the co-founding...
As I’ve said elsewhere, I have more complicated feelings about genetic enhancement. I think it is potentially beneficial, but it also tends to be correlated with bad politics, and it could be that the negative social effects of allowing it outweigh the benefits.
I appreciate you keeping an open mind on genetic enhancement (i.e., not grouping it with racism and fascism, or immediately calling for it to be banned). Nevertheless, it fills me with a sense of hopelessness to consider that one of the most thoughtful groups of people on Earth (i.e., EAs) might still re...
I think paying AIs to reveal their misalignment and potentially to work for us and prevent AI takeover seems like a potentially very promising intervention.
I'm pretty skeptical of this. (Found a longer explanation of the proposal here.)
An AI facing such a deal would be very concerned that we're merely trying to trick it into revealing its own misalignment (which we'd then try to patch out). It seems to me that it would probably be a lot easier for us to trick an AI into believing that we're honestly presenting it such a deal (including by directly manip...
A couple of further considerations, or "stops on the crazy train", that you may be interested in:
(These were written in an x-risk framing, but implications for s-risk are fairly straightforward.)
As far as actionable points, I've been advocating working on metaphilosophy or AI philosophical competence, as a way of speeding up philosophical progress in general (so that it doesn't fall behind other kinds of intellectual progress, such as scientific and te...
The main alternative to truth-seeking is influence-seeking. EA has had some success at influence-seeking, but as AI becomes the locus of increasingly intense power struggles, retaining that influence will become more difficult, and it will tend to accrue to those who are most skilled at power struggles.
Thanks for the clarification. Why doesn't this imply that EA should get better at power struggles (e.g. by putting more resources into learning/practicing/analyzing corporate politics, PR, lobbying, protests, and the like)? I feel like maybe you're adopti...
Why doesn't this imply that EA should get better at power struggles (e.g. by putting more resources into learning/practicing/analyzing corporate politics, PR, lobbying, protests, and the like)?
Of course this is all a spectrum, but I don't believe this implication, in part because I expect that impact is often heavy-tailed. You do something really well first and foremost by finding the people who are naturally inclined towards being some of the best in the world at it. If a community that was really good at power struggles tried to get much better at truth-seeki...
I've also updated over the last few years that having a truth-seeking community is more important than I previously thought - basically because the power dynamics around AI will become very complicated and messy, in a way that requires more skill to navigate successfully than the EA community has. Therefore our comparative advantage will need to be truth-seeking.
I'm actually not sure about this logic. Can you expand on why EA having insufficient skill to "navigate power dynamics around AI" implies "our comparative advantage will need to be truth-seeking...
The main alternative to truth-seeking is influence-seeking. EA has had some success at influence-seeking, but as AI becomes the locus of increasingly intense power struggles, retaining that influence will become more difficult, and it will tend to accrue to those who are most skilled at power struggles.
I agree that extreme truth-seeking can be counterproductive. But in most worlds I don't think that EA's impact comes from arguing for highly controversial ideas; and I'm not advocating for extreme truth-seeking like, say, hosting public debates on the most c...
You probably didn't have someone like me in mind when you wrote this, but it seems a good opportunity to write down some of my thoughts about EA.
On 1, I think despite paying lip service to moral uncertainty, EA encourages too much certainty in the normative correctness of altruism (and more specific ideas like utilitarianism), perhaps attracting people like SBF with too much philosophical certainty in general (such as about how much risk aversion is normative), or even causing such general overconfidence (by implying that philosophical questions in gener...
The problem of motivated reasoning is in some ways much deeper than the trolley problem.
The motivation behind motivated reasoning is often to make ourselves look good (in order to gain status/power/prestige). Much of the problem seems to come from not consciously acknowledging this motivation, and therefore not being able to apply system 2 to check for errors in the subconscious optimization.
My approach has been to acknowledge that wanting to make myself look good may be a part of my real or normative values (something like what I would conclude my valu...
The CCP's current appetite for AGI seems remarkably small, and I expect them to be more worried that an AGI race would leave them in the dust (and/or put their regime at risk, and/or put their lives at risk), than excited about the opportunity such a race provides.
Yeah, I also tried to point this out to Leopold on LW and via Twitter DM, but no response so far. It confuses me that he seems to completely ignore the possibility of international coordination, as that's the obvious alternative to what he proposes, that others must have also brought up to him...
I think his answer is here:
...Some hope for some sort of international treaty on safety. This seems fanciful to me. The world where both the CCP and USG are AGI-pilled enough to take safety risk seriously is also the world in which both realize that international economic and military predominance is at stake, that being months behind on AGI could mean being permanently left behind. If the race is tight, any arms control equilibrium, at least in the early phase around superintelligence, seems extremely unstable. In short, “breakout” is too easy: the incentive
But we’re so far away from having that alternative that pining after it is a distraction from the real world.
For one thing, we could try to make OpenAI/SamA toxic to invest in or do business with, and hope that other AI labs either already have better governance / safety cultures, or are greatly incentivized to improve on those fronts. If we (EA as well as the public in general) give him a pass (treat him as a typical/acceptable businessman), what lesson does that convey to others?
I should add that there may be a risk of over-correcting (focusing too much on OpenAI and Sam Altman), and we shouldn't forget about other major AI labs, how to improve their transparency, governance, safety cultures, etc. This project (Zach Stein-Perlman's AI Lab Watch) seems a good start, if anyone is interested in a project to support or contribute ideas to.
I'm also concerned about many projects having negative impact, but think there are some with robustly positive impact:
Agreed with the general thrust of this post. I'm trying to do my part, despite a feeling of "PR/social/political skills is so far from what I think of as my comparative advantage. What kind of a world am I living in, that I'm compelled to do these things?"
Those low on the spectrum tend to shape the incentives around them proactively to create a culture that rewards what they don’t want to lose about their good qualities.
What percent of people do you think fall into this category? Any examples? Why are we so bad at distinguishing such people ahead of time and often handing power to the easily corrupted instead?
#5 seems off to me. I don’t know whether OpenAI uses nondisparagement agreements;
Details about OpenAI's nondisparagement agreements have come out.
Unlike FTX, OpenAI has now had a second wave of resignations in protest of insufficient safety focus.
Personally, I think fascism should be more upsetting than woke debate!
I'm not very familiar with Reactionary philosophy myself, but was suspicious of your use of "fascism" here. Asked Copilot (based on GPT-4) and it answered:
...As an AI, I don’t form personal opinions. However, I can share that Reactionary philosophy and Fascism are distinct ideologies, even though they might share some common elements such as a critique of modernity and a preference for traditional social structures.
Fascism is typically characterized by dictatorial power, forcible suppr
Problems with this:
Solving these problems see...