Matthew_Barnett
I think fixed discount rates (i.e., a discount rate where every year, no matter how far away, reduces the weighting by the same fraction) of any amount seem pretty obviously crazy to me as a model of the future. We use discount rates as a proxy for things like "predictability of the future" and "constraining our plans towards worlds we can influence", which often makes sense, but I think even very simple thought experiments produce obviously insane conclusions if you use practically any non-zero fixed discount rate in situations where it comes apart from those proxies (as is virtually guaranteed to happen in the long-run future).

I agree there’s a decent case to be made for abandoning fixed exponential discount rates in favor of a more nuanced model. However, it’s often unclear what model is best suited to handle scenarios involving a sequence of future events — T_1, T_2, T_3, …, T_N — where our knowledge about T_i is always significantly greater than our knowledge about T_{i+1}.

From what I understand, many EAs seem to reject time discounting partly because they accept an empirical premise that goes something like this: “The future becomes increasingly difficult to predict as we look further ahead, but at some point, there will be a "value lock-in" — a moment when key values or structures become fixed — and after this lock-in, the long-term future could become highly predictable, even over time horizons spanning billions of years.” If this premise is correct, it might justify using something like a fixed discount rate for time periods leading up to the value lock-in, but then something like a zero rate of time discounting after the anticipated lock-in.

Personally, I find the concept of a value lock-in to be highly uncertain and speculative. Because of this, I’m skeptical of the conclusion that we should treat the level of epistemic uncertainty about the world, say, 1,000 years from now as being essentially the same as the uncertainty about the world 1 billion years from now. While both timeframes might feel similarly distant from our perspective — both being “a long time from now” — I ultimately think there’s still a meaningful difference: predicting the state of the world 1 billion years from now is likely much harder than predicting the state of the world 1,000 years from now.

One reasonable compromise model between these two perspectives is to tie the discount rate to the predicted amount of change at a given point in time. This could yield a continuously increasing discount rate for the years leading up to and including AGI, but then a falling discount rate for later years as technological progress becomes relatively saturated.
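As a toy illustration of this compromise model (all parameter values here are hypothetical placeholders, not estimates), the per-year discount rate can be made a function of predicted change: rising toward an assumed AGI transition and then decaying back toward a small floor as change saturates.

```python
import math

def annual_discount_rate(year, agi_year=30, peak_rate=0.05, floor_rate=0.001, width=15.0):
    """Hypothetical hazard-style discount rate: rises toward an assumed AGI
    transition at `agi_year`, then falls back toward a small floor as
    technological change saturates. All parameters are illustrative."""
    # Bell-shaped bump centered on the assumed transition, on top of a small floor.
    return floor_rate + (peak_rate - floor_rate) * math.exp(-((year - agi_year) / width) ** 2)

def discount_factor(year):
    """Cumulative weight placed on welfare `year` years from now:
    the product of per-year (1 - rate) factors."""
    factor = 1.0
    for t in range(year):
        factor *= 1.0 - annual_discount_rate(t)
    return factor
```

Unlike a fixed exponential rate, the per-year rate here is not constant: it peaks near the assumed transition and then declines, so the far future is not penalized at the same rate as the turbulent transition period.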

For example, a single percentage point of reduction of existential risks would be worth (from a utilitarian expected utility point-of-view) a delay of over 10 million years.

I'm curious how many EAs believe this claim literally, and think a 10 million year pause (assuming it's feasible in the first place) would be justified if it reduced existential risk by a single percentage point. Given the disagree votes on my other comments, it seems a fair number might in fact agree to the literal claim here.
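For context, the quoted claim follows mechanically from zero time discounting plus an assumed very long future. A minimal sketch of the break-even arithmetic (the 1-billion-year future length is an illustrative assumption, not a figure from the original claim):

```python
def break_even_delay_years(risk_reduction, future_length_years):
    """With zero time discounting and a future of fixed total length, a pause
    of D years is break-even when D / future_length == risk_reduction.
    Both inputs are illustrative assumptions, not estimates."""
    return risk_reduction * future_length_years

# A 1-percentage-point risk reduction, with an assumed 1e9-year future:
print(break_even_delay_years(0.01, 1_000_000_000))  # 10000000.0 -> a 10-million-year pause breaks even
```

Any non-zero fixed discount rate, by contrast, makes a 10-million-year delay astronomically costly, which is why the two camps reach such different verdicts.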

Given my disagreement that we should take these numbers literally, I think it might be worth writing a post about why we should have a pragmatic non-zero discount rate, even from a purely longtermist perspective.

I think this is a reasonable point of disagreement. Though, as you allude to, it is separate from the point I was making. 

I do think it is generally very important to distinguish between:

  1. Advocacy for a policy because you think it would have a tiny impact on x-risk, which thereby outweighs all the other side effects of the policy, including potentially massive near-term effects, because reducing x-risk simply outweighs every other ethical priority by many orders of magnitude.
  2. Advocacy for a policy because you think it would have a moderate or large effect on x-risk, and is therefore worth doing because reducing x-risk is an important ethical priority (even if it isn't, say, one million times more important than every other ethical priority combined).

I'm happy to debate (2) on empirical grounds, and debate (1) on ethical grounds. I think the ethical philosophy behind (1) is quite dubious and resembles the type of logic that is vulnerable to Pascal's mugging. The ethical philosophy behind (2) seems sound, but the empirical basis is often uncertain.

I think it would require an unreasonably radical interpretation of longtermism to believe, for example, that delaying something as valuable as a cure for cancer by 10 years (or another comparably significant breakthrough) would be justified, let alone overwhelmingly outweighed, because of an extremely slight and speculative anticipated positive impact on existential risk. Similarly, I think the same is true about AI, if indeed pausing the technology would only have a very slight impact on existential risk in expectation.

I’ve already provided a pragmatic argument for incorporating at least a slight amount of time discounting into one’s moral framework, but I want to reemphasize and elaborate on this point for clarity. Even if you are firmly committed to the idea that we should have no pure rate of time preference—meaning you believe future lives and welfare matter just as much as present ones—you should still account for the fact that the future is inherently uncertain. Our ability to predict the future diminishes significantly the farther we look ahead. This uncertainty should generally lead us to favor not delaying the realization of clearly good outcomes unless there is a strong and concrete justification for why the delay would yield substantial benefits.

Longtermism, as I understand it, is simply the idea that the distant future matters a great deal and should be factored into our decision-making. Longtermism does not—and should not—imply that we should essentially ignore enormous, tangible and clear short-term harms just because we anticipate extremely slight and highly speculative long-term gains that might result from a particular course of action.

I recognize that someone who adheres to an extremely strong and rigid version of longtermism might disagree with the position I’m articulating here. Such a person might argue that even a very small and speculative reduction in existential risk justifies delaying massive and clear near-term benefits. However, I generally believe that people should not adopt this kind of extreme strong longtermism. It leads to moral conclusions that are unreasonably detached from the realities of suffering and flourishing in the present and near future, and I think this approach undermines the pragmatic and balanced principles that arguably drew many of us to longtermism in the first place.

I agree with many of the things other people have already mentioned. However, I want to add one additional argument against PauseAI, which I believe is quite important and worth emphasizing clearly:

In general, hastening technological progress tends to be a good thing. For example, if a cure for cancer were to arrive in 5 years instead of 15 years, that would be very good. The earlier arrival of the cure would save many lives and prevent a lot of suffering for people who would otherwise endure unnecessary pain or death during those additional 10 years. The difference in timing matters because every year of delay means avoidable harm continues to occur.

I believe this same principle applies to AI, as I expect its main effects will likely be overwhelmingly positive. AI seems likely to accelerate economic growth, accelerate technological progress, and significantly improve health and well-being for billions of people. These outcomes are all very desirable, and I would strongly prefer for them to arrive sooner rather than later. Delaying these benefits unnecessarily means forgoing better lives, better health, and better opportunities for many people in the interim.

Of course, there are exceptions to this principle, as it’s not always the case that hastening technology is beneficial. Sometimes it is indeed wiser to delay the deployment of a new technology if the delay would substantially increase its safety or reduce risks. I’m not dogmatic about hastening technology and I recognize there are legitimate trade-offs here. However, in the case of AI, I am simply not convinced that delaying its development and deployment is justified on current margins.

To make this concrete, let’s say that delaying AI development by 5 years would reduce existential risk by only 0.001 percentage points. I would not support such a trade-off. From the perspective of any moral framework that incorporates even a slight discounting of future consumption and well-being, such a delay would be highly undesirable. There are pragmatic reasons to include time discounting in a moral framework: the future is inherently uncertain, and the farther out we try to forecast, the less predictable and reliable our expectations about the future become. If we can bring about something very good sooner, without significant costs, we should almost always do so rather than being indifferent to when it happens.
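To make the arithmetic behind this judgment explicit (a sketch with the comment's stated numbers plus a made-up 0.1%/year discount rate; future value is normalized to 1):

```python
import math

def delay_is_worth_it(risk_reduction, delay_years, annual_discount_rate):
    """Compare the expected-value gain from a reduction in existential risk
    against the value lost by discounting a delayed future.
    Total future value is normalized to 1; all inputs are illustrative."""
    gain = risk_reduction  # extra probability that the future's value is realized
    loss = 1.0 - math.exp(-annual_discount_rate * delay_years)  # value lost to delay
    return gain > loss

# First scenario: 0.001 percentage points = 1e-5 probability.
# Even a tiny 0.1%/year discount rate makes the 5-year delay a bad trade.
print(delay_is_worth_it(1e-5, 5, 0.001))   # False: ~0.5% loss >> 0.001pp gain

# Second scenario: a 10-percentage-point reduction flips the verdict.
print(delay_is_worth_it(0.10, 5, 0.001))   # True: 10pp gain >> ~0.5% loss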

However, if the situation were different—if delaying AI by 5 years reduced existential risk by something like 10 percentage points—then I think the case for PauseAI would be much stronger. In such a scenario, I would seriously consider supporting PauseAI and might even advocate for it loudly. That said, I find this kind of large reduction in existential risk from a delay in AI development to be implausible, partly for the reasons others in this thread have already outlined.

According to the standard story where an unaligned AI has some optimization target and then kills all humans in the interest of pursuing that target (e.g. a paperclip maximizer), it seems unlikely that this AI would experience much happiness (granting that it's capable of happiness) because its own happiness is not the optimization target.

I agree that this is the standard story regarding AI risk, but I haven’t seen convincing arguments that support this specific model.

In other words, I see no compelling evidence to believe that future AIs will have exclusively abstract, disconnected goals—like maximizing paperclip production—and that such AIs would fail to generate significant amounts of happiness, either as a byproduct of their goals or as an integral part of achieving them.

(Of course, it’s crucial to avoid wishful thinking. A favorable outcome is by no means guaranteed, and I’m not arguing otherwise. Instead, my point is that the core assumption underpinning this standard narrative seems weakly argued and poorly substantiated.)

The scenario I find most plausible is one in which AIs have a mixture of goals, much like humans. Some of these goals will likely be abstract, while others will be directly tied to the AI’s internal experiences and mental states.

Just as humans care about their own happiness but also care about external reality—such as the impact they have on the world or what happens after they’re dead—I expect that many AIs will place value on both their own mental states and various aspects of external reality.

This ultimately depends on how AIs are constructed and trained, of course. However, as you mentioned, there are some straightforward reasons to anticipate parallels between how goals emerge in animals and how they might arise in AIs. For example, robots and some other types of AIs will likely be trained through reinforcement learning. While RL on computers isn’t identical to the processes by which animals learn, it is similar enough in critical ways to suggest that these parallels could have significant implications.

What's your credence that humans create a utopia in the alternative? Depending on the strictness of one's definition, I think a future utopia is quite unlikely either way, whether we solve alignment or not.

It seems you expect future unaligned AIs will either be unconscious or will pursue goals that result in few positive conscious experiences being created. I am not convinced of this myself. At the very least, I think such a claim demands justification.

Given the apparent ubiquity of consciousness in the animal kingdom, and the anticipated sophistication of AI cognition, it is difficult for me to imagine a future essentially devoid of conscious life, even if that life is made of silicon and it does not share human preferences.

This argument only makes sense if you have a very low P(doom) (like <0.1%) or if you place minimal value on future generations. Otherwise, it's not worth recklessly endangering the future of humanity to bring utopia a few years (or maybe decades) sooner. The math on this is really simple—bringing AI sooner only benefits the current generation, but extinction harms all future generations. You don't need to be a strong longtermist, you just need to accord significant value to people who aren't born yet.

Here's a counter-argument that relies on the following assumptions:

First, suppose you believe unaligned AIs would still be conscious entities, capable of having meaningful, valuable experiences. This could be because you think unaligned AIs will be very cognitively sophisticated, even if they don't share human preferences.

Second, assume you're a utilitarian who doesn't assign special importance to whether the future is populated by biological humans or digital minds. If both scenarios result in a future full of happy, conscious beings, you’d view them as roughly equivalent. In fact, you might even prefer digital minds if they could exist in vastly larger numbers or had features that enhanced their well-being relative to biological life.

With those assumptions in place, consider the following dilemma:

  1. If AI is developed soon, there’s some probability p that billions of humans will die due to misaligned AI—an obviously bad outcome. However, if these unaligned AIs replace us, they would presumably still go on to create a thriving and valuable civilization from a utilitarian perspective, even though humanity would not be part of that future.

  2. If AI development is delayed by several decades to ensure safety, billions of humans will die in the meantime from old age who could otherwise have been saved by accelerated medical advancements enabled by earlier AI. This, too, is clearly bad. However, humanity would eventually develop AI safely and go on to build a similarly valuable civilization, just after a significant delay.

Given these two options, a utilitarian doesn't have strong reasons to prefer the second approach. While the first scenario carries substantial risks, it does not necessarily endanger the entire long-term future. Instead, the primary harm seems to fall on the current generation: either billions of people die prematurely due to unaligned AI, or they die from preventable causes like aging because of delayed technological progress. In both cases, the far future—whether filled with biological or digital minds—remains intact and flourishing under these assumptions.

In other words, there simply isn't a compelling utilitarian argument for choosing to delay AI in this dilemma.

Do you have any thoughts on the assumptions underlying this dilemma, or its conclusion?

TL;DR...

Restrictions to advanced AI would likely delay technological progress and potentially require a state of surveillance.

To be clear, I wasn't arguing against generic restrictions on advanced AIs. In fact, I advocated for restrictions, in the form of legal protections for AIs against abuse and suffering. In my comment, I was solely arguing against a lengthy moratorium, rather than arguing against more general legal rules and regulations.

Given my argument, I'd go further than saying that the relevant restrictions I was arguing against would "likely delay technological progress". They almost certainly would have that effect, since I was talking about a blanket moratorium, rather than more targeted or specific rules governing the development of AI (which I support).

I think what is missing for this argument to go through is arguing that the costs in 2 are higher than the cost of mistreated Artificial Sentience.

A major reason why I didn't give this argument is that I had already conceded that we should have legal protections against the mistreatment of artificial sentience. The relevant comparison is not between a scenario with no restrictions on mistreatment and one with restrictions that protect AIs from mistreatment, but rather between the moratorium discussed in the post and more narrowly scoped regulations that specifically protect AIs from mistreatment.

Let me put this another way. Let's say we were to impose a moratorium on advanced AI, for the reasons given in this post. The idea here is presumably that, during the moratorium, society will deliberate on what we should do with advanced AI. After this deliberation concludes, society will end the moratorium, and then implement whatever we decided on.

What types of things might we decide to do, while deliberating? A good guess is that, upon the conclusion of the moratorium, we could decide to implement strong legal protections against AI mistreatment. In that case, the result of the moratorium appears identical to the legal outcome that I had already advocated, except with one major difference: with the moratorium, we'd have spent a long time with no advanced AI.

It could well be the case that spending, say, 50 years with no advanced AI is better than building it—from a utilitarian point of view—because AIs might suffer on balance more than they are happy, even with strong legal protections. If that is the case, though, the correct conclusion to draw is that we should never build AI, not that we should spend 50 years deliberating. Since I didn't think this was the argument being presented, I didn't spend much time arguing against the premise supporting this conclusion.

Instead, I wanted to focus on the costs of delay and deliberation, which I think are quite massive and often overlooked. Given these costs, if the end result of the moratorium is that we merely end up with the same sorts of policies that we could have achieved without the delay, the moratorium seems flatly unjustified. If the result of the moratorium is that we end up with even worse policies, as a result of the cultural effects I talked about, then the moratorium is even less justified.

(I'm repeating something I said in another comment I wrote a few hours ago, but adapted to this post.)

On a basic level, I agree that we should take artificial sentience extremely seriously, and think carefully about the right type of laws to put in place to ensure that artificial life is able to happily flourish, rather than suffer. This includes enacting appropriate legal protections to ensure that sentient AIs are treated in ways that promote well-being rather than suffering. Relying solely on voluntary codes of conduct to govern the treatment of potentially sentient AIs seems deeply inadequate, much like it would be for protecting children against abuse. Instead, I believe that establishing clear, enforceable laws is essential for ethically managing artificial sentience.

That said, I'm skeptical that a moratorium is the best policy.

From a classical utilitarian perspective, the imposition of a lengthy moratorium on the development of sentient AI seems like it would help to foster a more conservative global culture—one that is averse towards not only creating sentient AI, but also potentially towards other forms of life-expanding ventures, such as space colonization. Classical utilitarianism is typically seen as aiming to maximize the number of conscious beings in existence, advocating for actions that enable the flourishing and expansion of life, happiness, and fulfillment on as broad a scale as possible. However, implementing and sustaining a lengthy ban on AI would likely require substantial cultural and institutional shifts away from these permissive and ambitious values.

To enforce a moratorium of this nature, societies would likely adopt a framework centered around caution, restriction, and a deep-seated aversion to risk—values that would contrast sharply with those that encourage creating sentient life and proliferating this life on as large a scale as possible. Maintaining a strict stance on AI development might lead governments, educational institutions, and media to promote narratives emphasizing the potential dangers of sentience and AI experimentation, instilling an atmosphere of risk-aversion rather than curiosity, openness, and progress. Over time, these narratives could lead to a culture less inclined to support or value efforts to expand sentient life.

Even if the ban is at some point lifted, there's no guarantee that the conservative attitudes generated under the ban would entirely disappear, or that all relevant restrictions on artificial life would completely go away. Instead, it seems more likely that many of these risk-averse attitudes would remain even after the ban is formally lifted, given the initially long duration of the ban, and the type of culture the ban would inculcate.

In my view, this type of cultural conservatism seems likely to, in the long run, undermine the core aims of classical utilitarianism. A shift toward a society that is fearful or resistant to creating new forms of life may restrict humanity’s potential to realize a future that is not only technologically advanced but also rich in conscious, joyful beings. If we accept the idea of 'value lock-in'—the notion that the values and institutions we establish now may set a trajectory that lasts for billions of years—then cultivating a culture that emphasizes restriction and caution may have long-term effects that are difficult to reverse. Such a locked-in value system could close off paths to outcomes that are aligned with maximizing the proliferation of happy, meaningful lives.

Thus, if a moratorium on sentient AI were to shape society's cultural values in a way that leans toward caution and restriction, I think the enduring impact would likely contradict classical utilitarianism's ultimate goal: the maximal promotion and flourishing of sentient life. Rather than advancing a world with greater life, joy, and meaningful experiences, these shifts might result in a more closed-off, limited society, actively impeding efforts to create a future rich with diverse and conscious life forms.

(Note that I have talked mainly about these concerns from a classical utilitarian point of view. However, I concede that a negative utilitarian or antinatalist would find it much easier to rationally justify a long moratorium on AI.

It is also important to note that my conclusion holds even if one does not accept the idea of a 'value lock-in'. In that case, longtermists should likely focus on the near-term impacts of their decisions, as the long-term impacts of their actions may be impossible to predict. And I'd argue that a moratorium would likely have a variety of harmful near-term effects.)
