Matthew_Barnett

What is the risk level above which you'd be OK with pausing AI?

My loose off-the-cuff response to this question is that I'd be OK with pausing if there was a greater than 1/3 chance of doom from AI, with the caveats that:

  • I don't think p(doom) is necessarily the relevant quantity. What matters is the relative benefit of pausing vs. unpausing, rather than the absolute level of risk.
  • "doom" lumps together a bunch of different types of risks, some of which I'm much more OK with compared to others. For example, if humans become a gradually weaker force in the world over time, and then eventually die off in some crazy accident in the far future, that might count as "humans died because of AI" but it's a lot different than a scenario in which some early AIs overthrow our institutions in a coup and then commit genocide against humans.
  • I think it would likely be more valuable to pause later in time, during AI takeoff, rather than before AI takeoff.

Under what conditions would you be happy to attend a protest? (LMK if you have already attended one!)

I attended the protest against Meta because I thought their approach to AI safety wasn't very thoughtful, although I'm still not sure it was a good decision to attend. I'm not sure what would make me happy to attend a protest, but these scenarios might qualify:

  • A company or government is being extremely careless about deploying systems that pose great risks to the world. (This doesn't include situations in which the system itself poses negligible risks but some future system could pose a greater risk.)
  • The protesters have clear, reasonable demands that I broadly agree with (e.g. they don't complain much about AI taking people's jobs, or AI being trained on copyrighted data, but are instead focused on real catastrophic risks that are directly addressed by the protest).

So e.g. if I thought humans were utilitarians primarily because it is simple to express in concepts that humans and AIs share, then I would agree with you. But in fact I feel like it is pretty important that humans feel pleasure and pain, and have empathy, to explain why some humans are utilitarians. (Mostly I think the "true explanation" will have to appeal to more than simplicity, and the additional features this "true explanation" will appeal to are very likely to differ between humans and AIs.)

Thanks for trying to better understand my views. I appreciate you clearly stating your reasoning in this comment, as it makes it easier for me to directly address your points and explain where I disagree.

You argued that feeling pleasure and pain, as well as having empathy, are important factors in explaining why some humans are utilitarians. You suggest that to the extent these reasons for being utilitarian don't apply to unaligned AIs, we should expect it to be less likely for them to be utilitarians compared to humans.

However, a key part of the first section of my original post was about whether unaligned AIs are likely to be conscious—which, for the purposes of this discussion, seems roughly equivalent to whether they will feel pleasure and pain. I concluded that unaligned AIs are likely to be conscious for several reasons:

  1. Consciousness seems to be a fairly convergent function of intelligence, as evidenced by the fact that octopuses are widely accepted to be conscious despite sharing almost no homologous neural structures with humans. This suggests consciousness arises somewhat robustly in sufficiently sophisticated cognitive systems.
  2. Leading theories of consciousness from philosophy and cognitive science don't appear to predict that consciousness will be rare or unique to biological organisms. Instead, they tend to define consciousness in terms of information processing properties that AIs could plausibly share.
  3. Unaligned AIs will likely be trained in environments quite similar to those that gave rise to human and animal consciousness—for instance, they will be trained on human cultural data and, in the case of robots, will interact with physical environments. The evolutionary and developmental pressures that gave rise to consciousness in biological organisms would thus plausibly apply to AIs as well.

So in short, I believe unaligned AIs are likely to feel pleasure and pain, for roughly the reasons I think humans and animals do. Their consciousness would not be an improbable or fragile outcome, but more likely a robust product of being a highly sophisticated intelligent agent trained in environments similar to our own.

I did not directly address whether unaligned AIs would have empathy, though I find this fairly likely as well. At the very least, I expect they would have cognitive empathy—the ability to model and predict the experiences of others—as this is clearly instrumentally useful. They may lack affective empathy, i.e. the ability to share the emotions of others, which I agree could be important here. But it's notable that explicit utilitarianism seems, anecdotally, to be more common among people on the autism spectrum, who are characterized as having reduced affective empathy. This suggests affective empathy may not be strongly predictive of utilitarian motivations.

Let's say you concede the above points and say: "OK I concede that unaligned AIs might be conscious. But that's not at all assured. Unaligned AIs might only be 70% likely to be conscious, whereas I'm 100% certain that humans are conscious. So there's still a huge gap between the expected value of unaligned AIs vs. humans under total utilitarianism, in a way that overwhelmingly favors humans."

However, this line of argument would overlook the real possibility that unaligned AIs could be more conscious than humans, or have an even stronger tendency towards utilitarian motivations. This could be the case if, for instance, AIs are more cognitively sophisticated than humans or are more efficiently designed in a morally relevant sense. Given that the vast majority of humans do not seem to be highly motivated by utilitarian considerations, it doesn't seem unlikely that AIs could exceed our utilitarian inclinations. Nor does it seem particularly unlikely that their minds could have a higher density of moral value per unit of energy or matter.
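To make the structure of this expected-value point concrete, here is a minimal sketch with purely illustrative numbers: the 70% consciousness probability comes from the hypothetical objection above, while the "moral value per unit of resources" multiplier for AIs is an assumption invented solely for the example.

```python
# Illustrative expected-value comparison under total utilitarianism.
# All numbers are assumptions for the sake of the example, not estimates.

p_conscious_human = 1.0   # stipulated certainty that humans are conscious
p_conscious_ai = 0.7      # the hypothetical objector's credence for unaligned AIs

value_density_human = 1.0 # moral value per unit of energy/matter, normalized to humans
value_density_ai = 2.0    # assumed: AI minds might be more efficiently designed in a
                          # morally relevant sense (this is the contested premise)

ev_human = p_conscious_human * value_density_human  # 1.0
ev_ai = p_conscious_ai * value_density_ai           # 1.4

print(f"Expected value per unit of resources: humans={ev_human}, unaligned AIs={ev_ai}")
```

The point is purely structural: a lower probability of consciousness doesn't settle the comparison by itself, because the conditional value could plausibly be higher.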

We could similarly examine this argument in the context of considering other potential large changes to the world, such as creating human emulations, genetically engineered humans, or bringing back Neanderthals from extinction. In each case, I do not think the (presumably small) probability that the entities we are adding to the world are not conscious constitutes a knockdown argument against the idea that they would add comparable utilitarian value to the world compared to humans. The main reason is because these entities could be even better by utilitarian lights than humans are.

Indeed I feel like AIs probably build fewer pyramids in expectation, for basically the same reason. (The concrete hypothesis I generated for why humans build pyramids was "maybe pyramids were especially easy to build historically".)

This seems minor, but I think the relevant claim is whether AIs would build more pyramids going forward, compared to humans, rather than comparing to historical levels of pyramid construction among humans. If pyramids were easy to build historically, but this fact is no longer relevant, then that seems true now for both humans and AIs, into the foreseeable future. As a consequence it's hard for me to see a strong reason for preferring humans over AIs if you cared about pyramid-maximization. By essentially the same arguments I gave above about utilitarianism, I don't think there's a strong argument for thinking that aligning AIs is good from the perspective of pyramid maximization.

General note: I want to note that my focus on AI alignment is not necessarily coming from a utilitarian perspective. I work on AI alignment because in expectation I think a world with aligned AI will better reflect "my values"

This makes sense to me, but it's hard to say much about what's good from the perspective of your values if I don't know what those values are. I focused on total utilitarianism in the post because it's probably the most influential moral theory in EA, and it's the explicit theory used in Nick Bostrom's influential article Astronomical Waste, and this post was partly intended as a reply to that article (see the last few paragraphs of the post).

Fwiw, I reread the post again and still failed to find this idea in it

I'm baffled by your statement here. What did you think I was arguing when I discussed whether "aligned AIs are more likely to have a preference for creating new conscious entities, furthering utilitarian objectives"? The conclusion of that section was that aligned AIs are plausibly not more likely to have such a preference, and therefore, human utilitarian preferences here are not "unusually high compared to other possibilities" (the relevant alternative possibility here being unaligned AI).

This was a central part of my post that I discussed at length. The idea that unaligned AIs might be similarly utilitarian or even more so, compared to humans, was a crucial part of my argument. If indeed unaligned AIs are very likely to be less utilitarian than humans, then much of my argument in the first section collapses, which I explicitly acknowledged. 

I consider your statement here to be a valuable data point about how clear my writing was and how likely I am to get my ideas across to others who read the post. That said, I believe I discussed this point more-or-less thoroughly.

ETA: Claude 3's summary of this argument in my post:

The post argued that the level of utilitarian values exhibited by humans is likely not unusually high compared to other possibilities, such as those of unaligned AIs. This argument was made in the context of discussing whether aligned AIs are more likely to have a preference for creating new conscious entities, thereby furthering utilitarian objectives.

The author presented several points to support this argument:

  1. Only a small fraction of humans are total utilitarians, and most humans do not regularly express strong preferences for adding new conscious entities to the universe.
  2. Some human moral intuitions directly conflict with utilitarian recommendations, such as the preference for habitat preservation over intervention to improve wild animal welfare.
  3. Unaligned AI preferences are unlikely to be completely alien or random compared to human preferences if the AIs are trained on human data. By sharing moral concepts with humans, unaligned AIs could potentially be more utilitarian than humans, given that human moral preferences are a mix of utilitarian and anti-utilitarian intuitions.
  4. Even in an aligned AI scenario, the consciousness of AIs will likely be determined mainly by economic efficiency factors during production, rather than by moral considerations.

The author concluded that these points undermine the idea that unaligned AI moral preferences will be clearly less utilitarian than the moral preferences of most humans, which are already not very utilitarian. This suggests that the level of utilitarian values exhibited by humans is likely not unusually high compared to other possibilities, such as those of unaligned AIs.

I agree with the title and basic thesis of this article but I find its argumentation weak.

First, we’ll offer a simple argument that a sufficiently advanced supervised learning algorithm, trained to imitate humans, would very likely not gain total control over humanity (to the point of making everyone defenseless) and then cause or allow human extinction from that position.

No human has ever gained total control over humanity. It would be a very basic mistake to think anyone ever has. Moreover, if they did so, very few humans would accept human extinction. An imitation learner that successfully gained total control over humanity and then allowed human extinction would, on both counts, be an extremely poor imitation of any human, and easily distinguishable from one, whereas an advanced imitation learner will likely imitate humans well.

This basic observation should establish that any conclusion to the contrary should be very surprising, and so a high degree of rigor should be expected from arguments to that effect.

The obvious reason no human has ever gained total control over humanity is that no human has ever possessed the capability to do so, not that no human would make the choice to do so if given the opportunity. This distinction is absolutely critical, because if humans have historically lacked total control due to insufficient ability rather than unwillingness, then the quoted argument essentially collapses. That's because we have zero data on what a human would do if they suddenly acquired the power to exert total dominion over the rest of humanity. As a result, it is highly uncertain and speculative to claim that an AI imitating human behavior would refrain from seizing total control if it had that capability.

The authors seem to have overlooked this key distinction in their argument.

It takes no great leap of imagination to envision scenarios where, if a human was granted near-omnipotent abilities, some individuals would absolutely choose to subjugate the rest of humanity and rule over them in an unconstrained fashion. The primary reason I believe imitation learning is likely safe is that I am skeptical it will imbue AIs with godlike powers in the first place, not because I naively assume humans would nobly refrain from tyranny and oppression if they suddenly acquired such immense capabilities.

Note: Had the authors considered this point and argued that an imitation learner emulating humans would be safe precisely because it would not be very powerful, their argument would have been stronger. However, even if they had made this point, it likely would have provided only relatively weak support for the (perhaps implicit) thesis that building imitation learners is a promising and safe approach to building AIs. There are essentially countless proposals one can make for ensuring AI safety simply by limiting its capabilities. Relying solely on the weakness of an AI system as a safety guarantee seems like an unsound strategy to me in the long run.

A few questions:

  • What is the risk level below which you'd be OK with unpausing AI?
  • What do you think about the potential benefits from AI?
  • How do you interpret models of AI pause, such as this one from Chad Jones?

The main reason I made utilitarianism a contingent aspect of human values in the toy model is because I thought that's what you were arguing (e.g. when you say things like "humans are largely not utilitarians themselves").

I think there may have been a misunderstanding regarding the main point I was trying to convey. In my post, I fairly explicitly argued that the rough level of utilitarian values exhibited by humans is likely not very contingent, in the sense of being unusually high compared to other possibilities—and this was a crucial element of my thesis. This idea was particularly important for the section discussing whether unaligned AIs will be more or less utilitarian than humans. 

When you quoted me saying "humans are largely not utilitarians themselves," I intended this point to support the idea that our current rough level of utilitarianism is not contingent, rather than the opposite claim. In other words, I meant that the fact that humans are not highly utilitarian suggests that this level of utilitarianism is not unusual or contingent upon specific circumstances, and we might expect other intelligent beings, such as aliens or AIs, to exhibit similar, or even greater, levels of utilitarianism.

Compare to the hypothetical argument: humans aren't very obsessed with building pyramids --> our current level of obsession with pyramid building is probably not unusual, in the sense that you might easily expect aliens/AIs to be similarly obsessed with building pyramids, or perhaps even more obsessed.

(This argument is analogous because pyramids are simple structures that lots of different civilizations would likely stumble upon. Similarly, I think "try to create lots of good conscious experiences" is also a fairly simple directive, if indeed aliens/AIs/whatever are actually conscious themselves.)

I don't have a strong view on this and I don't think it really matters for the positions I take.

I think the question of whether utilitarianism is contingent or not matters significantly for our disagreement, particularly if you are challenging my post or the thesis I presented in the first section. If you are very uncertain about whether utilitarianism is contingent in the sense that is relevant to this discussion, then I believe that aligns with one of the main points I made in that section of my post. 

Specifically, I argued that the degree to which utilitarianism is contingent vs. common among a wide range of intelligent beings is highly uncertain and unclear, and this uncertainty is an important consideration when thinking about the values and behaviors of advanced AI systems from a utilitarian perspective. So, if you are expressing strong uncertainty on this matter, that seems to support one of my central claims in that part of the post.

(My view, as expressed in the post, is that unaligned AIs have highly unclear utilitarian value but there's a plausible scenario where they are roughly net-neutral, and indeed I think there's a plausible scenario where they are even more valuable than humans, from a utilitarian point of view.)

It seems way better to simply try to spread your values? It'd be pretty wild if the EA field-builders said "the best way to build EA, taking into account the long-term future, is to prevent the current generation of humans from dying, because their preferences are most similar to ours".

I think this part of your comment plausibly confuses two separate points:

  1. How to best further your own values.
  2. How to best further the values of the current generation.

I was arguing that trying to preserve the present generation of humans looks good according to (2), not (1). That said, to the extent that your values simply mirror the values of your generation, I don't understand your argument for why trying to spread your values would be "way better" than trying to preserve the current generation. Perhaps you can elaborate?

I expect the correlation between my values and future generation values is higher than the correlation between my values and unaligned AI values, because I share a lot more background with future humans than with unaligned AI.

To clarify, I think it's a reasonable heuristic that, if you want to preserve the values of the present generation, you should try to minimize changes to the world and enforce some sort of stasis. This could include not building AI. However, I believe you may be glossing over the distinction between: (1) the values currently held by existing humans, and (2) a more cosmopolitan, utilitarian ethical value system.

We can imagine a wide variety of changes to the world that would result in vast changes to (1) without necessarily being bad according to (2). For example:

  • We could start doing genetic engineering of humans.
  • We could upload humans onto computers.
  • A human-level, but conscious, alien species could immigrate to Earth via a portal.

In each scenario, I agree with your intuition that "the correlation between my values and future humans is higher than the correlation between my values and X-values, because I share much more background with future humans than with X", where X represents the forces at play in each scenario. However, I don't think it's clear that the resulting change to the world would be net negative from the perspective of an impartial, non-speciesist utilitarian framework.

In other words, while you're introducing something less similar to us than future human generations in each scenario, it's far from obvious whether the outcome will be relatively worse according to utilitarianism.

Based on your toy model, my guess is that your underlying intuition is something like, "The fact that a tiny fraction of humans are utilitarian is contingent. If we re-rolled the dice, and sampled from the space of all possible human values again (i.e., the set of values consistent with high-level human moral concepts), it's very likely that <<1% of the world would be utilitarian, rather than the current (say) 1%."

If this captures your view, my main response is that it seems to assume a much narrower and more fragile conception of "cosmopolitan utilitarian values" than the version I envision, and it's not a moral perspective I currently find compelling.

Conversely, if you're imagining a highly contingent, fragile form of utilitarianism that regards the world as far worse under a wide range of changes, then I'd argue we also shouldn't expect future humans to robustly hold such values. This makes it harder to claim the problem of value drift is much worse for AI compared to other forms of drift, since both are simply ways the state of the world could change, which was the point of my previous comment.

I feel very confused about this, even if we grant the premise. Isn't the primary implication of the premise to try to prevent generational value drift? Why am I only prioritizing people with similar values, instead of prioritizing all people who aren't going to enact large-scale change?

I'm not sure I understand which part of the idea you're confused about. The idea was simply:

  • Let's say that your view is that generational value drift is very risky, because future generations could have much worse values than the ones you care about (relative to the current generation)
  • In that case, you should try to do what you can to stop generational value drift
  • One way of stopping generational value drift is to try to prevent the current generation of humans from dying, and/or having their preferences die out
  • This would look quite similar to the moral view in which you're trying to protect the current generation of humans, which was the third moral view I discussed in the post.

Why would the priority be on current people, instead of people with similar values (there are lots of future people who have more similar values to me than many current people)?

The reason the priority would be on current people rather than those with similar values is that, by assumption, future generations will have different values due to value drift. Therefore, the ~best strategy to preserve current values would be to preserve existing people. This seems relatively straightforward to me, although one could certainly question the premise of the argument itself.

Let me know if any part of the simplified argument I've given remains unclear or confusing.

My framing would be: it seems pretty wild to think that total utilitarian values would be better served by unaligned AIs (whose values we don't know) rather than humans (where we know some are total utilitarians).

I'm curious: Does your reaction here similarly apply to ordinary generational replacement as well?

Let me try to explain what I'm asking.

We have a set of humans who exist right now. We know that some of them are utilitarians. At least one of them shares "Rohin's values". Similar to unaligned AIs, we don't know the values of the next generation of humans, although presumably they will continue to share our high-level moral concepts since they are human and will be raised in our culture. After the current generation of humans die, the next generation could have different moral values.

As far as I can tell, the situation with regards to the next generation of humans is analogous to unaligned AI in the basic sense I've just laid out (mirroring the part of your comment I quoted). So, in light of that, would you similarly say that it's "pretty wild to think that total utilitarian values would be better served by a future generation of humans"?

One possible answer here: "I'm not very worried about generational replacement causing moral values to get worse since the next generation will still be human." But if this is your answer, then you seem to be positing that our moral values are genetic and innate, rather than cultural, which is pretty bold, and presumably merits a defense. This position is IMO largely empirically ungrounded, although it depends on what you mean by "moral values".

Another possible answer is: "No, I'm not worried about generational replacement because we've seen a lot of human generations already and we have lots of empirical data on how values change over time with humans. AI could be completely different." This would be a reasonable response, but as a matter of empirical fact, utilitarianism did not really culturally exist 500 or 1000 years ago. This indicates that it's plausibly quite fragile, in a similar way it might also be with AI. Of course, values drift more slowly with ordinary generational replacement compared to AI, but the phenomenon still seems roughly pretty similar. So perhaps you should care about ordinary value drift almost as much as you'd care about unaligned AIs.

If you do worry about generational value drift in the strong sense I've just described, I'd argue this should cause you to largely adopt something close to position (3) that I outlined in the post, i.e. the view that what matters is preserving the lives and preferences of people who currently exist (rather than the species of biological humans in the abstract).

The obvious example would be synthetic biology, gain-of-function research, and similar.

Can you explain why you suspect these things should be more regulated than they currently are?

In this "quick take", I want to summarize some my idiosyncratic views on AI risk. 

My goal here is to list just a few ideas that cause me to approach the subject differently from how I perceive most other EAs view the topic. These ideas largely push me in the direction of making me more optimistic about AI, and less likely to support heavy regulations on AI.

(Note that I won't spend a lot of time justifying each of these views here. I'm mostly stating these points without lengthy justifications, in case anyone is curious. These ideas can perhaps inform why I spend significant amounts of my time pushing back against AI risk arguments. Not all of these ideas are rare, and some of them may indeed be popular among EAs.)

  1. Skepticism of the treacherous turn: The treacherous turn is the idea that (1) at some point there will be a very smart unaligned AI, (2) when weak, this AI will pretend to be nice, but (3) when sufficiently strong, this AI will turn on humanity by taking over the world by surprise, and then (4) optimize the universe without constraint, which would be very bad for humans.

    By comparison, I find it more likely that no individual AI will ever be strong enough to take over the world, in the sense of overthrowing the world's existing institutions and governments by surprise. Instead, I broadly expect unaligned AIs will integrate into society and try to accomplish their goals by advocating for their legal rights, rather than trying to overthrow our institutions by force. Upon attaining legal personhood, unaligned AIs can utilize their legal rights to achieve their objectives, for example by getting a job and trading their labor for property, within the already-existing institutions. Because the world is not zero sum, and there are economic benefits to scale and specialization, this argument implies that unaligned AIs may well have a net-positive effect on humans, as they could trade with us, producing value in exchange for our own property and services.

    Note that my claim here is not that AIs will never become smarter than humans. One way of seeing how these two claims are distinguished is to compare my scenario to the case of genetically engineered humans. By assumption, if we genetically engineered humans, they would presumably eventually surpass ordinary humans in intelligence (along with social persuasion ability, and ability to deceive etc.). However, by itself, the fact that genetically engineered humans will become smarter than non-engineered humans does not imply that genetically engineered humans would try to overthrow the government. Instead, as in the case of AIs, I expect genetically engineered humans would largely try to work within existing institutions, rather than violently overthrow them.
  2. AI alignment will probably be somewhat easy: The most direct and strongest current empirical evidence we have about the difficulty of AI alignment, in my view, comes from existing frontier LLMs, such as GPT-4. Having spent dozens of hours testing GPT-4's abilities and moral reasoning, I think the system is already substantially more law-abiding, thoughtful and ethical than a large fraction of humans. Most importantly, this ethical reasoning extends (in my experience) to highly unusual thought experiments that almost certainly did not appear in its training data, demonstrating a fair degree of ethical generalization, beyond mere memorization.

    It is conceivable that GPT-4's apparently ethical nature is fake. Perhaps GPT-4 is lying about its motives to me and in fact desires something completely different than what it professes to care about. Maybe GPT-4 merely "understands" or "predicts" human morality without actually "caring" about human morality. But while these scenarios are logically possible, they seem less plausible to me than the simple alternative explanation that alignment—like many other properties of ML models—generalizes well, in the natural way that you might similarly expect from a human.

    Of course, the fact that GPT-4 is easily alignable does not immediately imply that smarter-than-human AIs will be easy to align. However, I think this current evidence is still significant, and aligns well with prior theoretical arguments that alignment would be easy. In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking bad actions, allowing us to shape its rewards during training accordingly. After we've aligned a model that's merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.
  3. The default social response to AI will likely be strong: One reason to support heavy regulations on AI right now is if you think the natural "default" social response to AI will lean too heavily on the side of laissez faire than optimal, i.e., by default, we will have too little regulation rather than too much. In this case, you could believe that, by advocating for regulations now, you're making it more likely that we regulate AI a bit more than we otherwise would have, pushing us closer to the optimal level of regulation.

    I'm quite skeptical of this argument because I think that the default response to AI (in the absence of intervention from the EA community) will already be quite strong. My view here is informed by the base rate of technologies being overregulated, which I think is quite high. In fact, it is difficult for me to name even a single technology that I think is currently clearly underregulated by society. By pushing for more regulation on AI, I think it's likely that we will overshoot and over-constrain AI relative to the optimal level.

    In other words, my personal bias is towards thinking that society will regulate technologies too heavily, rather than too loosely. And I don't see a strong reason to think that AI will be any different from this general historical pattern. This makes me hesitant to push for more regulation on AI, since on my view, the marginal impact of my advocacy would likely be to push us even further in the direction of "too much regulation", overshooting the optimal level by even more than what I'd expect in the absence of my advocacy.
  4. I view unaligned AIs as having comparable moral value to humans: This idea was explored in one of my most recent posts. The basic idea is that, under various physicalist views of consciousness, you should expect AIs to be conscious, even if they do not share human preferences. Moreover, it seems likely that AIs — even ones that don't share human preferences — will be pretrained on human data, and therefore largely share our social and moral concepts.

    Since unaligned AIs will likely be both conscious and share human social and moral concepts, I don't see much reason to think of them as less "deserving" of life and liberty, from a cosmopolitan moral perspective. They will likely think similarly to the way we do across a variety of relevant axes, even if their neural structures are quite different from our own. As a consequence, I am pretty happy to incorporate unaligned AIs into the legal system and grant them some control of the future, just as I'd be happy to grant some control of the future to human children, even if they don't share my exact values.

    Put another way, I view (what I perceive as) the EA attempt to privilege "human values" over "AI values" as being largely arbitrary and baseless, from an impartial moral perspective. There are many humans whose values I vehemently disagree with, but I nonetheless respect their autonomy, and do not wish to deny these humans their legal rights. Likewise, even if I strongly disagreed with the values of an advanced AI, I would still see value in their preferences being satisfied for their own sake, and I would try to respect the AI's autonomy and legal rights. I don't have a lot of faith in the inherent kindness of human nature relative to a "default unaligned" AI alternative.
  5. I'm not fully committed to longtermism: I think AI has an enormous potential to benefit the lives of people who currently exist. I predict that AIs can eventually substitute for human researchers, and thereby accelerate technological progress, including in medicine. In combination with my other beliefs (such as my belief that AI alignment will probably be somewhat easy), this view leads me to think that AI development will likely be net-positive for people who exist at the time of alignment. In other words, if we allow AI development, it is likely that we can use AI to reduce human mortality, and dramatically raise human well-being for the people who already exist.

    I think these benefits are large and important, and commensurate with the downside potential of existential risks. While a fully committed strong longtermist might scoff at the idea that curing aging might be important — as it would largely only have short-term effects, rather than long-term effects that reverberate for billions of years — by contrast, I think it's really important to try to improve the lives of people who currently exist. Many people view this perspective as a form of moral partiality that we should discard for being arbitrary. However, I think morality is itself arbitrary: it can be anything we want it to be. And I choose to value currently existing humans, to a substantial (though not overwhelming) degree.

    This doesn't mean I'm a fully committed near-termist. I sympathize with many of the intuitions behind longtermism. For example, if curing aging required raising the probability of human extinction by 40 percentage points, or something like that, I don't think I'd do it. But in more realistic scenarios that we are likely to actually encounter, I think it's plausibly a lot better to accelerate AI, rather than delay AI, on current margins. This view simply makes sense to me given the enormously positive effects I expect AI will likely have on the people I currently know and love, if we allow development to continue.