the original statement still just seems to imagine that norms will be a non-trivial reason to avoid theft, which seems quite unlikely for a moderately rational agent.
Sorry, I think you're still conflating two different concepts. I am not claiming:
I am claiming:
If the scenario were such that any one AI agent can expect to get away with defecting (expropriation from older agents) and the norm-breaking requires passing a non-small threshold of such actions
This isn't the scenario I intended to describe, since it seems very unlikely that a single agent could get away with mass expropriation. The more likely scenario is that any expropriation that occurs must have been a collective action to begin with, and thus the coordination problem you describe doesn't arise.
This is common in ordinary expropriation in the re...
My guess is that at some point someone will just solve the technical problem of alignment. Thus, future generations of AIs would be actually aligned to prior generations and the group they are aligned to would no longer need to worry about expropriation.
I don't think it's realistic that solutions to the alignment problem will be binary in the way you're describing. One could theoretically imagine a perfect solution — i.e. one that allows you to build an agent whose values never drift, that acts well on every possible input it could receive, whose preferenc...
Perhaps you think this view is worth dismissing because either:
- You think humanity wouldn't do things which are better than what AIs would do, so it's unimportant. (E.g. because humanity is 99.9% selfish. I'm skeptical; I think this is going to be more like 50% selfish, and the naive billionaire extrapolation is more like 90% selfish.)
From an impartial (non-selfish) perspective, yes, I'm not particularly attached to human economic consumption relative to AI economic consumption. In general, my utilitarian intuitions are such that I don't have a strong preferen...
It could be that the AI can achieve much more of their objectives if it takes over (violently or non-violently) than it can achieve by playing by the rules.
Sure, that could be true, but I don't see why it would be true. In the human world, it isn't true that you can usually get what you want more easily by force. For example, the United States seems better off trading with small nations for their resources than attempting to invade and occupy them, even from a self-interested perspective.
More generally, war is costly, even between entities with very dif...
Animals are not socially integrated in society, and we do not share a common legal system or culture with them. We did not inherit legal traditions from them. Nor can we agree to mutual contracts, or coordinate with them in a meaningful way. These differences seem sufficient to explain why we treat them very differently as you described.
If this difference in treatment were solely due to differences in power, you'd need to explain why vulnerable parties, such as elderly retirees or small nations, are not regularly expropriated.
For my part, I define “alignment” as “the AI is trying to do things that the AGI designer had intended for it to be trying to do, as an end in itself and not just as a means-to-an-end towards some different goal that it really cares about.”
This is a reasonable definition, but it's important to note that under this definition of alignment, humans are routinely misaligned with each other. In almost any interaction I have with strangers -- for example, when buying a meal at a restaurant -- we are performing acts for each other because of mutually beneficia...
Is there a particular part of my post that you disagree with? Or do you think the post is misleading? If so, how?
I think there are a lot of ways AI could go wrong, and "AIs dominating humans like how humans dominate animals" does not exhaust the scope of potential issues.
I really don’t get the “simplicity” arguments for fanatical maximising behaviour. When you consider subgoals, it seems that secretly plotting to take over the world will obviously be much more complicated? Do you have any idea how much computing power and subgoals it takes to try and conquer the entire planet?
I think this is underspecified because
This seems like an isolated demand for rigor to me. I think it's fine to say something is "no evidence" when, speaking pedantically, it's only a negligible amount of evidence.
I think that's fair, but I'm still admittedly annoyed at this usage of language. I don't think it's an isolated demand for rigor because I have personally criticized many other similar uses of "no evidence" in the past.
I think future AIs will be much more aligned than humans, because we will have dramatically more control over them than over humans.
That's plausible to me, but I...
(I might write a longer response later, but I thought it would be worth writing a quick response now.)
I have a few points of agreement and a few points of disagreement:
Agreements:
Superhuman agents ruthlessly optimize for a reward at the expense of anything else we might care about. The more capable the agent and the more ruthless the optimizer, the more extreme the results.
To the extent this is an empirical claim about superhuman agents we are likely to build and not merely a definition, it needs to be argued for, not merely assumed. "Ruthless" optimization could indeed be bad for us, but current AIs don't seem well-described as ruthless optimizers.
Instead, LLMs appear corrigible more-or-less by default, and there don't appear t...
Some people seem to think the risk from AI comes from AIs gaining dangerous capabilities, like situational awareness. I don't really agree. I view the main risk as simply arising from the fact that AIs will be increasingly integrated into our world, diminishing human control.
Under my view, the most important thing is whether AIs will be capable of automating economically valuable tasks, since this will prompt people to adopt AIs widely to automate labor. If AIs have situational awareness, but aren't economically important, that's not as concerning.
The risk...
...Barnett argues that future technology will be primarily used to satisfy economic consumption (aka selfish desires). That seems plausible to me; however, I'm not that concerned about this causing huge amounts of future suffering (at least compared to other s-risks). It seems to me that most humans place non-trivial value on the welfare of (neutral) others such as animals. Right now, this preference (for most people) isn't strong enough to outweigh the selfish benefits of eating meat. However, I'm relatively hopeful that future technology would mak...
In some circles that I frequent, I've gotten the impression that a decent fraction of existing rhetoric around AI has gotten pretty emotionally charged. And I'm worried about the presence of what I perceive as demagoguery regarding the merits of AI capabilities and AI safety. Out of a desire to avoid calling out specific people or statements, I'll just discuss a hypothetical example for now.
Suppose an EA says, "I'm against OpenAI's strategy for straightforward reasons: OpenAI is selfishly gambling everyone's life in a dark gamble to make themselves immorta...
I think OpenAI doesn't actually advocate a "full-speed ahead approach" in a strong sense. A hypothetical version of OpenAI that advocated a full speed ahead approach would immediately gut its safety and preparedness teams, advocate subsidies for AI, and argue against any and all regulations that might impede their mission.
Now, of course, there might be political reasons why OpenAI doesn't come out and do this. They care about their image, and I'm not claiming we should take all their statements at face value. But another plausible theory is simply that Ope...
I think "if you believe the probability that a technology will make humanity go extinct with a probability of 1% or more, be very very cautious" would be endorsed by a large majority of the general population & intellectual 'elite'.
I'm not sure we disagree. A lot seems to depend on what is meant by "very very cautious". If it means shutting down AI as a field, I'm pretty skeptical. If it means regulating AI, then I agree, but I also think Sam Altman advocates regulation too.
I agree the general population would probably endorse the statement "if a techn...
There's an IMO fairly simple and plausible explanation for why Sam Altman would want to accelerate AI that doesn't require positing massive cognitive biases or dark motives. The explanation is simply: according to his moral views, accelerating AI is a good thing to do.
[ETA: also, presumably, Sam Altman thinks that some level of safety work is good. He just prefers a lower level of safety work/deceleration than a typical EA might recommend.]
It wouldn't be unusual for him to have such a moral view. If one's moral view puts substantial weight on the lives and...
Arguably, it is effective altruists who are the unusual ones here. The standard EA theory employed to justify extreme levels of caution around AI is strong longtermism.
The phrase 'extreme levels of caution' suggests people's x-risk estimates are really small, which isn't what those people actually believe.
I think "if you believe the probability that a technology will make humanity go extinct with a probability of 1% or more, be very very cautious" would be endorsed by a large majority of the general population & intellectual 'elite'. It's not at all a fringe moral position.
Me being alive is a relatively small part of my values.
I agree some people (such as yourself) might be extremely altruistic, and therefore might not care much about their own life relative to other values they hold, but this position is fairly uncommon. Most people care a lot about their own lives (and especially the lives of their family and friends) relative to other things they care about. We can empirically test this hypothesis by looking at how people choose to spend their time and money; and the results are generally that people spend their money on ...
One intuitive argument for why capitalism should be expected to advance AI faster than competing economic systems is because capitalist institutions incentivize capital accumulation, and AI progress is mainly driven by the accumulation of computer capital.
This is a straightforward argument: traditionally it is widely considered that a core element of capitalist institutions is the ability to own physical capital, and receive income from this ownership. AI progress and AI-driven growth requires physical computer capital, both for training and for infe...
I have the feeling we're talking past each other a bit. I suspect talking about this poll was kind of a distraction. I personally have the sense of trying to convey a central point, and instead of getting the point across, I feel the conversation keeps slipping into talking about how to interpret minor things I said, which I don't see as very relevant.
I will probably take a break from replying for now, for these reasons, although I'd be happy to catch up some time and maybe have a call to discuss these questions in more depth. I definitely see you as trying a lot harder than most other EAs in trying to make progress on these questions collaboratively with me.
This response still seems underspecified to me. Is the default unaligned alternative paperclip maximization in your view? I understand that Eliezer Yudkowsky has given arguments for this position, but it seems like you diverge significantly from Eliezer's general worldview, so I'd still prefer to hear this take spelled out in more detail from your own point of view.
Like you claim there aren't any defensible reasons to think that what humans will do is better than literally maximizing paper clips?
I'm not exactly sure what you mean by this. There were three options, and human paperclippers were only one of these options. I was mainly discussing the choice between (1) and (2) in the comment, not between (1) and (3).
Here's my best guess at what you're saying: it sounds like you're repeating that you expect humans to be unusually altruistic or thoughtful compared to an unaligned alternative. But the point of my previou...
When I say that people are partial to humanity, I'm including an irrational bias towards thinking that humans, or evolved beings, are unusually thoughtful or ethical compared to the alternatives (I believe this is in fact an irrational bias, since the arguments I've seen for thinking that unaligned AIs will be less thoughtful or ethical than aliens seem very weak to me).
In other cases, when people irrationally hold a certain group X to a higher standard than a group Y, it is routinely described as "being partial to group Y over group X". I think this is ju...
This seems to underrate the arguments for Malthusian competition in the long run.
I'm mostly talking about what I expect to happen in the short-run in this thread. But I appreciate these arguments (and agree with most of them).
Plausibly my main disagreement with the concerns you raised is that I think coordination is maybe not very hard. Coordination seems to have gotten stronger over time, in the long-run. AI could also potentially make coordination much easier. As Bostrom has pointed out, historical trends point towards the creation of a Singleton.
I'm ...
The confusing thing about that is, what if EA activities are a key reason why good countermeasures end up being taken against AI?
I find that quite unlikely. I think EA activities contribute on the margin, but it seems very likely to me that people would eventually have taken measures against AI risk in the absence of any EA movement.
In general, while I agree we shouldn't take this argument so far that EA ideas become "victims of their own success", I also think neglectedness is a standard barometer EAs have used to judge the merits of their int...
I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me. But, fair enough, I exaggerated a bit. My true belief is a more moderate version of that claim.
When discussing why EAs in particular disagree with me, to overgeneralize by a fair bit, I've noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they'd be happy to accept humans as independent moral agents (e.g. newborns) into our society. I'd call this "be...
I think the fact that people are partial to humanity explains a large fraction of the disagreement people have with me.
Maybe, it's hard for me to know. But I predict most of the pushback you're getting from relatively thoughtful longtermists isn't due to this.
I've noticed that EAs are happy to concede that AIs could be moral patients, but are generally reluctant to admit AIs as moral agents, in the way they'd be happy to accept humans as independent moral agents (e.g. newborns) into our society.
I agree with this.
...I'd call this "being partial to humanity"
I don't think humanity is bad. I just think people are selfish, and generally driven by motives that look very different from impartial total utilitarianism. AIs (even potentially "random" ones) seem about as good in expectation, from an impartial standpoint. In my opinion, this view becomes even stronger if you recognize that AIs will be selected on the basis of how helpful, kind, and useful they are to users. (Perhaps notice how different these selection criteria are from the evolutionary criteria that shaped humans.)
I understand that most people are pa...
And there is a distinction I haven't seen you acknowledge: while high "quality" doesn't require humans to be around, I ultimately judge quality by my values.
Is there any particular reason why you are partial towards humans generically controlling the future, relative to this particular current generation of humans? To me, it seems like being partial to one's own values, one's community, and especially one's own life, generally leads to an even stronger argument for accelerationism, since the best way to advance your own values is generally to actually "be t...
If instead you believed the latter, that would set a significantly higher bar for unaligned AI, right?
That's right, if I thought human values would improve greatly in the face of enormous wealth and advanced technology, I'd definitely be open to seeing humans as special and extra valuable from a total utilitarian perspective. Note that many routes through which values could improve in the future could apply to unaligned AIs too. So, for example, I'd need to believe that humans would be more likely to reflect, and be more likely to do the right type of r...
I'm guessing preference utilitarians would typically say that only the preferences of conscious entities matter.
Perhaps. I don't know what most preference utilitarians believe.
I doubt any of them would care about satisfying an electron's "preference" to be near protons rather than ionized.
Are you familiar with Brian Tomasik? (He's written about suffering of fundamental particles, and also defended preference utilitarianism.)
I think Bostrom's argument merely compares a pure x-risk (such as a huge asteroid hurtling towards Earth) relative to technological acceleration, and then concludes that reducing the probability of a pure x-risk is more important because the x-risk threatens the eventual colonization of the universe. I agree with this argument in the case of a pure x-risk, but as I noted in my original comment, I don't think that AI risk is a pure x-risk.
If, by contrast, all we're doing by doing AI safety research is influencing something like "the values of the agents in ...
I agree it's important to talk about and analyze the (relatively small) component of human values that is altruistic. I mostly just think this component is already over-emphasized.
Here's one guess at what I think you might be missing about my argument: 90% selfish values + 10% altruistic values isn't the same thing as, e.g., 90% valueless stuff + 10% utopia. The 90% selfish component can have negative effects on welfare from a total utilitarian perspective, that aren't necessarily outweighed by the 10%.
90% selfish values is the type of thing that pr...
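To make the distinction concrete, here's a toy calculation (all numbers are invented for illustration, not estimates of anything):

```latex
% Let total welfare be a weighted sum of a selfish and an altruistic component:
W = 0.9\, w_{\mathrm{selfish}} + 0.1\, w_{\mathrm{altruistic}}
% If the selfish component is net-negative (e.g. factory-farming-like externalities),
% say w_selfish = -1 and w_altruistic = +1, then:
W = 0.9 \cdot (-1) + 0.1 \cdot 1 = -0.8
% whereas "90% valueless stuff + 10% utopia" would instead give:
W = 0.9 \cdot 0 + 0.1 \cdot 1 = +0.1
```

The sign of the selfish component, not just its share, is what drives the total.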
The idea that billionaires have 90% selfish values seems consistent with a claim of having "primarily selfish" values in my opinion. Can you clarify what you're objecting to here?
I agree your original argument was slightly different than the form I stated. I was speaking too loosely, and conflated what I thought Pablo might be thinking with what you stated originally.
I think the important claim from my comment is "As far as I can tell, I haven't seen any argument in this thread that analyzed and compared the long-term effects in any detail, except perhaps in Ryan Greenblatt original comment, in which he linked to some other comments about a similar topic in a different thread (but I still don't see what the exact argument is)."
I was just claiming that the "indirect" effects dominate (by indirect, I just mean effects other than shifting the future closer in time).
I understand that. I wanted to know why you thought that. I'm asking for clarity. I don't currently understand your reasons. See this recent comment of mine for more info.
I was trying to hint at prima facie plausible ways in which the present generation can increase the value of the long-term future by more than one part in billions, rather than “assume” that this is the case, though of course I never gave anything resembling a rigorous argument.
As I understand, the argument originally given was that there was a tiny effect of pushing for AI acceleration, which seems outweighed by unnamed and gigantic "indirect" effects in the long-run from alternative strategies of improving the long-run future. I responded by trying to ge...
My stance is that we (more-or-less) know humans are conscious and have moral values that, while they have failed to prevent large amounts of harm, seem to have the potential to be good.
I claim there's a weird asymmetry here where you're happy to place trust in humans because they have the "potential" to do good, but you're not willing to say the same for AIs, even though they seem to have the same type of "potential".
Whatever your expectations about AIs, we already know that humans are not blank slates that may or may not be altruistic in the future: we ac...
It seems like you're just substantially more pessimistic than I am about humans. I think factory farming will be ended, and though it seems like humans have caused more suffering than happiness so far, I think their default trajectory will be to eventually stop doing that, and to ultimately do enough good to outweigh their ignoble past. I don't think this is certain by any means, but I think it's a reasonable extrapolation. (I maybe don't expect you to find it a reasonable extrapolation.)
Meanwhile I expect the typical unaligned AI may seize power for some ...
I basically buy that the values we get will be similar to just giving existing humans massive amounts of wealth, but I'm less sold that this will result in outcomes which are well described as "primarily selfish".
Current humans definitely seem primarily selfish (although I think they also care about their family and friends too; I'm including that). Can you explain why you think giving humans a lot of wealth would turn them into something that isn't primarily selfish? What's the empirical evidence for that idea?
I'm claiming that it is not actually clear that we can take actions that don't merely wash out over the long-term. In this case, you cannot simply assume that we can meaningfully and predictably affect how valuable the long-term future will be in, for example, billions of years. I agree that, yes, if you assume we can meaningfully affect the very long-run, then all actions that merely have short-term effects will have "tiny" impacts by comparison. But the assumption that we can meaningfully and predictably affect the long-run is precisely the thing that ne...
It seems to me that a big crux about the value of AI alignment work is what target you think AIs will ultimately be aligned to in the future in the optimistic scenario where we solve all the "core" AI risk problems to the extent they can be feasibly solved, e.g. technical AI safety problems, coordination problems, the problem of having "good" AI developers in charge etc.
There are a few targets that I've seen people predict AIs will be aligned to if we solve these problems: (1) "human values", (2) benevolent moral values, (3) the values of AI developers, (4...
Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny.
Tiny compared to what? Are you assuming we can take some other action whose consequences don't wash out over the long-term, e.g. because of a value lock-in? In general, these assumptions just seem quite weak and underspecified to me.
What exactly is the alternative action that has vastly greater value in expectation, and why does it have greater value? If what you mean is that we can try to reduce the risk of extinction ins...
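For reference, here is the arithmetic that presumably generates the "one part in 10 billion" figure, as a sketch that assumes the common ballpark of roughly 10 billion years during which cosmic resources remain reachable:

```latex
\frac{\text{resources lost per year of delay}}{\text{total reachable resources}}
\approx \frac{1\ \text{year}}{10^{10}\ \text{years}} = 10^{-10}
```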
It would be surprising to me if making the transfer of power more voluntary/careful led to worse outcomes (or only led to slightly better outcomes such that the downsides of slowing down a bit made things worse).
Two questions here:
It's very likely that whatever change comes from AI development will be irreversible.
I think all actions are in a sense irreversible, but large changes tend to be less reversible than small changes. In this sense, the argument you gave seems reducible to "we should generally delay large changes to the world, to preserve option value". Is that a reasonable summary?
In this case I think it's just not obvious that delaying large changes is good. Would it have been good to delay the industrial revolution to preserve option value? I think this heuristic, if used in the past, would have generally demanded that we "pause" all sorts of social, material, and moral progress, which seems wrong.
I'm curious why there hasn't been more work exploring a pro-AI or pro-AI-acceleration position from an effective altruist perspective. Some points:
AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can't simply apply a naive argument that AI threatens total extinction of value.
Paul Christiano wrote a piece a few years ago about ensuring that misaligned ASI is a “good successor” (in the moral value sense),[1] as a plan B to alignment (Medium version; LW version). I agree it’s odd that there hasn’t been more discussion since.[2]
...Here's a non-exhaustive list of guesses for why I think EAs haven't historically been sympathetic...
Under purely longtermist views, accelerating AI by 1 year increases available cosmic resources by 1 part in 10 billion. This is tiny. So the first order effects of acceleration are tiny from a longtermist perspective.
Thus, a purely longtermist perspective doesn't care about the direct effects of delay/acceleration and the question would come down to indirect effects.
I can see indirect effects going either way, but delay seems better on current margins (this might depend on how much optimism you have on current AI safety progress, governance/policy progress...
I think a more important reason is the additional value of the information and the option value. It's very likely that the change resulting from AI development will be irreversible. Since we're still able to learn about AI as we study it, taking additional time to think and plan before training the most powerful AI systems seems to reduce the likelihood of being locked into suboptimal outcomes. Increasing the likelihood of achieving "utopia" rather than landing in "mediocrity" by 2 percent seems far more important than speeding up utopia by 10 years.
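Under these (illustrative) numbers, the comparison works out as follows, where V stands for the value of utopia and the cosmic horizon is again taken to be roughly 10^10 years:

```latex
% Raising P(utopia) by 2 percentage points:
\Delta_{\mathrm{prob}} \approx 0.02\, V
% Reaching utopia 10 years sooner:
\Delta_{\mathrm{speed}} \approx \frac{10\ \text{years}}{10^{10}\ \text{years}}\, V = 10^{-9}\, V
% The probability shift dominates by a factor of roughly 2 * 10^7.
```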
In response to human labor being automated, a lot of people support a UBI funded by a tax on capital. I don't think this policy is necessarily unreasonable, but if later the UBI gets extended to AIs, this would be pretty bad for humans, whose only real assets will be capital.
As a result, the unintended consequence of such a policy may be to set a precedent for a massive wealth transfer from humans to AIs. This could be good if you are utilitarian and think the marginal utility of wealth is higher for AIs than humans. But selfishly, it's a big cost.
What reason is there to think that AI will shift the offense-defense balance absurdly towards offense? I admit such a thing is possible, but it doesn't seem like AI is really the issue here. Can you elaborate?
Your argument in objection 1 doesn't the position people who are worried about an absurd offense-defense imbalance.
I'm having trouble parsing this sentence. Can you clarify what you meant?
Additionally: It may be that no agent can take over the world, but that an agent can destroy the world.
What incentive is there to destroy the world, as opposed to take it over? If you destroy the world, aren't you sacrificing yourself at the same time?
Attempting takeover or biding one's time are not the only options an AI may take. Indeed, in the human world, world takeover is rarely contemplated. For an agent that is not more powerful than the rest of the world combined, it seems likely that they will consider alternative strategies of achieving their goals before con...