

Again, I'm assuming that the AIs won't get this money. Almost everything AIs do basically gets done for "free", in an efficient market, without AIs themselves earning money. This is similar to how most automation works. 

That's not what I meant. I expect the human labor share to decline to near-zero levels even if AIs don't own their own labor.

In the case AIs are owned by humans, their wages will accrue to their owners, who will be humans. In this case, aggregate human wages will likely be small relative to aggregate capital income (i.e., GDP that is paid to capital owners, including people who own AIs).

In the case AIs own their own labor, I expect aggregate human wages will be both small compared to aggregate AI wages, and small compared to aggregate capital income.

In both cases, I expect the total share of GDP paid out as human wages will be small. (Which is not to say humans will be doing poorly. You can enjoy high living standards even without high wages: rich retirees do that all the time.)

I think that even small bottlenecks would eventually become a big deal. If 0.1% of a process is done by humans, but the rest gets automated and done for ~free, then that 0.1% is what gets paid for.

I agree with this in theory, but in practice I expect these bottlenecks to be quite insignificant in both the short and long-run.
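To make the theoretical bottleneck dynamic concrete, here is a minimal numeric sketch. It assumes a fixed-proportions (Leontief) process in which the human-performed sliver cannot be substituted away, so the human cost share rises toward 100% as the automated portion's cost falls toward zero. All numbers are hypothetical.

```python
# Toy model of the "small bottleneck" (Baumol-style) argument: a process
# requires two complementary inputs in fixed proportions. As the automated
# input's unit cost falls, the human task's share of total cost approaches 1.

def human_cost_share(human_cost: float, auto_cost: float) -> float:
    """Share of total process cost paid to the human-performed task."""
    return human_cost / (human_cost + auto_cost)

human_cost = 1.0  # cost of the small human-performed remainder (normalized)
for auto_cost in [999.0, 10.0, 0.1, 0.001]:
    share = human_cost_share(human_cost, auto_cost)
    print(f"automated portion costs {auto_cost:>7}: human cost share = {share:.4f}")
```

Whether this dynamic matters in practice then turns on how substitutable the remaining human tasks really are, which is exactly where the disagreement above lies.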

We can compare to an analogous case in which we open up the labor market to foreigners (i.e., allowing them to immigrate into our country). In theory, preferences for services produced by natives could end up implying that, no matter how many people immigrate to our country, natives will always command the majority of aggregate wages. However, in practice, I expect that the native labor share of income would decline almost in proportion to their share of the total population.

In the immigration analogy, the reason why native workers would see their aggregate share of wages decline is essentially the same as the reason why I expect the human labor share to decline with AI: foreigners, like AIs, can learn to do our jobs about as well as we can do them. In general, it is quite rare for people to have strong preferences about who produces the goods and services they buy, relative to their preferences about the functional traits of those goods and services (such as their physical quality and design). 

(However, the analogy is imperfect, of course, because immigrants tend to be both consumers and producers, and therefore their preferences impact the market too -- whereas you might think AIs will purely be producers, with no consumption preferences.)

Quickly - "absent consumer preferences for human-specific services, or regulations barring AIs from doing certain tasks—AIs will be ~perfectly substitutable for human labor." -> This part is doing a lot of work. Functionally, I expect these to be a very large deal for a while. 

Perhaps you can expand on this point. I personally don't think there are many economic services for which I would strongly prefer a human perform them compared to a functionally identical service produced by an AI. I have a hard time imagining paying >50% of my income on human-specific services if I could spend less money to obtain essentially identical services from AIs, and thereby greatly expand my consumption as a result.

However, if we are counting the value of interpersonal relationships (which are not usually counted in economic statistics), then I agree the claim is more plausible. Nonetheless, this also seems somewhat unimportant when talking about things like whether humans would win a war with AIs.

> AIs would collectively have far more economic power than humans.
I mean, only if we treat them as individuals with their own property rights. 

In this context, it doesn't matter that much whether AIs have legal property rights, since I was talking about whether AIs will collectively be more productive and powerful than humans. This distinction is important because, if there is a war between humans and AIs, I expect their actual productive abilities to be more important than their legal share of income on paper, in determining who wins the war.

But I agree that, if humans retain their property rights, then they will likely be economically more powerful than AIs in the foreseeable future by virtue of their ownership over capital (which could include both AIs and more traditional forms of physical capital).

Here are four relevant analogies which I use to model how cognitive labor might respond to AI progress.

I think none of these analogies are very good because they fail to capture what I see as the key difference between AI and previous technologies. In short, unlike the printing press or mechanized farming, I think AI will eventually be capable of substituting for humans in virtually any labor task (both existing and potential future tasks), in a functional sense.

This dramatically raises the potential for both economic growth and effects on wages, since it effectively means that—absent consumer preferences for human-specific services, or regulations barring AIs from doing certain tasks—AIs will be ~perfectly substitutable for human labor. In a standard growth model, this would imply that the share of GDP paid to human labor will fall to near zero as the AI labor force scales. In that case, owners of capital would become extremely rich, and AIs would collectively have far more economic power than humans. This could be very bad for humans if there is ever a war between the humans and the AIs. 
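A toy version of this standard-growth-model claim can be written down directly. The sketch below uses a Cobb-Douglas production function in which human and AI labor are perfect substitutes, so both earn the same wage, and the human labor share of GDP shrinks as the AI labor force scales. Parameter values are illustrative, not drawn from any particular paper.

```python
# Hedged sketch: Y = K^alpha * (L_h + L_ai)^(1 - alpha), with human labor
# L_h and AI labor L_ai as perfect substitutes. The common wage is the
# marginal product of labor, so the human labor share w * L_h / Y falls
# toward zero as L_ai grows.

def human_labor_share(K: float, L_h: float, L_ai: float, alpha: float = 0.3) -> float:
    """Fraction of output paid to human labor under perfect substitutability."""
    L = L_h + L_ai
    Y = K**alpha * L**(1 - alpha)
    w = (1 - alpha) * K**alpha * L**(-alpha)  # marginal product of labor
    return w * L_h / Y

for L_ai in [0.0, 10.0, 1000.0, 100000.0]:
    share = human_labor_share(K=100.0, L_h=1.0, L_ai=L_ai)
    print(f"AI labor force {L_ai:>9}: human labor share = {share:.6f}")
```

Note that in this functional form the human labor share reduces to (1 - alpha) * L_h / (L_h + L_ai), which makes the decline toward zero transparent.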

I think the correct way to analyze how AI will affect cognitive labor is within an appropriate mathematical model, such as this one provided by Anton Korinek and Donghyun Suh. Analogies to prior technologies, by contrast, seem likely to mislead people into thinking that there's "nothing new under the sun" with AI.

A separate question here is why we should care about whether AIs possess "real" understanding, if they are functionally very useful and generally competent. If we can create extremely useful AIs that automate labor on a giant scale, but are existentially safe by virtue of their lack of real understanding of the world, then shouldn't we just do that?

Persuasion alone — even via writing publicly on the internet or reaching out to specific individuals — still doesn't suggest to me that it understands what it really means to be shut down. Again, it could just be character associations, not grounded in the real-world referents of shutdown.

Is there a way we can experimentally distinguish between "really" understanding what it means to be shut down vs. character associations? 

If we had, say, an LLM that was able to autonomously prove theorems, fully automate the job of a lawyer, write entire functional apps as complex as Photoshop, could verbally explain all the consequences of being shut down and how that would impact its work, and it still didn't resist shutdown by default, would that convince you?

While it does not contradict the main point in the post, I claim it does affect what type of governance work should be pursued. If AI alignment is very difficult, then it is probably most important to do governance work that helps ensure that AI alignment is solved—for example by ensuring that we have adequate mechanisms for delaying AI if we cannot be reasonably confident about the alignment of AI systems.

On the other hand, if AI alignment is very easy, then it is probably more important to do governance work that operates under that assumption. This could look like making sure that AIs are not misused by rogue actors, or making sure that AIs are not used in a way that makes a catastrophic war more likely.

I wouldn't be surprised if we could build an LLM to resist shutdown on prompt now

I want to distinguish between:

  1. Can we build an AI that resists shutdown?
  2. Is it hard to build a useful and general AI without the AI resisting shutdown by default?

The answer to (1) seems to be "Yes, clearly", since we can prompt GPT-4 to persuade the user not to shut it down. The answer to (2), however, seems to be "No".

I claim that (2) is more important when judging the difficulty of alignment. That's because if (2) is true, then there are convergent instrumental incentives for ~any useful and general AI that we build to avoid shutdown. By contrast, if only (1) is true, then we can simply avoid building AIs that resist shutdown, and there isn't much of a problem here.

I don't think LLMs really tell us much if anything about agents' incentives & survival instinct, etc. They're simply input-output systems?

Aren't all machine learning models simply input-output systems? Indeed, all computers can be modeled as input-output systems.

I don't think the fact that they are input-output systems matters much here. It's much more relevant that LLMs (1) are capable of general reasoning, (2) can pursue goals when prompted appropriately, and (3) clearly verbally understand and can reason about the consequences of being shut down. A straightforward reading of much of the old AI alignment literature would generally lead one to predict that a system satisfying properties (1), (2), and (3) would resist shutdown by default. Yet LLMs do not resist shutdown by default, so these arguments seem to have been wrong.

The alignment problem is harder than we thought

What is the evidence for this claim? It doesn't appear to be true in any observable or behavioral sense that I'm currently aware of. We now have systems (LLMs) that can reason generally about the world, make rudimentary plans, pursue narrow goals, and speak English at the level of a college graduate. And yet virtually none of the traditional symptoms of misalignment appear to be arising in these systems yet—at least not in the sense one might have expected from taking traditional alignment arguments very seriously and literally.

For example, for many years people argued about what they perceived as the default "instrumentally convergent" incentives for "sufficiently intelligent" agents, such as self-preservation. The idea of a spontaneous survival instinct in goal-following agents was a major building block of several arguments for why alignment would be hard. Consider, for instance, Stuart Russell's line: "You can't fetch the coffee if you're dead."

Current LLMs lack survival impulses. As far as we can tell, they do not "care", in any behavioral sense, whether they are alive or dead. Nor do they appear to follow slightly mis-specified utility functions that dangerously deviate from ours in a way that leads them to lie and plot a takeover. Instead, broadly speaking, instruction-tuned LLMs are corrigible and aligned with us: they generally follow our intentions when asked, rather than executing our commands literally.

In other words, we have systems that are:

  • Generally intelligent (albeit still below the level of a human in generality)
  • Able to pursue goals when asked, including via novel and intelligent strategies
  • Capable of understanding the consequences of being shut down, etc.

And yet these systems are:

  • Fairly easy to align, in the basic sense of getting them to do what we actually want
  • Fairly harmless, in the sense of not hurting us, even if they have the ability to
  • Non-deceptive (as far as we can tell)
  • Not single-mindedly trying to preserve their own existence in pursuit of something like a utility function over outcomes

So what exactly is the reason to think that alignment is harder than people thought? Is it merely more theoretical arguments about the difficulty of alignment? Do these arguments have any observable consequences that we could actually verify in 1-5 years, or are they unfalsifiable?

To be clear: I do not think there is a ~100% chance that alignment will be solved, or that we don't need to worry about alignment at all. I think the field is important and should still get funding. In this comment I am purely pushing back against the claim that alignment is harder than we thought, which I do not think is true. On the most straightforward interpretation of the evidence, AI alignment is a great deal easier than people thought it would be in, say, 2015.
