Matthew_Barnett

Comments
Note: I've edited this post to change my bottom-line TAI arrival distribution slightly. The edit doesn't reflect much of a change in my (underlying) transformative AI timelines, but rather (mostly) reflects a better compromise when visualizing things. 

To make a long story short, previously I put too little probability on TAI arriving between 2027 and 2035 because I wanted the plot to put very low probability on TAI arriving before 2027. Because of the way the Metaculus sliders work, this made it difficult to convey a very rapid increase in my probability after 2026. Now I've decided to compromise in a way that puts what I think is an unrealistically high probability on TAI arriving before 2027.

That said, I have updated a little bit since I wrote this post:

  • I'm a little more skeptical of TAI happening at all in the 21st century, mostly as a result of reflecting on arguments in this paper from Ege Erdil and Tamay Besiroglu.
  • I'm a little more bullish on the possibility of a rapid scale-up of hardware in the mid-to-late 2020s, delivering a 10^28 FLOP training run before 2026, and/or a 10^30 FLOP training run before 2030. This update came after I read more about the existing capacity of semiconductor fabs. (A rough back-of-envelope sketch of what such runs would require follows this list.)
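
To make the second bullet concrete, here is a minimal back-of-envelope sketch (my own illustration, not something from the original post). The per-chip performance, utilization, and run length are hypothetical assumptions; only the 10^28 and 10^30 FLOP targets come from the bullet above.

```python
# Back-of-envelope sketch: how many accelerators a 1e28 or 1e30 FLOP training
# run would need. All hardware numbers are assumed, illustrative values.

def accelerators_needed(target_flop: float,
                        peak_flop_per_chip: float = 1e15,  # assumed dense peak FLOP/s per chip
                        utilization: float = 0.4,          # assumed average utilization
                        run_days: float = 120) -> float:
    """Chips required so that chips * peak * utilization * seconds = target_flop."""
    seconds = run_days * 24 * 3600
    return target_flop / (peak_flop_per_chip * utilization * seconds)

for target in (1e28, 1e30):
    print(f"{target:.0e} FLOP run: ~{accelerators_needed(target):.1e} chips")
# Under these assumptions: roughly 2.4e6 chips for 1e28 FLOP and 2.4e8 chips
# for 1e30 FLOP. Longer runs, faster chips, or higher utilization shrink these
# counts proportionally, which is why fab capacity is the key variable.
```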

I'll try not to change the post much going forward, so that it can reflect a historical snapshot of how I thought about AI timelines in 2023, rather than a frequently updated document.

Most college-educated adults would get well under half of these problems right (the authors used computer science undergraduates as human subjects, and their performance ranged from 40% to 90%).

I think the hardness of the MATH benchmark was somewhat exaggerated. I downloaded the dataset myself and took a look, and came to the conclusion that many -- perhaps most -- of the questions are simple plug-and-chug problems. The reported performance of 40-90% among students may have been a result of time constraints rather than pure difficulty. In the paper, they wrote:

"To provide a rough but informative comparison to human-level performance, we randomly sampled 20 problems from the MATH test set and gave them to humans. We artificially require that the participants have 1 hour to work on the problems and must perform calculations by hand."

Two key points I want to add to this summary:

  1. I think these arguments push against broad public advocacy work, in favor of more cautious efforts to target regulation well, and make sure that it's thoughtful. Since I think we'll likely get strong regulation by default, ensuring that the regulation is effective and guided by high-quality evidence should be the most important objective at this point.
  2. Policymakers will adjust policy strictness in response to evidence about the difficulty of alignment. The important question is not whether the current level of regulation is sufficient to prevent future harm, but whether we have the tools to ensure that policies can adapt appropriately according to the best evidence about model capabilities and alignment difficulty at any given moment in time.

The Bay Area rationalist scene is a hive of techno-optimistic libertarians.[1] These people have a negative view of state/government effectiveness at a philosophical and ideological level, so their default perspective is that the government doesn't know what it's doing and won't do anything.

The attitude of expecting very few regulations made little sense to me, because -- as someone who broadly shares these background biases -- my prior is that governments will, by default, regulate any scary new technology that comes out. I just don't expect that regulations will always be thoughtful, or that they will weigh the risks and rewards of new technologies appropriately.

There's an old adage that describes how government sometimes operates in response to a crisis: "We must do something; this is something; therefore, we must do this." Eliezer Yudkowsky himself once said,

So there really is a reason to be allergic to people who go around saying, "Ah, but technology has risks as well as benefits".  There's a historical record showing over-conservativeness, the many silent deaths of regulation being outweighed by a few visible deaths of nonregulation.  If you're really playing the middle, why not say, "Ah, but technology has benefits as well as risks"?

Can you give some examples where individual humans have a clear strategic decisive advantage (i.e. very low risk of punishment), where the low-power individual isn't at a high risk of serious harm?

Why are we assuming a low risk of punishment? Risk of punishment depends largely on social norms and laws, and I'm saying that AIs will likely adhere to a set of social norms.

I think the central question is whether these social norms will include the norm "don't murder humans". I think such a norm will probably exist, unless almost all AIs are severely misaligned. I think severe misalignment is possible; one can certainly imagine it happening. But I don't find it likely, since people will care a lot about making AIs ethical, and I'm not yet aware of any strong reasons to think alignment will be super-hard.

Correct me if I'm wrong, but it seems like most of these reasons boil down to not expecting AI to be superhuman in any relevant sense

No, I certainly expect AIs will eventually be superhuman in virtually all relevant respects.

Resource allocation is relatively equal (and relatively free of violence) among humans because even humans that don't very much value the well-being of others don't have the power to actually expropriate everyone else's resources by force.

Can you clarify what you are saying here? If I understand you correctly, you're saying that humans have relatively little wealth inequality because there's relatively little inequality in power between humans. What does that imply about AI?

I think there will probably be big inequalities in power among AIs, but I am skeptical of the view that there will be only one (or even a few) AIs that dominate over everything else.

I do not think GPT-4 is meaningful evidence about the difficulty of value alignment.

I'm curious: does that mean you also think that alignment research performed on GPT-4 is essentially worthless? If not, why?

I think it's extremely unlikely that GPT-4 has preferences over world states in a way that most humans would consider meaningful, and in the very unlikely event that it does, those preferences almost certainly aren't centrally pointed at being honest, kind, and helpful.

I agree that GPT-4 probably doesn't have preferences in the same way humans do, but it sure appears to be a limited form of general intelligence, and I think future AGI systems will likely share many underlying features with GPT-4, including, to some extent, their internal cognitive representations.

I think our best guess of future AI systems should be that they'll be similar to current systems, but scaled up dramatically, trained on more modalities, with some tweaks and post-training enhancements, at least if AGI arrives soon. Are you simply skeptical of short timelines? 

re: endogenous response to AI - I don't see how this is relevant once you have ASI.

To be clear, I expect we'll get AI regulations before we get to ASI. I predict that regulations will increase in intensity as AI systems get more capable and start having a greater impact on the world.

Note that we are currently moving at pretty close to max speed, so this is a prediction that the future will be different from the past. 

Every industry in history initially experienced little to no regulation. However, after people became more acquainted with the industry, regulations on the industry increased. I expect AI will follow a similar trajectory. I think this is in line with historical evidence, rather than contradicting it.

re: perfectionism - I would not be surprised if many current humans, given superhuman intelligence and power, created a pretty terrible future. Current power differentials do not meaningfully let individual players flip every single other player the bird at the same time.

I agree. If you turned a random human into a god, or a random small group of humans into gods, then I would be pretty worried. However, in my scenario, there aren't going to be single AIs that suddenly become gods. Instead, there will be millions of different AIs, and the AIs will smoothly increase in power over time. During this time, we will be able to experiment and do alignment research to see what works and what doesn't at making the AIs safe. I expect AI takeoff will be fairly diffuse, and AIs will probably be respectful of norms and laws because no single AI can take over the world by itself. Of course, the way I think about the future could be wrong on a lot of specific details, but I don't see a strong reason to doubt the basic picture I'm presenting, as of now.

My guess is that your main objection here is that you think foom will happen, i.e. there will be a single AI that takes over the world and imposes its will on everyone else. Can you elaborate more on why you think that will happen? I don't think it's a straightforward consequence of AIs being smarter than humans.

I'm not totally sure what analogy you're trying to rebut, but I think that human treatment of animal species, as a piece of evidence for how we might be treated by future AI systems that are analogously more powerful than we are, is extremely negative, not positive.

My main argument is that we should reject the analogy itself. I'm not really arguing that the analogy provides evidence for optimism, except in a very weak sense. I'm just saying: AIs will be born into and shaped by our culture; that's quite different than what happened between animals and humans.

I might elaborate on this at some point, but I thought I'd write down some general reasons why I'm more optimistic than many EAs on the risk of human extinction from AI. I'm not defending these reasons here; I'm mostly just stating them.

  • Skepticism of foom: I think it's unlikely that a single AI will take over the whole world and impose its will on everyone else. I think it's more likely that millions of AIs will be competing for control over the world, much as millions of humans currently compete for control over the world. Power or wealth might be very unequally distributed in the future, but I find it unlikely that it will be distributed so unequally that there will be only one relevant entity with power. In a non-foomy world, AIs will be constrained by norms and laws. Absent severe misalignment among almost all the AIs, I think these norms and laws will likely include a general prohibition on murdering humans, and there won't be a particularly strong motive for AIs to murder every human either.
  • Skepticism that value alignment is super-hard: I haven't seen any strong arguments that value alignment is very hard, in contrast to the straightforward empirical evidence that e.g. GPT-4 seems to be honest, kind, and helpful after relatively little effort. Most conceptual arguments I've seen for why we should expect value alignment to be super-hard rely on strong theoretical assumptions that I am highly skeptical of. I have yet to see significant empirical successes from these arguments. I feel like many of these conceptual arguments would, in theory, apply to humans, and yet human children are generally value aligned by the time they reach young adulthood (at least, value aligned enough to avoid killing all the old people). Unlike humans, AIs will be explicitly trained to be benevolent, and we will have essentially full control over their training process. This provides much reason for optimism.
  • Belief in a strong endogenous response to AI: I think most people will generally be quite fearful of AI and will demand that we are very cautious while deploying the systems widely. I don't see a strong reason to expect companies to remain unregulated and rush to cut corners on safety, absent something like a world war that presses people to develop AI as quickly as possible at all costs.
  • Not being a perfectionist: I don't think we need our AIs to be perfectly aligned with human values, or perfectly honest, similar to how we don't need humans to be perfectly aligned and honest. Individual humans are usually quite selfish, frequently lie to each other, and are often cruel, and yet the world mostly gets along despite this. This is true even when there are vast differences in power and wealth between humans. For example, some groups in the world have almost no power relative to the United States, and residents in the US don't particularly care about them either, and yet they survive anyway.
  • Skepticism of the analogy to other species: it's generally agreed that humans dominate the world at the expense of other species. But that's not surprising, since humans evolved independently of other animal species. And we can't really communicate with other animal species, since they lack language. I don't think AI is analogous to this situation. AIs will mostly be born into our society, rather than being created outside of it. (Moreover, even in this very pessimistic analogy, humans still spend >0.01% of our GDP on preserving wild animal species, and the vast majority of animal species have not gone extinct despite our giant influence on the natural world.)

The second is describing an already-existing phenomenon of cost disease which, while concerning, has been compatible with high rates of growth and progress over the past 200 years.

I want to add further that cost disease is not only compatible with economic growth; cost disease is itself a result of economic growth, at least in the usual sense of the word. The Baumol effect -- which is what people usually mean when they say cost disease -- is simply a side effect of some industries becoming more productive more quickly than others. Essentially the only way to avoid cost disease is to have uniform growth across all industries, and that's basically never happened historically, except during times of total stagnation (in which growth is ~0% in every industry).
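
As a toy illustration of the mechanism (my own sketch, not part of the original comment): two sectors share a common wage, one sector's labor productivity grows while the other's stagnates, and the stagnant sector's unit cost rises even though nothing about that sector got worse. The growth rates below are hypothetical.

```python
# Toy Baumol-effect simulation with assumed, illustrative numbers.
years = 50
wage = 1.0
prod_progressive = 1.0   # output per worker in the fast-growing sector
prod_stagnant = 1.0      # output per worker in the stagnant sector (fixed)

for year in range(years + 1):
    # Unit cost of each good = wage / labor productivity in that sector.
    cost_prog = wage / prod_progressive
    cost_stag = wage / prod_stagnant
    if year % 10 == 0:
        print(f"year {year:2d}: progressive-sector cost {cost_prog:.2f}, "
              f"stagnant-sector cost {cost_stag:.2f}")
    # Productivity in the progressive sector grows 3% per year; the common
    # wage tracks economy-wide productivity because workers can switch sectors.
    prod_progressive *= 1.03
    wage *= 1.03

# The stagnant sector's cost rises ~3% per year purely because the rest of the
# economy grew: cost disease as a side effect of growth, with uniform growth
# (or total stagnation) as the only way to avoid it.
```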

Even though I disagree with Caplan on x-risks, animal rights, mental illness, free will, and a few other things, I ultimately don't think it's necessarily suspicious for him to hold the most convenient view on a broad range of topics. One can imagine two different ways of forming an ideology:

  • The first way is to come up with an ideology a priori, and then interpret facts about the world in light of the ideology you've chosen. People who do this are prone to ideological biases since they're evaluating facts based partly on whether they're consistent with the ideology they've chosen, rather than purely based on whether the facts are true.
  • The second way is to interpret various facts about the world, and after some time, notice that there's a general theory that explains a bunch of independent facts. For example, you might notice that, in various domains, most people are biased towards voting for things that appear superficially socially desirable rather than what they know is actually good for people. Then, based on this general theory, an ideology falls right out.

I predict that, regardless of his own personal history, Bryan Caplan will probably appeal to the second type of reasoning in explaining why his views all seem "convenient". He might say: it's not that the facts are ideologically convenient, but that the ideology is convenient since it fits all the facts. (Although I also expect him to be a bit modest and admit that he might be wrong about the facts.)

Bryan Caplan co-authored a paper critiquing Georgism in 2012. From the blog post explaining the critique,

My co-author Zachary Gochenour and I have a new working paper arguing that the Single Tax suffers from a much more fundamental flaw.  Namely: A tax on the unimproved value of land distorts the incentive to search for new land and better uses of existing land.  If we actually imposed a 100% tax on the unimproved value of land, any incentive to search would disappear.  This is no trivial problem: Imagine the long-run effect on the world’s oil supply if companies stopped looking for new sources of oil.

I can explain our argument with a simple example.  Clever Georgists propose a regime where property owners self-assess the value of their property, subject to the constraint that owners must sell their property to anyone who offers that self-assessed value.  Now suppose you own a vacant lot with oil underneath; the present value of the oil minus the cost of extraction equals $1M.  How will you self-assess?  As long as the value of your land is public information, you cannot safely self-assess at anything less than its full value of $1M.  So you self-assess at $1M, pay the Georgist tax (say 99%), and pump the oil anyway, right?

There’s just one problem: While the Georgist tax has no effect on the incentive to pump discovered oil, it has a devastating effect on the incentive to discover oil in the first place.  Suppose you could find a $1M well by spending $900k on exploration.  With a 99% Georgist tax, your expected profits are negative $890k. (.01*$1M-$900k=-$890k) 

You might think that this is merely a problem for a handful of industries.  But that’s probably false.  All firms engage in search, whether or not they explicitly account for it.  Take a real estate developer.  One of his main functions is to find valuable new ways to use existing land.  “This would be a great place for a new housing development.”  “This would be a perfect location for a Chinese restaurant.”  And so on.
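
The arithmetic in the quoted example is simple enough to state as a short sketch (my own illustration of Caplan's numbers, not code from the post): a tax on unimproved land value leaves the incentive to pump already-discovered oil intact, but it removes the payoff to searching in the first place.

```python
# Sketch of the quoted example: expected profit from searching for land value
# under a Georgist tax, using Caplan's numbers ($1M find, $900k search cost, 99% tax).

def profit_from_search(land_value: float,
                       exploration_cost: float,
                       tax_rate: float) -> float:
    """The searcher keeps (1 - tax) of the discovered unimproved value,
    but bears the full exploration cost."""
    return (1 - tax_rate) * land_value - exploration_cost

print(profit_from_search(1_000_000, 900_000, tax_rate=0.99))  # -890000.0
print(profit_from_search(1_000_000, 900_000, tax_rate=0.0))   #  100000.0
# With no tax the search is worth +$100k and happens; with the 99% tax it is
# worth -$890k and doesn't, which is the distortion the quote describes.
```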
