
Tom_Davidson

1334 karma

Comments (68)

Thanks for this.

In each of the examples you give, I'm thinking that the pause would be significantly more beneficial (plausibly by 10x) if we pause when AI is already capable enough to significantly help us solve the issue. In general, these seem like the kinds of issues where AI could massively accelerate progress.

So if I'm choosing between an international pause now vs an international pause in 2 years, I choose the latter. (I assume we're talking about international pauses here rather than just the US, but let me know if you also support a unilateral pause now!)

I do take Holly's point that it might be damaging to quibble about exactly when we pause if that reduces the chance of a pause happening at all. And today we are very far from a pause actually happening, and one may well be needed in two years' time, so I definitely support efforts to get us closer to a pause!

I'm hesitant about saying "pause now" because I actually think a different policy might be much more effective. But I think a world where we were about to do an international pause would be better than the actual world.

(I want to think more about this topic and all of this is very tentative.)

Thanks! This is a great comment.

I should have clarified that I was using "should" from the perspective of society. We as a society should set a norm and expectation for systems to have these drives.

I completely agree that there's a tricky question about how you make that incentive-compatible when there are multiple companies. And I agree that it could be a bad idea for a company to do this unilaterally.

This is a real trade-off when thinking about what AI character should look like. I think more work on that trade-off -- where we should sit along it, and ways to get the best of both, the biggest benefits of being ethical and of being obedient -- would be hugely valuable.

This comment discusses a similar objection.


In brief, I think that the transitional period before ASI will shape the world that ASI is built into. That is, it will indirectly shape the values of ASI, the institutions it's embedded in, and who controls it. So I think the impact flows through to ASI. An analogy is how the behaviour of humans over the last two decades has flowed through to the character and impacts of today's AI systems.


You might be right that the same alignment methods won't work for superintelligence, but I don't think that undermines the value of this work. If we can agree on what a good character would be for superintelligence, then we can use the new alignment methods to aim for that same character. Of course, if alignment is completely hopeless, then we shouldn't work on AI character, but I think there is a good chance that alignment is solvable.

Who are the people that you're worried will not use their resources for good purposes by default?

If you're worried about people with diminishing marginal returns to consumption etc., this could still shift them (see our sensitivity analysis).

Is the threat model for Stages I and II the same as the one in my post on whether one country can outgrow the rest of the world?


That post just looks at the basic economic forces driving super-exponential growth and argues that they would lead an actor that starts with more than half of the world's resources, and doesn't trade, to gain a larger and larger fraction of world output.
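To make that dynamic concrete, here is a toy simulation (my own construction, not taken from the post), assuming each actor's growth rate rises with its resource stock, i.e. dx/dt = c * x^(1+a) with a > 0 -- one simple form of super-exponential growth. All parameter values are illustrative assumptions:

```python
# Toy model: two non-trading actors whose resource stocks grow
# super-exponentially, dx/dt = c * x**(1 + a) with a > 0, so the
# growth rate c * x**a itself rises as the stock x grows.
# All parameter values below are illustrative assumptions.

def simulate(share_a=0.6, a=0.2, c=1.0, dt=1e-4, steps=40_000):
    x_a, x_b = share_a, 1.0 - share_a  # initial stocks; world total = 1
    for step in range(steps + 1):
        if step % 5_000 == 0:
            share = x_a / (x_a + x_b)
            print(f"t = {step * dt:4.1f}   A's share of world output: {share:.3f}")
        # forward-Euler step of dx/dt = c * x**(1 + a) for each actor
        x_a += c * x_a ** (1 + a) * dt
        x_b += c * x_b ** (1 + a) * dt

simulate()  # A's share climbs from 0.600 toward 1 as its lead compounds
```

For comparison, with ordinary exponential growth (a = 0) both actors grow at the same rate and A's share stays fixed at 0.600; the share only diverges because growth accelerates with scale.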


I can't tell from this post whether there are dynamics that are specific to settling the solar system in Stage II that feed into the first-mover advantage or whether it's just simple super-exponential growth. 


My current guess is that there are so many orders of magnitude of growth available on Earth that super-exponential growth would lead to a decisive strategic advantage without even going to space. If that's right (which it might not be), then it's unclear that Stage II adds that much.

Perhaps many minds end up at a shared notion of what they’re aiming for, via acausal trade (getting to some grand bargain), or evidential cooperation in large worlds

This isn't convergence. It's a case where people DIDN'T converge but strike a win-win compromise deal. If people all converged, there'd be no role for acausal trade/ECL.

It seems like we can predictably make moral progress by reflecting; i.e. coming to answers + arguments that would be persuasive to our former selves
I think I’m more likely to update towards the positions of smart people who’ve thought long and hard about a topic than the converse (and this is more true the smarter they are, the more they’re already aware of all the considerations I know, and the more they’ve thought about it)
If I imagine handing people the keys to the universe, I mostly want to know “will they put serious effort into working out what the right thing is and then doing it” rather than their current moral views

I guess I feel this shows that there are some shared object-level moral assumptions and some shared methodology for making progress among humans.


But it doesn't show that the overlap is strong enough for convergence.


And you could explain the overlap by conformism, in which case I wouldn't expect to agree with societies I haven't interacted with.

I think my biggest uncertainty about this is:


If there were a catastrophic setback of this kind, and civilisation tried hard to save and maintain the weights of superintelligent AI (which it presumably would), how likely is it to succeed?

 

My hunch is that it very likely could succeed. E.g. in the first couple of decades it would have continued access to superintelligent AI advice (and maybe robotics) from pre-existing hardware. It could use that to bootstrap to longer periods of time: e.g. saving the weights on hard drives rather than SSDs (which lose data faster when left unpowered), and then later transferring them to a more secure, long-lasting format. Then figure out the minimal-effort version of compute maintenance and/or production needed to keep running some superintelligences indefinitely.

Really like this post!


I'm wondering whether human-level AI and robotics will significantly decrease civilisation's susceptibility to catastrophic setbacks.

AI systems and robots can't be destroyed by pandemics. They don't depend on agriculture -- just mining and some form of energy production. And a very small number of systems could hold tacit expertise for ~all domains. 

Seems like this might reduce the risk by a lot, such that the 10% numbers you're quoting are too high. E.g. you're assigning 10% to a bio-driven setback, but I'd have thought that would have to happen before we get human-level robotics?
