Let’s reflect on where we’ve updated our views, and how we could improve our epistemics to avoid making the same mistakes again!
How to play:
- Search: Look through your past comments and posts to find a claim you made that you've since updated on and no longer endorse. (Or think of a view you've changed that you never posted here.)
- Share: Add a comment to this post, quoting or describing the original claim you made. Explain why you no longer believe it, or why you think you made the mistake you made.
  - What mental motion were you missing?
  - Was there something you were trying to protect?
- Reflect: Reply to your own comment or anyone else's, reflecting on how we can avoid making these mistakes in the future and giving tips on what's worked for you.
Ground rules:
- Be supportive: It is commendable to publicly share when you’ve changed your views!
- Be constructive: Focus on specific actions we can take to avoid the mistake next time
- Stay on-topic: This is not the place to discuss whether the original claim was right or wrong - for that, you can reply to the comment in its original context. (Though briefly stating your view while giving thoughts on someone's update can make sense.)
Let’s have fun with this and reflect on how we can improve our epistemics!
Inspired by jacobjacob’s “100 ways to light a candle” babble challenge.
There’s also the consideration that alignment still seems really hard to me. I’ve watched the space for 8 years, and alignment hasn’t come to look any less hard over that time span. If a problem is about to be solved, or is easier than expected, 8 years should be enough to see some development from confusion to clarity, from overwhelm to research agenda, etc. Quite to the contrary, there is much more resignation going around now, so alignment is probably a bigger problem than the average PhD thesis. (Then again, the resignation may be the result of shortening timelines rather than anything about the problem itself.)
I’m not quite sure in which direction that consideration pushes, but I think it makes s-risks a bit less likely again? Ideally we’d just solve alignment, but failing that, it could mean (1) we’re unrepresentatively dumb and the problem is easy, or (2) the problem is hard. Option 1 would suck because alignment would then rest with all the random AIs that won’t want to align their successors with human values; s-risk then depends on how many AIs there are and how slow the last stretch to superintelligence is. Option 2 is a bit better because AIs won’t solve it either, so the future ends up in the hands of AIs that don’t mind value drift or failed to anticipate it. That sounds bad at first, but I think it’s the lesser evil, because there’s no reason for that failure to hit several AIs bunched up at one point in time. So the takeoff would have to be very slow for the outcome to still be multipolar.
So yeah, a bunch more worry here, but at least the factors aren’t all pushing in the same direction.