On point 1 (space colonization), I think it's hard and slow! So the same issue as with bio risks might apply: AGI doesn't get you this robustness quickly for free. See other comment on this post.
I've been meaning to write something about 'revisiting the alignment strategy'. Section 5 here ('Won't AGI make post-AGI catastrophes essentially irrelevant?') makes the point very clearly:
On this view, a post-AGI world is nearly binary—utopia or extinction—leaving little room for Sisyphean scenarios.
But I think this is too optimistic about the speed and completeness of the transition to globally deployed, robustly aligned "guardian" systems.
but without making much of a case for it. Interested in Will's and reviewers' sense of the space and literature here.
Yep - definitely for me, 'big civ setbacks are really bad' was already baked in from the POV of setting bad context for pre-AGI transition(s) (as well as their direct badness). But while I'd already agreed with Will about post-AGI not being an 'end of history' (in the sense that much remains uncertain re safety), I hadn't thought through the implication that setbacks could force a rerun of the most perilous transition(s), which does add some extra concern.
A small aside: some put forth interplanetary civilisation as a partial defence against either total destruction or 'setback'. But reaching the milestone of having a really robustly interplanetary civ might itself take quite a long time after AGI - especially if (like me) you think digital uploading is nontrivial.
(This abstractly echoes the suggestion in this piece that bio defence might take a long time, which I agree with.)
Some gestures which didn't make the cut as they're too woolly or not quite the right shape:
This is lovely, thank you!
My main concern would be that it takes the same heavily approximating stance as much other writing in the area, conflating all kinds of algorithmic progress into a single scalar 'quality of the algorithms'.
You do moderately well here, noting that the most direct interpretation of your model regards speed or runtime compute efficiency, yielding 'copies that can be run' as the immediate downstream consequence (and discussing in a footnote the relationship to 'intelligence'[1] and the distinction between 'inference' and training compute...
On this note, the Future of Life Foundation (headed by Anthony Aguirre, mentioned in this post) is today launching a fellowship on AI for Human Reasoning.
Why? Whether you expect gradual or sudden AI takeoff, and whether you're afraid of gradual or acute catastrophes, it really matters how well-informed, clear-headed, and free from coordination failures we are as we navigate into and through AI transitions. Just the occasion for human reasoning uplift!
12 weeks, $25-50k stipend, mentorship, and potential pathways to future funding and impact. Applications close June 9th.
(cross-posted on LW)
Love this!
As presaged in our verbal discussion, my top conceptual complement would be to emphasise exploration/experimentation as central to the knowledge production loop - the cycle of 'developing good taste to plan better experiments to improve taste (and planning model)' is critical (indispensable?) to 'produce new knowledge which is very helpful by the standards of human civilization' (on any kind of meaningful timescale).
This is because just flailing, or even just 'doing stuff', gets you some novelty of observations, but directedly see...
I like this decomposition!
I think 'Situational Awareness' can quite sensibly be further divided up into 'Observation' and 'Understanding'.
The classic control loop of 'observe', 'understand', 'decide', 'act'[1] is consistent with this discussion, where 'observe'+'understand' are combined here as 'situational awareness', and you're pulling out 'goals' and 'planning capacity' as separable aspects of 'decide'.
Are there some difficulties with factoring?
Certain kinds of situational awareness are more or less fit for certain goals. And further, the important 're...
A little followup:
I took part in the inaugural SERI MATS programme in 2021-2022 (where incidentally I interacted with Richard), and started an AI Safety PhD at Oxford in 2022.
I've been working for the AI Safety Institute (UK Gov) as a hybrid technical expert since Jan 2024, utilising my engineering and DS background alongside AI/ML research and threat modelling. Likely to continue such work, there or elsewhere. Unsure if I'll finish my PhD in the end, as a result, but I don't regret it: I produced a little research, met some great collaborators, and had fun w...
FWIW I work at the AI Safety Institute UK and we're considering a range of both misuse and misalignment threats, and there are a lot of smart folks on board taking things pretty seriously. I admit I... don't fully understand how we ended up in this situation and it feels contingent and precious, as does the tentative international consensus on the value of cooperation on safety (e.g. the Bletchley declaration). Some people in government are quite good, actually!
Sure, take it or leave it! I think for the field-building benefits it can look more obviously like an externality (though I-the-fundraiser would in fact be pleased and not indifferent, presumably!), but the epistemic benefits could easily accrue mainly to me-the-fundraiser (of course they could also benefit other parties).
How much of this is lost by compressing to something like: virtue ethics is an effective consequentialist heuristic?
I've been bought into that idea for a long time. As Shaq says, 'Excellence is not a singular act, but a habit. You are what you repeatedly do.'
We can also make analogies to martial arts, music, sports, and other practice/drills, and to aspects of reinforcement learning (artificial and natural).
Simple, clear, thought-provoking model. Thanks!
I also faintly recall hearing something similar in this vicinity: apparently some volunteering groups get zero (or less!?) value from many/most volunteers, but engaged volunteers dominate donations, so it's worthwhile bringing in volunteers and training them! (citation very much needed)
Nitpick: are these 'externalities'? I'd have said, 'side effects'. An externality is a third-party impact from some interaction between two parties. The effects you're describing don't seem to be distinguished by being third-party per se (I can imagine glossing them as such but it's not central or necessary to the model).
I think this is a good and useful post in many ways, in particular laying out a partial taxonomy of differing pause proposals and gesturing at their grounding and assumptions. What follows is a mildly heated response I had a few days ago, whose heatedness I don't necessarily endorse but whose content seems important to me.
Sadly this letter is full of thoughtless remarks about China and the US/West. Scott, you should know better. Words have power. I recently wrote an admonishment to CAIS for something similar.
...The biggest disadvantage of pausing for a long
I think that the best work on AI alignment happens at the AGI labs
Based on your other discussion, e.g. about public pressure on labs, it seems like this might be a (minor?) load-bearing belief?
I appreciate that you qualify this further in a footnote
This is a controversial view, but I’d guess it’s a majority opinion amongst AI alignment researchers.
I just wanted to call out that I weakly hold the opposite position, and also the opposite best guess on majority opinion (based on safety researchers I know). Naturally there are sampling effects!
This is a margi...
This is an exemplary and welcome response: concise, full-throated, actioned. Respect, thank you Aidan.
Sincerely, I hope my feedback was all-things-considered good from your perspective. As I noted in this post, I felt my initial email was slightly unkind at one point, but I am overall glad I shared it - you appreciate my getting exercised about this, even over a few paragraphs!
...It’s important to discuss national AI policies which are often explicitly motivated by goals of competition without legitimizing or justifying zero-sum competitive mindsets which can unde
(Prefaced with the understanding that your comment is to some extent devil's advocating and this response may be too)
both the US and Chinese governments have the potential to step in when corporations in their country get too powerful
What is 'step in'? I think when people are describing things in aggregated national terms without nuance, they're implicitly imagining govts either already directing, or soon/inevitably appropriating and directing (perhaps to aggressive national interest plays). But govts could just as readily regulate and provide guidance...
Thanks Ben!
Please don't take these as endorsements that this thinking is correct, just that it's what I see when I inspect my instincts about this
Appreciated.
These psychological (and real) factors seem very plausible to me for explaining why mistakes in thinking and communication are made.
maybe we can think of the US companies as simultaneously closer friends and closer enemies with each other?
Mhm, this seems less lossy as a hypothetical model. Even if they were only 'closer friends', though, I don't think it's at all clear-cut enough for it to be a...
Just in case we're out of sync, let's briefly refocus on some object details
China has made several efforts to preserve their chip access, including smuggling, buying chips that are just under the legal limit of performance, and investing in their domestic chip industry.
Are you aware of the following?
If...
Interesting. I'd love to know if you think the crux schema I outlined is indeed important? I mean this:
How quickly/totally/coherently could US gov/CCP capture AI talent/artefacts/compute within its jurisdiction and redirect them toward excludable destructive ends? Under what circumstances would they want/be able to do that?
Correct me at any point if I misinterpret: I read that, on the basis of answers to something a bit like these, you think an international competition/race is all but inevitable? Presumably that registers as terrifically dangerous for...
Thanks for this thoughtful response!
this tendency leads to analysis that assumes more coordination among governments, companies, and individuals in other countries than is warranted. When people talk about "the US" taking some action... more likely to be aware of the nuance this ignores... less likely to consider such nuances when people talk about "China" doing something
This seems exactly right and is what I'm frustrated by. Though, going further than you give credit (or un-credit) for, I frequently come across writing or talking about "US success in AI", "...
The stream cut out, but there are longer versions available e.g. https://www.youtube.com/watch?v=Dg-rKXi9XYg
Great read, and interesting analysis. I like encountering models for complex systems (like community dynamics)!
One factor I don't think was discussed (maybe the gesture at possible inadequacy of the model encompasses this) is the duration of scandal effects. E.g. imagine some group claiming to be the Spanish Inquisition, the Mongol Horde, or the Illuminati tried to get stuff done. I think (assuming they were taken seriously) they'd encounter lingering reputational damage more than one year after the original scandals! Not sure how this models out; I'm not planning to d...
OpenAI as a whole, and individuals affiliated with or speaking for the org, appear to be largely behaving as if they are caught in an overdetermined race toward AGI.
What proportion of people at OpenAI believe this, and to what extent? What kind of observations, or actions or statements by others (and who?) would change their minds?
Great post. I basically agree, but in a spirit of devil's advocating, I will say: when I turn my mind to agent foundations thinking, I often find myself skirting queasily close to concepts which feel also capabilities-relevant (to the extent that I have avoided publicly airing several ideas for over a year).
I don't know if that's just me, but it does seem that some agent foundations content from the past has also had bearing on AI capabilities - especially if we include decision theory stuff, dynamic programming and RL, search, planning etc. which it's arg...
Thank you for sharing this! Especially the points about relevant maps and Meta/FAIR/LeCun.
I was recently approached by the UK FCDO as a technical expert in AI with perspective on x-risk. We had what I think were very productive conversations, with an interesting convergence of my framings and the ones you've shared here - that's encouraging! If I find time I'm hoping to write up some of my insights soon.
This is beautiful and important Tyler, thank you for sharing.
I've seen a few people burn out (and have come close myself), and I have made a habit of gently, socially making and reinforcing this sort of point (far less eloquently) in various contexts.
I have a lot of thoughts about this subject.
One thing I always embrace is silliness and (often self-deprecating) humour, which are useful antidotes to stress for a lot of people. Incidentally, your tweet thread rendition of the Egyptian spell includes
...I am light heading for light. Even in the dark, a fi
Seconded/thirded on Human Compatible being near that frontier. I did find its ending 'overly optimistic' in the sense of framing things like 'but lo, there is a solution!', whereas other similar resources like Superintelligence and especially The Alignment Problem seem more nuanced, presenting uncertain proposals for paths forward not as oven-ready but as preliminary and speculative.
I'm intrigued by this thread. I don't have an informed opinion on the particular aesthetic or choice of quiz questions, but I note some superficial similarities to Coursera, Khan Academy, and TED-Ed, which are aimed at mainly professional age adults, students of all ages, and youth/students (without excluding adults) respectively.
Fun/cute/cartoon aesthetics do seem to abound these days in all sorts of places, not just for kids.
My uninformed opinion is that I don't see why it should put off teenagers (talented or otherwise) in particular, but I weakly agree that if something is explicitly pitched at teenagers, that might be offputting!
I've considered a possible pithy framing of the Life Despite Suffering question as a grim orthogonality thesis (though I'm not sure how useful it is):
We sometimes point to the substantial majority's revealed preference for staying alive as evidence of a 'life worth living'. But perhaps 'staying-aliveness' and 'moral patient value' can vary more independently than that claim assumes. This is the grim orthogonality thesis.
An existence proof for the 'high staying-aliveness x low moral patient value' quadrant is the complex of torturer+torturee, which quite cl...
I'm shocked and somewhat concerned that your empirical finding is that so few people have encountered or thought about this crucial consideration.
My experience is different: maybe 70% of the AI x-risk researchers I've discussed this with are somewhat au fait with the notion that we might not know the sign of future value conditional on survival. But I agree that people (myself included) seem to have a tendency to slide off this consideration or hope to defer its resolution to future generations, and my sample size is quite small (a few dozen maybe) and quit...
My anecdata is also that most people have thought about it somewhat, and "maybe it's okay if everyone dies" is one of the more common initial responses I've heard to existential risk.
But I agree with OP that I more regularly hear "people are worried about negative outcomes just because they themselves are depressed" than "people assume positive outcomes just because they themselves are manic" (or some other cognitive bias).
Got it, I think you're quite right on one reading. I should have been clearer about what I meant, which is something like
E.g. imagine a minor steelification (which loses the aesthetic and rhetorical strength) like "nobody's positive wellbeing (implicitly stemming from their freedom) can/should be cel...
It's possible the selection bias is high, but I don't have good evidence for this besides personal anecdata. I don't know how many people are relevantly similar to me, and I don't know how representative we are of the latest EA 'freshers', since dynamics will change and I'm reporting with several years' lag.
Here's my personal anecdata.
Since 2016, around when I completed undergrad, I've been an engaged (not sure what counts as 'highly engaged') longtermist. (Before that point I had not heard of EA per se but my motives were somewhat proto EA and I wanted to...
I just wanted to state agreement that a large number of people seem to have largely misread Death with Dignity, at least relative to what seems to me the most plausible intended message: it is mainly about the ethical injunctions (which are very important for a finitely rational and prone-to-rationalisation being), as Yudkowsky has written about in the past.
The additional detail of 'and by the way this is a bad situation and we are doing badly' is basically modal Yudkowsky schtick and I'm somewhat surprised it updated anyone's beliefs (about Yudkowsky's beliefs, and th...
I wrote something similar (with more detail) about the Gato paper at the time.
I don't think this is any evidence at all against AI risk though? It is maybe weak evidence against 'scaling is all you need' or that sort of thing.
Thanks Rohin, I second almost all of this.
Interested to hear more about why long-term credit assignment isn't needed for powerful AI. I think it depends how you quantify those things and I'm pretty unsure about this myself.
Is it because there is already loads of human-generated data which implicitly embody or contain enough long-term credit assignment? Or is it that long-term credit assignment is irrelevant for long-term reasoning? Or maybe long-term reasoning isn't needed for 'powerful AI'?
A nit
appeals to me, and I'm sure to some others, but (I sense) it could come across with a particular political-tribal flavour, which you might want to try neutralising. (Or not, if that'd detract from the net appeal!)