Yeah, I didn't mean to imply that it's a good idea to keep them out permanently, but the fact that they're not in right now is a good sign that this is for real. If they'd just joined and not changed anything about their current approach, I'd suspect the whole thing was for show.
This seems very good overall at first glance, and seems even better once you realize that Meta is not on the list. There's nothing here that I'd call substantial capabilities acceleration (i.e. attempts to collaborate on building larger and larger foundation models), though some of it could be construed as making foundation models more useful for specific tasks. Sharing safety-capabilities research like better oversight or CAI techniques is plausibly strongly net positive even if the techniques don't scale indefinitely. By the same logic, while this by itself is nowhere near sufficient to get us AI existential safety if alignment is very hard (and could increase complacency), it's still a big step in the right direction.
adversarial robustness, mechanistic interpretability, scalable oversight, independent research access, emergent behaviors and anomaly detection. There will be a strong focus initially on developing and sharing a public library of technical evaluations and benchmarks for frontier AI models.
The mention of combating cyber threats is also a step towards explicit pTAI.
BUT, crucially, because Meta is frozen out, we can know that this partnership isn't toothless: it represents a commitment not to do the most risky and antisocial things, which Meta presumably doesn't want to give up. And the fact that Meta is the only major AI company in the US not to join will be horrible PR for them as well.
I think you have to update against the UFO reports being veridical descriptions of real objects with those characteristics, because of just how ludicrous the implied properties are. This paper gives 5370 g as a reasonable upper bound on acceleration, implying (with some assumptions about mass) an effective thrust power on the order of 500 GW in something the size of a light aircraft. And there is no disturbance in the air, either from the very high hypersonic wake and compressive heating, or from the enormous nuclear-explosion-sized bubble of plasmafied air that the exhaust and waste heat of something like this would produce.
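To make the arithmetic concrete, here's a rough back-of-envelope sketch. It only uses P = F·v = m·a·v, the rate at which kinetic energy goes into the craft itself, so it's a floor: any real propulsion would also have to put energy into its exhaust. The comment doesn't state the mass or speed assumed, so the 1000 kg mass and 9.5 km/s speed below are hypothetical values picked to show how you land near 500 GW, not figures taken from the paper.

```python
# Back-of-envelope thrust power estimate: P = F * v = m * a * v.
# Mass and speed are HYPOTHETICAL illustration values, not taken from the paper.
STANDARD_GRAVITY = 9.80665  # m/s^2


def thrust_power_watts(mass_kg: float, accel_g: float, speed_m_s: float) -> float:
    """Power delivered to the craft's kinetic energy (ignores drag and exhaust losses)."""
    accel_m_s2 = accel_g * STANDARD_GRAVITY
    force_n = mass_kg * accel_m_s2
    return force_n * speed_m_s


def implied_speed_m_s(power_w: float, mass_kg: float, accel_g: float) -> float:
    """Speed at which a given power budget is exhausted at this acceleration."""
    return power_w / (mass_kg * accel_g * STANDARD_GRAVITY)


if __name__ == "__main__":
    mass = 1000.0   # hypothetical: roughly light-aircraft mass, in kg
    accel = 5370.0  # upper-bound acceleration quoted above, in g
    speed = 9500.0  # hypothetical instantaneous speed, in m/s
    power = thrust_power_watts(mass, accel, speed)
    print(f"{power / 1e9:.0f} GW")
```

Since P scales linearly with both mass and speed, halving either assumption only halves the figure; it stays in the hundreds of GW for any plausible light-aircraft-sized object moving at hypersonic speed.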
At a minimum, to stay within the bounds of mechanics and thermodynamics, you'd need to be able to ignore airflow and air resistance entirely, emit reaction mass in a completely non-interacting form, and emit waste energy in a completely non-interacting form as well.
To me, the dynamical characteristics being this crazy points far more towards some kind of observation error, so I don't think we should treat them as any kind of real object with those properties until we can conclusively rule out basically all other error sources.
So even if the next best explanation is 100x worse at explaining the observations, I'd still believe it over a 5000g airflow-avoiding craft that expels invisible reaction mass and invisible waste heat while maneuvering. Maybe not 10,000x worse since it doesn't outright contradict the laws of physics, but still the prior on this even being technically possible with any amount of progress is low, and my impression (just from watching debates back and forth on potential error sources) is that we can't rule out every mundane explanation with that level of confidence.
Very nice! I'd say this seems aimed at a difficulty level of 5 to 7 on my table. That is, experimentation on dangerous systems and interpretability play some role, but the main thrust is automating alignment research and oversight, so I'd unscientifically call it a 6.5. That's a tremendous step up from the current state of things (2.5) and would solve alignment in many possible worlds.
Other things differentiate the camps beyond technical views, such as how much you buy 'civilizational inadequacy' versus viewing it as a consequence of sleepwalk bias. But one way to cash this out is by which zone of the alignment difficulty scale you're in (green, yellow/red, or black): Dismissers are in the green (although, in my opinion, they shouldn't be even given that view), Worriers are in the yellow/red, and Doomers are in the black (and maybe the high end of red).
What does Ezra think of the 'startup government mindset' when it comes to responding to fast-moving situations, e.g. the UK explicitly modelling its own response on the COVID Vaccine Taskforce, doing end runs around traditional bureaucratic institutions, recruiting quickly through Google Docs, etc.? See e.g. https://www.lesswrong.com/posts/2azxasXxuhXvGfdW2/ai-17-the-litany
Is it just hype, translating a startup mindset to government where it doesn't apply, or is it actually useful here?
Check whether the model works with Paul Christiano-type assumptions about how AGI will go.
I had a similar thought reading through your article. My gut reaction is that your setup can be made to work as-is with a more gradual takeoff story, with more precedents, warning shots and general transformative effects of AI before we get to takeover capability, but it's a bit unnatural and some of the phrasing doesn't quite fit.
Background assumption: Deploying unaligned AGI means doom. If humanity builds and deploys unaligned AGI, it will almost certainly kill us all. We won’t be saved by being able to stop the unaligned AGI, or by it happening to converge on values that make it want to let us live, or by anything else.
Paul instead says, e.g.:
The notion of an AI-enabled “pivotal act” seems misguided. Aligned AI systems can reduce the period of risk of an unaligned AI by advancing alignment research, convincingly demonstrating the risk posed by unaligned AI, and consuming the “free energy” that an unaligned AI might have used to grow explosively
Eliezer often equivocates between “you have to get alignment right on the first ‘critical’ try” and “you can’t learn anything about alignment from experimentation and failures before the critical try.” This distinction is very important, and I agree with the former but disagree with the latter.
On his view (which is somewhat similar to mine), the background assumption is more like 'deploying your first critical try (i.e. an AGI that is capable of taking over) implies doom'. That says there is an eventual deadline by which these issues need to be sorted out, but lots of transformation and interaction may happen first to buy time or raise the level of capability needed for takeover. So something like the following is needed:
On the Paul view, your three pillars would still eventually have to be satisfied to reach a stable regime where unaligned AGI cannot pose a threat. But we would only need to get to those 100 points after a period where less capable AGIs are running around either helping or hindering, motivating us to respond better or causing damage that degrades our response. How much of each happens depends on how we respond in the meantime and on exactly how long the AI takeoff period lasts.
Also, crucially, the actions of pre-AGI AI may push the point where the problems become critical to higher AI capability levels, as well as potentially assisting on each of the pillars directly, e.g. by making takeover harder in various ways. But Paul's view isn't that this is enough to postpone the need for a complete solution forever: he says that the effects of pre-AGI AI 'could significantly (though not indefinitely) postpone the point when alignment difficulties could become fatal'.
This adds another element of uncertainty and complexity to all of the takeover/success stories that makes a lot of predictions more difficult.
Essentially, the time/level of AI capability at which we must reach 100 points to succeed also becomes a free variable in the model that can move up and down, and we also have to consider the shorter-term effects of transformative AI on each of the pillars as well.
I don't think what Paul means by fast takeoff is the same thing as the sort of discontinuous jump that would enable a pivotal act. I think fast for Paul just means the negation of Paul-slow: 'no four-year economic doubling before a one-year economic doubling'. But whatever Paul thinks, the survey respondents did give at least 10% to scenarios where a pivotal act is possible.
Even so, 'this isn't how I expect things to go on the mainline, so I'm not going to focus on what to do here' is far less of a mistake than 'I have no plan for what to do on my mainline', and I think the researchers who ignored pivotal acts are mostly making the first one.
"In the endgame, AGI will probably be pretty competitive, and if a bunch of people deploy AGI then at least one will destroy the world" is a thing I think most LWers and many longtermist EAs would have considered obvious.
I think that many AI alignment researchers just have a different development model than this, where world-destroying AGIs don't emerge suddenly from harmless low-impact AIs, no one project gets a vast lead over competitors, and there's lots of early evidence of misalignment and (if alignment is harder) many smaller-scale disasters in the lead-up to any AI capable of destroying the world outright. See e.g. Paul's 'What failure looks like'.
On this view, the idea that there'll be a lead project with a very short time window to execute a single pivotal act is wrong. Instead, the 'pivotal act' is spread out: it's about making sure the aligned projects have a lead over the rest, and that failures from unaligned projects are caught early enough, and for long enough (by AIs or human overseers), for the leading projects to become powerful and for best practices on alignment to spread universally.
Basically, if you find yourself in the early stages of WFLL2 and want to avert doom, what you need to do is get better at overseeing your pre-AGI AIs, not build an AGI to execute a pivotal act. This was pretty much what Richard Ngo was arguing for in most of the MIRI debates with Eliezer, and also I think it's what Paul was arguing for. And obviously, Eliezer thought this was insufficient, because he expects alignment to be much harder and takeoff to be much faster.
But I think that's the reason a lot of alignment researchers haven't focussed on pivotal acts: because they think a sudden, fast-moving single pivotal act is unnecessary in a slow takeoff world. So you can't conclude just from the fact that most alignment researchers don't talk in terms of single pivotal acts that they're not thinking in near mode about what actually needs to be done.
However, I do think that what you're saying is true of a lot of people - many people I speak to just haven't thought about the question of how to ensure overall success, either in the slow takeoff sense I've described or the Pivotal Act sense. I think people in technical research are just very unused to thinking in such terms, and AI governance is still in its early stages.
I agree that on this view it still makes sense to say, 'if you somehow end up that far ahead of everyone else in an AI takeoff then you should do a pivotal act', like Scott Alexander said:
That is, if you are in a position where you have the option to build an AI capable of destroying all competing AI projects, the moment you notice this you should update heavily in favor of short timelines (zero in your case, but everyone else should be close behind) and fast takeoff speeds (since your AI has these impressive capabilities). You should also update on existing AI regulation being insufficient (since it was insufficient to prevent you)
But I don't think you learn all that much about how 'concrete and near mode' researchers who expect slower takeoff are being from the fact that they haven't given much thought to what to do in this (from their perspective) unlikely edge case.
Update: looks like we are getting a test run of sudden loss of supply of a single crop. The Russia-Ukraine war has led to a 33% drop in the global supply of wheat: