Offer the world's most intelligent/capable/qualified people a ridiculous amount of money to work on AGI Alignment as part of a prestigious fellowship
Greg Colbourn, Nov 2021 - Nov 2022
[Publication note (30 Mar 2023): A lot has changed since I last worked on this document. I was on the verge of publishing it here, then FTX happened, and it seemed quite inappropriate in the aftermath of that to be talking about spending so much money. Since then, we've had ChatGPT and now GPT-4. This has caused me to update to thinking that it's too late for a plan like this to be our best option for dealing with AGI risk, and actually, the priority is a global moratorium on AGI development (and the Pause letter and the Yudkowsky TIME article have been most welcome developments along these lines). I still think something like what is outlined here should be done (and I'd be excited if others picked it up and ran with it), but more urgent is getting the global moratorium in place. I'm publishing now, in the interest of getting the ideas out there, rather than holding off for another round of edits to get the piece up to date, which I may never get round to. I haven't made any edits to anything in the main text since November 4 2022.
Added 31 Mar 2023: I should also say that before considering publishing this, I exhausted many lines of private inquiry, contacting people and orgs in a good position to get something like this off the ground (as I say in the main text, this needs to be prestigious). Alas, there wasn't sufficient interest. No one willing to take it and run with it. No one with similar plans already. I hope something like this happens soon (and I'm more optimistic it will now, given recent events). If it is being thought about, I hope this FAQ can be of use.]
This document is the result of iteration via discussions over the last year with people in the AGI x-risk, EA and rationality communities. It is written in an FAQ style, as I have tried to address people’s questions, concerns, comments and feedback. I link extensively to further discussion where public, but try to summarise key points within the text.
Epistemic Status: I still consider this to be very high EV, although there is significant downside risk (potential for unintended AI capabilities enhancement). On balance, given already short timelines, I think we should at least empirically test the idea.
Aligning future superhuman AI systems is arguably the most difficult problem currently facing humanity, and the most important. In order to solve it, we need all the help we can get from the very best and brightest. To the extent that we can identify the absolute most intelligent, most capable, and most qualified people on the planet – think Fields Medalists, Nobel Prize winners, foremost champions of intellectual competition, the most sought-after engineers... or whomsoever a panel of the top AGI Alignment researchers would have on their dream team – we should aim to offer them salaries competitive with top sportspeople, actors and music artists to work on the problem, as part of an exclusive and highly prestigious fellowship. This would be complementary to prizes, in that getting paid is not dependent on results. The pay is for devoting a significant amount of full-time work (say a year), and maximum brainpower, to the problem; with the hope that highly promising directions in the pursuit of a full solution will be forthcoming. We should aim to provide access to top AI Alignment researchers for guidance, affiliation with top-tier universities, and an exclusive retreat house and office for fellows of this program to use, if so desired. An initial two-week retreat could be offered as an introduction, with further offers based on interest and engagement. At the very least, given shortening timelines, this idea should undergo extensive empirical testing.
Key points from FAQ:
- Time until AI poses a significant existential threat is likely short (but even if not, given ITN, this idea could be worth it).
- Prestige can attract talent.
- Money can attract talent; it can be spent on more than personal consumption, and can be a big motivator even for top mathematicians.
- Nerd-sniping may not be required (cf. Scott Aaronson).
- World’s best talent can be identified using a number of approaches.
- Goodharting on prior accomplishments (“tails come apart”, Matthew Effect) is an issue to bear in mind, but not a defeater.
- The field of AGI Alignment is pre-paradigmatic so not too time-consuming to break into.
- A short retreat, followed by a longer fellowship if fruitful, could be the ideal format.
- Fellowship should be remote for maximum flexibility, but with an open house for optional in-person discussion. Thought should be given to potential collaborators amongst fellows.
- Backfire risk of increasing AGI capabilities research is significant.
- Low signal:noise, and missing the main target (of x-safety) are things to be cognisant of.
- Needs someone high up to take it on, but a fallback option could be crowdsourcing funds for an initial offer to an individual.
- AGI x-safety is bigger than AGI Alignment; this could be extended to AGI Governance and strategy.
- A prestigious org and/or someone with a lot of clout (top academic, charismatic billionaire, high-ranking government official) to take it and run with it.
- Find out more about Demis Hassabis’ “Avengers assemble” plan / encourage him to get started asap if he hasn’t already.
- Get Vitalik Buterin’s input (it’s possible he’d want to back something like this).
- Survey top AGI Alignment researchers, asking who they would pick for their dream team.
What is the idea?
Offer the world's absolute most intelligent/capable/qualified people (such as topmost mathematicians like Terence Tao or topmost programmers like Gennady Korotkevich) a lot of money to work on AGI Safety. Say $5M to work on the problem for a year. Perhaps have it open to any Fields Medal recipient. This could be generalised to being offered to anyone a panel of top people in AGI Safety would have on their dream team (who otherwise would be unlikely to work on the problem), with contract lengths as short as 3 months, and money as much as $1B. Note that it would be invite only.
Where have I seen this before?
Previously posted: EA Forum comments and question, LessWrong comment (and this from 2014), Facebook comment on MIRI page; more recent discussion on LessWrong, ACX and the EA Forum. Also discussed on AGISF and AI Alignment Slacks, the EleutherAI Discord and multiple Google Doc drafts. And submitted to the FTX project ideas competition. Previously this was titled “Megastar Salaries for AI Alignment work” and “Mega-money for mega-smart people to solve AGI Alignment” [mega-money = $Ms; mega-smart = people at or above a 1 in a million level of intelligence].
Isn’t this crazy?
It may seem so, on the face of it. But, if you take the idea of AI timelines being short seriously (see next question), surely it’s worth a shot? This is especially so, given the ongoing massive imbalance between AI capabilities and AI alignment research. In some respects it’s surprising that it hasn’t been done already. Also, consider that Effective Altruism (EA) currently has a lot of money to deploy, and is in search of megaprojects. Perhaps this could be thought of as a (precursor to a) Manhattan Project for AGI Alignment.
There is already a global talent search underway for people to work on the AGI Alignment problem, with programs such as SERI-MATS, Conjecture's Refine, Redwood’s Remix and MLAB, ML Safety Scholars, CAIS Philosophy Fellowship, PIBBSS, AI Safety Camp, and the Atlas fellowships. The logical extreme of this is aiming for the absolute top: the very best talent in the world.
“To the extent we can identify the smartest people on the planet, we would be a really pathetic civilization were we not willing to offer them NBA-level salaries to work on alignment.” - Tomás B.
(Why are your AI timelines short?
Some recent developments that have made timelines – time until AI poses a significant existential threat – shorter by my estimation include DeepMind’s Algorithm Distillation, Gato, Chinchilla, Flamingo, AlphaCode and AlphaTensor; Google's Pathways, PaLM, SayCan, Socratic Models and TPUs; OpenAI’s DALL-E 2, followed just 6 months later by Meta’s Make-A-Video and Google’s Imagen Video and Phenaki; Salesforce’s CodeGen (note that several different companies are advancing the state of the art); EfficientZero; Cerebras; Ajeya Cotra’s shortening timelines and MIRI's increasing pessimism [maybe this project could be seen as an example of playing one’s outs]; on top of the already worrying AlphaGo, AlphaZero, AlphaFold and GPT-3. I also think that “crunch time” should be operationalised as >10% chance of AGI in <10 years, given the stakes.)
(What if my timelines aren’t short?
Even if your timelines don’t have significant probability mass (>10% chance of AGI) in the next couple of decades, you may agree that it is perhaps the most important problem faced by humanity this century. And that would justify having the best brains on the planet working on it.)
Ok, so maybe it’s not totally crazy. But how will we get the people that matter (i.e. the potential grantees) to take the idea seriously?
Prestige. A recent report into innovation prizes found “recognition prizes.. to be more promising than.. anticipated”. The offer could be formulated in terms of a highly prestigious academic fellowship. Perhaps by CHAI, FLI, FHI or CSER, or a combination of such institutions, and associated world-class universities (Berkeley, MIT, Oxford, Cambridge). Highly prestigious academics would also need to be involved. And perhaps cultural bridges to different disciplines will need to be built up over some months via connections to respected people sympathetic to the mission. To some extent, a certain level of prestige for the entire field of AGI Alignment is likely needed, making this somewhat of a chicken-and-egg problem. I don’t think it is a non-starter for this reason though.
Then again, perhaps the amounts of money just make it too outrageous to fly in academic circles? (This idea is, arguably, outside the Overton Window. Or is it? Now that the FTX Future Fund have announced their AI Worldview Prize with $1.5M top prizes!). Maybe we should be looking to things like sports or entertainment instead, where such amounts of money are normalised. Compare the salary to that of e.g. top footballers, musicians, or indeed, NBA players. Are there people high up in these fields who are concerned about AI x-risk?
(If it was from, say, a lone crypto millionaire, they might risk being dismissed as a crackpot, and by extension risk damaging the reputation of AGI Safety. Such “poisoning of the well” by way of unilateral action is why I’ve been cautious about sharing this too widely so far.)
But the people targeted by this grant mostly aren’t interested in money..
Perhaps so. Mostly they are motivated by curiosity, and have different interests, so why would they pursue AGI Alignment for mere money? Many people say they aren't motivated by money, but how many of them have seriously considered what they could do with it other than personal consumption? And how many have actually been offered a lot of money -- to do something different to what they would otherwise do, that isn't immoral or illegal -- and turned it down? What if it was a hundred million, or a billion dollars? Or, what if the time commitment was lower - say 6 months, or 3 months? Maybe a good way of framing it would be as a sabbatical (although of course offering longer tenure could also be an option if that is preferred).
I imagine that they might not be too motivated by personal consumption (given that most of them are likely to be financially comfortable already, given the value of their talents), but with enough cash they could forward goals of their own. If they'd like more good maths to be done, they could use the money to offer scholarships, grants, and prizes -- or found institutes -- of their own. (If $5M isn't enough -- I note Tao at least has already won $millions in prizes -- I imagine there is enough capital in the community to raise a lot more. Let them name their price.)
Maybe the limiting factor is just the consideration of such ideas as a possibility? When I was growing up, I wanted to be a scientist, liked space-themed Sci-Fi, and cared about many issues in the world (e.g. climate change, human rights); but I didn't care about having or wanting money (in fact I mostly thought it was crass), or really think much about it as a means to achieving ends relating to my interests. It wasn't until reading about (proto-)EA ideas that it clicked.
“I suspect that the offer would at least capture his [Tao’s] attention/curiosity. Even if he rejected the offer, he'd probably find himself curious enough to read some of the current research. And he'd probably be able to make some progress without really trying.” - casebash
What about their existing careers and the investments that went into those?
If you’ve invested your career in one field already, you might be unlikely to feel incentivised by a problem which people are saying is "more important" unless you have inroads to applying your existing expertise in that direction [H/T Morgan]. This may be true for many, but others may be more open to at least the idea that they should reprioritise (to the point of writing off sunk costs). Also, for the people we’d likely want to target, some of their existing expertise could be relevant.
What about their existing employers?
If needed, they could be kept happy by offering an endowment to them to go along with the grant to their employee. For example, UCLA could be given $5M to go along with $10M for Tao.
But surely hedge funds and top companies have offered these people tons of money to work for them already?
Maybe (and presumably they’ve turned them down). And there are some examples of uptake. Huawei has recruited 4 Fields medalists (in addition to a huge amount of world class talent in general); although note that they are all continuing to work on areas very closely related to their earlier research, rather than switching fields [H/T David]. High-ranking mathematician Abhinav Kumar left academia to work at Renaissance Technologies [H/T SP], and indeed Jim Simons effectively recruited a top level Mathematics department for Ren Tech. (It would be interesting to know more about these cases; perhaps they are motivated by personal consumption.) But have non-profits with a pro-social mission and large amounts of prestige done so?
How else can we turn (lots of) money into AGI Alignment work done by the very best and brightest?
We could hire top headhunting talent (this would be more efficient than the baseline proposal if headhunter salary + headhunted salary < baseline offered salary). But note that this is quite a different ask to what is usually common for a headhunter: we are asking them to find people to work on a problem with no known solution (perhaps there are equivalents in Maths, but I don’t think there has been any such headhunting done for these, just prizes). [Some recent discussion of hiring in the AGI Alignment space here.] We can offer large prizes (FTX Future Fund is already proposing this). Or we can pay people to work on formulating problems to nerd-snipe top people with (see below).
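The headhunting comparison above is just an inequality, and a minimal sketch makes it concrete. The function names and the dollar figures below are illustrative assumptions, not numbers from any real proposal:

```python
def headhunting_is_cheaper(headhunter_fee: float,
                           headhunted_salary: float,
                           baseline_salary: float) -> bool:
    """True if headhunter fee + headhunted salary < baseline offered salary."""
    return headhunter_fee + headhunted_salary < baseline_salary


def headhunting_saving(headhunter_fee: float,
                       headhunted_salary: float,
                       baseline_salary: float) -> float:
    """Money saved per hire by headhunting (negative if it costs more)."""
    return baseline_salary - (headhunter_fee + headhunted_salary)


# Hypothetical figures: a $5M baseline offer, versus a $0.5M headhunter fee
# plus a $3M salary negotiated for the same person.
print(headhunting_is_cheaper(500_000, 3_000_000, 5_000_000))
print(headhunting_saving(500_000, 3_000_000, 5_000_000))
```

This only compares cost per hire. It assumes the headhunted person is of the same calibre as the one who would have accepted the baseline offer, which is the hard part.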
Maybe we should just ask them what it would take to get them working on Alignment?
Yes! A survey on this could be a good first step (although I imagine reaching such top people with a survey isn’t easy, so this would have to rely on leveraging personal connections and networks). It may even be an easy way to get them to actually consider the problem seriously. Interestingly, Scott Aaronson wrote earlier this year that it would take “nerd-sniping” on a technical problem to get him to devote time to working on Alignment. This was somewhat surprising to me as I would’ve thought that he would be one to come up with specific technical problems himself (I note he is acknowledged on Roman Yampolskiy’s paper here so is not unfamiliar with the ideas)! Another mention of nerd-sniping here.
Update: Scott Aaronson is now working on Alignment! On the face of it, it looks like all it took was a serious offer (by Jan Leike at OpenAI), that was initially prompted by a couple of commenters on his blog asking him “point-blank what it would take to induce [him] to work on AI alignment”(!) (Although clearly Aaronson had already been aware of AI Alignment for a long time).
(If nerd-sniping is required, how can we do that?
I would hope that a sufficiently large and prestigious offer would be enough to provoke some serious engagement with the issue and a certain degree of auto-nerd-sniping would result. But, failing that, specific open problems along the lines of ARC’s ELK, or MIRI’s Visible Thoughts Project could help.
Part of the job of an affiliated prestigious academic institution associated with this project could be formulating such problems. More broadly, raising the profile and perceived importance of AGI x-risk in general could help to draw the desired talent in (note the Covid-related banner on Terence Tao’s blog). See also the answers to “What are the coolest topics in AI safety, to a hopelessly pure mathematician?”.)
In what ways is this different from other problems, like climate change?
There are many big problems in the world, some of which are widely recognised, like climate change. Have there been concerted efforts to recruit the world’s most intelligent people to tackle these? Maybe, but none have been high profile enough for me to have heard of them. Whilst both AI Alignment and climate change require high levels of global coordination, arguably AI Alignment is more amenable to being solved by a lone genius. Even if a breakthrough energy source is discovered (by a lone genius or otherwise), it will still need to be rolled out into widespread use, which will take a lot more effort by a lot more people. Whereas with AI Alignment, all that could be needed is the proof being accepted by a small group of core people, and its implementation in a single AGI system (i.e. the first).
Is Pure Maths ability what we want? How do we identify the World’s best talent for the job?
There has been some debate about the importance of Pure Maths ability, but this can be largely bypassed if we generalise to “anyone a panel of top people in AGI Safety would have on their dream team (who otherwise would be unlikely to work on the problem)”, or “Fields Medalists, Nobel Prize winners in Physics, other equivalent prize recipients in Computer Science, or Philosophy(?) or Economics(?)”, or those topping world ranking lists. Perhaps we could include other criteria, such as a “track record of multiple field-changingly-important research accomplishments” (H/T Sam), or being able to grok what is being alluded to here. Other ideas for identifying the world’s best talent: the most highly paid engineers or researchers; people scoring at the very top end in standardised tests; winners of top computer programming, maths or physics competitions/olympiads; the world's best (board and video) games players. Basically the idea is to have the offer open to the world’s most qualified/capable/intelligent people.
Aren’t most of the people who’ve won the biggest prizes (such as Nobels and Fields Medals) too old?
Crystallised vs fluid intelligence: people who have already won prestigious prizes tend to be older, so may have less fluid intelligence, making them less likely to be able to adjust to thinking in terms of new paradigms. Ideally we want the kinds of people who will go on to win such prizes, but whilst they are still young! But then again, a background of lots of technical knowledge is also useful, and could be more than enough to compensate.
What does the evidence look like for people who have already won big prizes going on to do further revolutionary work? This is something worth looking into, but we don’t have to be fixated on prizes; it can also be “panel of top AGI Safety researchers pick dream team”. Note that this also solves half of the problem outlined here (having existing experts pick the dream team gets around the issue of not being able to buy expertise unless you are an expert yourself. The other half is aligning incentives, which might best be done non-monetarily).
Another idea could be to approach the grad students of top prize winners, particularly those who take on very few grad students (more selective) [H/T L].
Aren’t we in danger of Goodharting on prior accomplishments here?
Yes, although again the generalisation of the idea is the “dream team” as decided by the current top AI Alignment experts. Stellar accomplishments can be attributed in part to luck (being in the right place at the right time), or the Matthew Effect, but that’s not to say that some amount of genius is not required. We are still likely to have a higher success rate if directing our hits-based giving at those who have already achieved greatness, vs those who have not. This is somewhat analogous to entrepreneurship, where prior success has some bearing on future success. Extreme accomplishment is fragile, and is described by the “tails come apart” phenomenon. Foremost basketball player Michael Jordan was unable to become a top-tier baseball player [H/T Dan]. But as the previously linked post says: “[t]his probably has limited practical relevance. Although you might expect that one of the 'not estimated as the very best' [potential hire] is in fact better than your estimated-to-be-best [potential hire], you don't know which one, and your best bet remains your estimate [based on your proxy for ability]”. There is also some evidence that the top-performing people are somewhat polymathic. This is encouraging for our purpose of getting top performers from other fields into AGI Alignment. But how often is the demonstrated polymathy top tier across more than one field?
Seriously, “you cannot just pay $5 million apiece to a bunch of legible geniuses from other fields and expect to get great alignment work out of them.”(!)
Yudkowsky: “"Geniuses" with nice legible accomplishments in fields with tight feedback loops where it's easy to determine which results are good or bad right away, and so validate that this person is a genius, are (a) people who might not be able to do equally great work away from tight feedback loops, (b) people who chose a field where their genius would be nicely legible even if that maybe wasn't the place where humanity most needed a genius, and (c) probably don't have the mysterious gears simply because they're rare. You cannot just pay $5 million apiece to a bunch of legible geniuses from other fields and expect to get great alignment work out of them. They probably do not know where the real difficulties are, they probably do not understand what needs to be done, they cannot tell the difference between good and bad work, and the funders also can't tell without me standing over their shoulders evaluating everything, which I do not have the physical stamina to do.”
Maybe this is true – the above is some evidence that it may be difficult to identify the absolute best talent in the world to work on AI Alignment – but I think we are well past the stage where this should have been put to a serious empirical test. We should at least try, whilst we have the opportunity. I would feel a lot better to be in a world where a serious attempt was made at this project (to the point where I’m willing to personally contribute a significant fraction of my own net worth toward it).
The way I see it is that we need to throw all we have at both AGI Alignment research and AGI governance. On the research front, getting the most capable people in the world working on the problem seems like our best bet. (On the governance front, it would be something along the lines of global regulation of, or a moratorium on, AGI capabilities research. That seems harder.)
Won’t they need a lot of time to get up to speed?
Arguably not, given that AI Alignment is still a young field. And we don’t necessarily need them to familiarise themselves with all the existing approaches. Arguably the main benefit will be them coming up with novel ideas and perspectives, which might best be done from looking at things with fresh eyes, having got a basic-moderate level understanding of the problem. Perhaps just spending a week going through the core readings of the AGISF curriculum would be sufficient. Given the pre-paradigmatic state of the field, one shouldn’t expect that typical graduate degree levels of investment are needed to get up to speed.
If not coming up with novel ideas, they may be able to find significant efficiency gains to existing research by performing a deep dive into one to-them-promising area. This should not require too great a time burden in terms of spinning-up.
Won’t they need supervising?
Maybe all we need to do is get them to grok the scale of the problem. The idea is that they are smarter than us, so it seems silly to try and supervise them as if they were junior researchers. One analogy is that of superintelligence (this is an exaggeration, but a useful intuition pump; the smartest people on the planet are the closest we have). A superintelligence by definition would be able to advance any research field it aimed its cognition at. Rather than supervision, it should be more like - "make a sincere effort and report your progress (on the AI Alignment Forum, or in private if need be); we can offer guidance / feedback / act as a sounding board for you to bounce ideas off".
Top AGI Alignment researchers can also offer to be a sounding board / clarify things to the extent they are clarifiable - i.e. get them on the same page (with a variety of perspectives if need be). Perhaps there could be conversations hosted similar to these.
What about publishing incentives?
There are two potential risks here:
- Incentive to publish for the sake of academic prestige: tendency for incremental results that are “low-quality” or “low impact” in the grand scheme of solving the Alignment Problem, even if they produce decent amounts of citations and a boost to one's h-index. (It’s worth noting that for fellows with established careers, the incentive to accumulate publications is much diminished as it’s no longer required for instrumental reasons related to career advancement.)
- Infohazards. Published results may contain information that, in addition to possibly furthering AGI safety, could help accelerate AGI in a default-dangerous way if used selectively/incorrectly/with sole regard to increasing capabilities.
Given this, perhaps it should be made clear that there are no expectations for public output (short of an eventual safe AGI that may be resultant somewhere -- possibly long -- down the line). To ensure that actual work is done, the recipient would meet with an individual/panel from the granting committee to discuss work / share notes/letters with them (in strictest non-disclosure-bound confidence). There could be a problem with this model’s compatibility with the grant being awarded by an academic institution (as suggested for prestige/credibility reasons above).
How should this be organised?
Given that fellows will already have responsibilities and projects that they are committed to (as well as job security, in the case of tenured faculty)[H/T Morgan], the fellowship could be organised as a sabbatical (H/T Misha), liaising with the fellows’ institution to ensure that their existing job remains available after.
Perhaps a short retreat would be best to start. Say, two weeks all expenses paid to a nice location, where fellows can bring their family along for a holiday. The first week could be spent studying assigned readings, and the second week interacting with top people from the AGI Alignment community, both formally in seminars or brainstorming sessions, and informally at dinners [H/T Nick]. People showing a genuine interest and strong engagement at the end of the two weeks could then be offered a longer sabbatical.
Note that organising this as a standalone fellowship would largely mitigate the distortionary effects of a large differential in salaries within an organisation were this to be done via conventional hiring (as discussed here).
Regarding nerd-sniping and performance-related rewards, maybe the fellowship could be combined with a series of prizes. E.g. $5M to work on a well-specified sub-problem for a year, and $5M for a solution [H/T Trevor1].
Top AGI Safety researchers’ time is highly valuable, is this a good use of it?
Yes, their time is highly valuable, but unless they think they can actually see their way through to a full solution (and by the sounds of it, no one yet can), then it's not more valuable than helping someone who might have a better shot at seeing their way through to a full solution. (The whole reason for hiring them being that they hopefully will have a better shot at it).
Is hiring lone geniuses the way to go, or should we be thinking about finding people who work well together?
Some consideration should be given to how Fellows might work together in both synergistic and complementary ways. Perhaps this can be somewhat self-organised by the fellows themselves (i.e. through recommendations / hiring of friends and colleagues); or best practices from the likes of Bell Labs or the Manhattan Project could be followed.
Where would this project be located?
I’m thinking largely remotely, for flexibility around family, existing responsibilities etc [note that this is also a good idea generally, to expand the talent pool - there is currently no remote-only AI Alignment org]. Give fellows passes to all the major orgs - FHI, GPI, CHAI, CSER, MIRI, OpenPhil, FTX Foundation - and all the conferences/workshops, EAGs etc, but don't make attendance anywhere mandatory. Give them "the key to the city" as it were, and allow them to travel wherever they want (including staying at home working remotely). Note that this would require significant vetting to allay security concerns (perhaps access to all relevant orgs will be impractical, but significant freedom of movement between orgs could be achievable). Perhaps also have a house dedicated to the fellowship, where fellows can drop in / stay on an ad hoc basis (H/T Misha), as per the Einstein Fellowship (but without residency being expected).
Is this cost-effective?
It’s a lot of money per hour of thinking time being bought. But EA now has a lot of money. And the stakes are as high as they get. And we are talking about the highest quality thinking available in the known universe. Still, maybe you’d get more bang for your buck selecting the most promising students (who’ve yet to prove themselves), but many organisations are already doing this. It seems that the biggest gap in the market is at the highest possible end.
Could this backfire?
Yes, if it attracts super-smart people who aren’t particularly altruistic, who see it as an easy way to get lots of money with no expectation of results [H/T Rosie]. Some amount of vetting will be required in this regard. However, despite this, the seemingly-altruistic super-smart people that do get hired could get a taste for riches, and then maybe be corrupted into thinking that AGI capability research is a good way of getting yet more riches (/prestige). Or such thoughts could bias them in the direction of downplaying AGI x-risk. Or, of course, they could come to the conclusion that AGI x-risk isn’t that bad independently, and from there go on to thinking AGI development is a good idea. This could be a defeater. And there are already precedents with the likes of OpenAI. One way to mitigate this could be to have an extended initial selection process where people are asked to develop a rough research plan before starting the fellowship proper. We'd then only continue if it's clear that they're actually taking aim at a core X-risk-related problem and seem to get the stakes (H/T Sam).
Another risk in this vein is that the amount of money and prestige involved attracts (even) more investment into AGI (capabilities) in general. This may not be significant, given the already large amounts of money going into AGI capabilities research, especially given that the ratio of safety:capabilities funding would be moved more in favour of safety by this program. (Note that to some extent, there is not a clear distinction between capabilities and alignment research, i.e. systems have to have some amount of alignment to work at all. What I mean here by “capabilities research” is research that doesn’t take heed of x-risk from misalignment, including the possibility of inner misalignment. This includes most of the AGI research being done at the world’s biggest tech companies.)
How else could it go wrong?
Another failure mode is the production of irrelevant research: research that appears clever, and which everyone feels the need to take seriously, but which misses the main target of reducing existential risk from AGI (as per the publishing incentive risk identified above), and therefore reduces the signal-to-noise ratio of the field (and incentivises further reductions). This is more likely to happen when selecting people not on the basis of knowledge of and interest in the alignment problem (i.e. this plan) [H/T Daniel]. I think this is a risk, but given urgency we should be willing to tolerate significant levels of “fake” research in order to get more of the highest possible level of real research. It could be somewhat ameliorated by requesting that fellows take heed of (something like) Yudkowsky's “list of lethalities”.
Finally, if not done in a socially sensitive way, it could induce panic or otherwise poison the well for the AGI Safety field.
But there could be good second-order effects too, right?
Yes, such large grants should draw a lot of attention, and status, to the problem of AGI Alignment, helping to raise its profile to the point where many more smart people are helping to solve it.
What should this be called?
I think something more prestigious-sounding than “Mega-money for mega-smart people to solve AGI Safety”, “Megastar salaries for AI alignment work” or “World’s best for AGI Alignment”; perhaps something like the AGI Good Fellowship? After I. J. Good, one of the first people to draw attention to the issue of superintelligent AI; but also because we want the AGI to be good!
How do we get it off the ground?
Ideally I want someone/an org with a lot of clout to take it and run with it. If this is you, let me know! Perhaps a good first step would be to survey people in Alignment and ask them who they’d have on their “dream team” if they could hire anyone in the world (and then, ideally, survey those people to ask them what it would take to get them to work on AGI Alignment).
So far, from private messaging, I’ve had a few prominent people express some interest in this idea, but no one yet willing to take it on. An alternative route to getting traction could be to fundraise first, and then approach Terence Tao (or another) with an offer. If the reply is at least a “maybe”, then that could be enough to get high profile institutions on board to bolster the appeal. One avenue for fundraising could be via the crypto community with a DAO (with a catchy name such as the TerryTaoDAO [H/T Elliot]). This is not without precedent, but should be regarded as very much a Plan B, given that the reputation of crypto is not great amongst scientists, and the history of DAOs to date has also been far from plain sailing. An easier route could just be finding a few individuals willing to commit the required funding (e.g. $10M). But, as mentioned earlier regarding Unilateralist's Curse considerations, it would be better to get prestigious institutional buy-in to start.
Demis Hassabis recently mentioned such a plan, for when the time comes, in the DeepMind Podcast (although they don’t directly talk about x-risk). I worry that there won’t be a good enough “fire alarm” for this “grey zone” though, and that really, the “Avengers” should be assembled a.s.a.p. (at least for a short retreat to start).
Hold on, the goal we are aiming for is existential safety from AGI, and this is bigger than just Alignment. You mention governance above, what about that?
Yes. First, a note on terminology. This field has, over the years, been referred to as Friendly AI (FAI), AI Safety, and AI Alignment. I’ve used AGI Alignment throughout above, as Artificial General Intelligence is more specifically what we are talking about. But perhaps ASI (Artificial Superintelligence) is even more appropriate, given that the risk is mainly from systems that can outsmart all of humanity, and the reason AGI poses an x-risk is that it can (easily) become ASI. More recently, the distinct field of AI Governance has emerged, which can be thought of as “the other half of the problem”, i.e. how to ensure there is global cooperation on AI Alignment. Both of these together could be grouped under the term existential safety, or x-safety (i.e. safeguarding humanity, and the entire future lightcone, from existential risk). What we are really concerned with here is ASI x-safety. So to this end, we should be open to fellows doing any kind of work – alignment, governance, community building, strategy, or other – that moves the needle on ASI x-safety. Regarding “other”, one concerning possibility is that ASI Alignment may be impossible. If this were the case, then governance, with the goal of preventing ASI, is all we have. Having a fellow or two from this program working on ASI alignment impossibility proofs and/or their refutations would have very high value of information. And perhaps the nature of formulating theorems and proofs around this would appeal to the top mathematicians. But obviously solving ASI Alignment would be best! I hope that we can move the needle on that with a program like this.