Helping lead EA stuff at Harvard and doing AI governance research.


(Even) More Early-Career EAs Should Try AI Safety Technical Research

Hmm. I don't have strong views on unipolar vs multipolar outcomes, but I think MIRI-type thinks Problem 2 is also easy to solve, due to the last couple clauses of your comment.

(Even) More Early-Career EAs Should Try AI Safety Technical Research

Edited the post substantially (and, hopefully, transparently, via strikethrough and "edit:" and such) to reflect the parts of this and the previous comment that I agree with.

Regarding this:

I don't see many risk scenarios where a technical solution to the AI alignment problem is sufficient to solve AGI-related risk. For accident-related risk models (in the sense of this framework) solving safety problems is necessary. But even when technical solutions are available, you still need all relevant actors to adopt those solutions, and we know from the history of nuclear safety that the gap between availability and adoption can be big — in that case decades. In other words, even if technical AI alignment researchers somehow "solve" the alignment problem, government action may still be necessary to ensure adoption (whether government-affiliated labs or private sector actors are the developers).

I’ve heard this elsewhere, including at an event for early-career longtermists interested in policy where a very policy-skeptical, MIRI-type figure was giving a Q&A. A student asked: if we solved the alignment problem, wouldn’t we need to enforce its adoption? The MIRI-type figure said something along the lines of:

“Solving the alignment problem” probably means figuring out how to build an aligned AGI. The top labs all want to build an aligned AGI; they just think the odds of the AGIs they’re working on being aligned are much higher than I think they are. But if we have a solution, we can just go to the labs and say, here, this is how you build it in a way that we don’t all die, and I can prove that this makes us not all die. And if you can’t say that, you don’t actually have a solution. And they’re mostly reasonable people who want to build AGI and make a ton of money and not die, so they will take the solution and say, thanks, we’ll do it this way now.

So, was MIRI-type right? Or would we need policy levers to enforce adoption of the solution, even in this model? The post you cite chronicles how long it took for safety advocates to address glaring risks in the nuclear missile system. My initial model says that if the top labs resemble today’s OpenAI and DeepMind, it would be much easier to convince them than the entrenched, securitized bureaucracy described in the post: the incentives are much better aligned, and the cultures are more receptive to suggestions of change. But this does seem like a cruxy question. If MIRI-type is wrong, this would justify a lot of investigation into what those levers would be, and how to prepare governments to develop and pull these levers. If not, this would support more focus on buying time in the first place, as well as on trying to make sure the top firms at the time the alignment problem is solved are receptive.

(E.g., if the MIRI-type model is right, a US lead over China seems really important: if we expect that the solution will come from alignment researchers in Berkeley, maybe it’s more likely to be implemented if the leading developers are private, "open"-tech-culture companies, who speak the same language and live in the same milieu and broadly have trusting relationships with the proponent, etc. Or maybe not!)

(Even) More Early-Career EAs Should Try AI Safety Technical Research

Agreed, these seem like fascinating and useful research directions.

(Even) More Early-Career EAs Should Try AI Safety Technical Research

Thanks, Locke, this is a series of great points. In particular, the point about even fewer people (~25) doing applied policy work is super important, to the extent that I think I should edit the post to significantly weaken certain claims. Likewise, the points about the relative usefulness of spending time learning technical stuff are well taken, though I think I put more value on technical understanding than you do; for example, while of course policy professionals can ask people they trust, they have to somehow be able to assess the judgment of these people on the object-level thing. Also, while I think the idea of technical people being in short supply and high demand in policy is generally overrated, that seems like it could be an important consideration. Relatedly, it seems maybe easier to do costly fit-tests (like taking a first full-time job) in technical research and switch to policy than vice versa.

Edit: for the final point about risk models, I definitely don't have state funding for safety research in mind; what I mean is that since I think it's very unlikely that policy permanently stops AGI from being developed, success ultimately depends on the alignment problem being solved. I think there are many things governments and private decision-makers can do to improve the chances this happens before AGI, which is why I'm still planning on pursuing a governance career!

Why AGI Timeline Research/Discourse Might Be Overrated

It's hard for me to agree or disagree with timeline research being overrated, since I don't have a great sense of how many total research hours are going into it, but I think Reason #4 is pretty important to this argument and seems wrong. The goodness of these broad strategic goals is pretty insensitive to timelines, but lots of specific actions wind up seeming worth doing or not worth doing based on timelines. I find myself seriously saying something like "Ugh, as usual, it all depends on AI timelines" in conversations about community-building strategy or career decisions like once a week.

For example, in this comment thread about whether and when to do immediately impactful work versus career-capital building, both the shape and the median of the AI x-risk distribution wind up mattering. A more object-level consideration is that "back-loaded" careers like policy look worse relative to "front-loaded" careers like technical research insofar as timelines are earlier.

In community-building, earlier timelines generally support outreach strategies more focused on finding very promising technical safety researchers; moderate timelines support relatively more focus on policy field-building; and long timelines support more MacAskill-style broad longtermism, moral circle expansion, etc.

Of course, all of this is moot if the questions are super intractable, but I do think additional clarity would turn out to be useful for a pretty broad set of decision-makers -- not just top funders or strategy-setters but implementers at the "foot soldier" level of community-building, all the way down to personal career choice.

(Even) More Early-Career EAs Should Try AI Safety Technical Research

Yes! This is basically the whole post condensed into one sentence.

(Even) More Early-Career EAs Should Try AI Safety Technical Research

Thanks for this -- the flaw in using the point estimate of 20-year timelines (and on the frequency and value of promotions) in this way occurred to me, and I tried to model it with Guesstimate, but I got values that made no sense and gave up. Awesome to see this detailed model and to get your numbers!

That said, I think the 5% annual chance is oversimple in a way that could lead to wrong decisions at the margin for the trade-off I have in mind, which is "do AI-related community-building for a year vs. start policy career now." If you think the risk is lower for the next decade or so before rising in the 2030s, which I think is the conventional wisdom, then the 5% uniform distribution incorrectly discounts work done between now and the 2030s. This makes AI community-building now, which basically produces AI technical research starting in a few years, look like a worse deal than it is and biases towards starting the policy career. 
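To make the discounting point concrete, here is a rough sketch comparing a flat 5% annual chance with a back-loaded profile. The back-loaded numbers (1% per year for ten years, then 10% per year) are made up purely for illustration, and the function and variable names are my own; none of this is taken from the model in the parent comment.

```python
def survival_curve(annual_hazards):
    """Probability that the event has NOT yet happened at the start of each year."""
    curve = []
    still_waiting = 1.0
    for hazard in annual_hazards:
        curve.append(still_waiting)
        still_waiting *= 1.0 - hazard
    return curve


YEARS = 20

# Flat profile: the same 5% chance every year.
uniform_hazards = [0.05] * YEARS

# Hypothetical back-loaded profile (illustrative assumption only):
# low risk for the first decade, higher risk afterwards.
back_loaded_hazards = [0.01] * 10 + [0.10] * 10

uniform_curve = survival_curve(uniform_hazards)
back_loaded_curve = survival_curve(back_loaded_hazards)

# The survival probability acts as the discount applied to work
# that only pays off in a given future year.
for year in (3, 5, 10):
    print(
        f"year {year}: flat discount = {uniform_curve[year]:.2f}, "
        f"back-loaded discount = {back_loaded_curve[year]:.2f}"
    )
```

Under these assumed numbers, work that pays off in the first decade is discounted much less in the back-loaded case than in the flat case, which is the direction of the bias described above. The size of the gap depends entirely on the profile you plug in.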

Run For President

I support some people in the EA community taking big bets on electoral politics, but just to articulate some of the objections:

solving the "how to convince enough people to elect you president" problem is probably easier than a lot of other problems

Even compared to very difficult other problems, I'm not sure this is true; exactly one person is allowed to solve this problem every four years, and it's an extremely crowded competition. (Both parties had to have two debate stages for their most recent competitive cycles, and in both cases someone who had been a famous public figure for decades won.)

And even if you fail to win, even moderately succeeding provides (via predictable media tendencies) a far larger platform to influence others to do Effective things.

It provides a larger platform, but politics is also an extremely epistemically adversarial arena: it is way more likely someone decides they hate EA ideas if an EA is running against a candidate they like. In some cases this trade-off is probably worth it; you might think that convincing a million people is worth tens of millions thinking you're crazy. But sometimes the people who decide you're crazy (and a threat to their preferred candidates) are going to be (e.g.) influential AI ethicists, which could make it much harder to influence certain decisions later.

So, just saying - it is very difficult and risky, so anyone considering working on this needs to plan carefully!

(Even) More Early-Career EAs Should Try AI Safety Technical Research

Thanks for these points, especially the last one, which I've now added to the intro section.

(Even) More Early-Career EAs Should Try AI Safety Technical Research

With "100-200" I really had FTEs in mind rather than the >1 serious alignment threshold (and maybe I should edit the post to reflect this). What do you think the FTE number is?
