Studying behaviour and interactions of boundedly rational agents, AI alignment and complex systems.

Research fellow at Future of Humanity Institute, Oxford. Other projects: European Summer Program on Rationality. Human-aligned AI Summer School. Epistea Lab.


Learning from crisis



Impact markets may incentivize predictably net-negative projects

If the main problem you want to solve is "scaling up grantmaking", there are probably many ways to do it other than "impact markets".

(Roughly, you can amplify any "expert panel of judges" evaluation with judgemental forecasting.)

On Deference and Yudkowsky's AI Risk Estimates

(I.e., most people who are likely to update downwards on Yudkowsky on the basis of this post seem to me to be generically too trusting, and I am confident I could write a more compelling post about any other central figure in Effective Altruism that would likely cause you to update downwards even more.)

My impression is the post is a somewhat unfortunate attempt to "patch" the situation in which many generically too trusting people updated a lot on AGI Ruin: A List of Lethalities and Death with Dignity, and on the subsequent deference/update cascades.

In my view, the deeper problem here is that instead of resolving disagreements about model internals, many of these people do some sort of "averaging conclusions" move, based on signals like seniority, karma, vibes, etc.

Many of these signals are currently wildly off from truth-tracking, so you get attempts to push the conclusion-updates directly.


What’s the theory of change of “Come to the bay over the summer!”?

I. It might be worth reflecting on how large a part of this seems tied to something like "climbing the EA social ladder".

E.g., just from the first part, emphasis mine:

Coming to Berkeley and, e.g., running into someone impressive at an office space already establishes a certain level of trust since they know you aren't some random person (you've come through all the filters from being a random EA to being at the office space).
If you're in Berkeley for a while you can also build up more signals that you are worth people's time. E.g., be involved in EA projects, hang around cool EAs.

Replace "EA" with some other environment with prestige gradients, and you have something like a highly generic social-climbing guide: seek cool kids, hang around them, go to exclusive parties, get good at signalling.

II. This isn't to say this is bad. Climbing the ladder to some extent could be instrumentally useful, or even necessary, for the ability to do some interesting things, sometimes.

III. But note the hidden costs. Climbing the social ladder can trade off against building things. Learning all the Berkeley vibes can trade off against, e.g., learning the math actually useful for understanding agency.

I don't think this has any clear bottom line - I do agree that for many people caring about EA topics it's useful to come to the Bay from time to time. Compared to the original post, I would mainly suggest also consulting virtue ethics and thinking about what sort of person you are changing yourself into - whether you, for example, most want to become "a highly cool and well-networked EA" or, e.g., to "do things which need to be done", which are different goals.

Getting GPT-3 to predict Metaculus questions

Suggested variation, which I'd expect to lead to better results: use raw "completion probabilities" for different answers.

E.g., with the prompt "Will Russia invade Ukrainian territory in 2022?", extract the completion likelihoods of the next tokens "Yes" and "No", and normalize them to get a probability.
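A minimal sketch of the normalization step. It assumes you can obtain log-probabilities for the candidate next tokens (e.g., via an API's logprobs field); the numeric values below are made up for illustration:

```python
import math

def yes_no_probability(logp_yes: float, logp_no: float) -> float:
    """Normalize log-probabilities of the "Yes" and "No" completions
    into a probability of "Yes" (a softmax over the two tokens)."""
    p_yes = math.exp(logp_yes)
    p_no = math.exp(logp_no)
    return p_yes / (p_yes + p_no)

# Hypothetical log-probs for the tokens "Yes" and "No" after the prompt
# "Will Russia invade Ukrainian territory in 2022?"
forecast = yes_no_probability(logp_yes=-0.5, logp_no=-1.5)
print(round(forecast, 4))  # → 0.7311
```

One advantage of this over sampling completions is that a single forward pass gives you a calibrated-looking continuous forecast, rather than having to average over many sampled "Yes"/"No" strings.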

Case for emergency response teams

Also, the direction of ALERT is generally more on "doing". Doing often seems very different from forecasting and often needs different people - some of the relevant skills are plausibly even anticorrelated.

Emergency response

Crisis response is a broader topic. I would probably suggest creating an additional tag for Crisis response (most of our recent sequence would fit there).

"Long-Termism" vs. "Existential Risk"

I don't have a strong preference. There are some aspects in which longtermism can be the better framing, at least sometimes.

I. In a "longtermist" framework, x-risk reduction is the most important thing to work on across many orders of magnitude of uncertainty about the probability of x-risk in, e.g., the next 30 years (due to the weight of the long-term future). Even if AI-related x-risk is only 10^-3 in the next 30 years, it is still an extremely important problem, or the most important one. In a "short-termist" view with, say, a discount rate of 5%, it is not nearly so clear.
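A back-of-the-envelope version of this asymmetry (the numbers are placeholders matching the text, not estimates):

```latex
% Longtermist view: expected loss from x-risk over the period is
%   E[\text{loss}] = p \cdot V,
% where V, the value of the long-term future, is astronomically large,
% so even p = 10^{-3} over 30 years makes the expected loss dominate.
%
% Short-termist view with annual discount rate r = 0.05: the discounted
% value of the entire future is bounded,
%   V_{\mathrm{disc}} = \sum_{t=0}^{\infty} v\,(1-r)^{t} = \frac{v}{r} \approx 20\,v,
% (v = annual value), so
%   E[\text{loss}] \approx 10^{-3} \cdot 20\,v = 0.02\,v,
% and x-risk only clearly dominates if p is of order 1\% or more.
```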

The short-termist urgency of x-risk ("you and everyone you know will die") depends on the x-risk probability being actually high - on the order of one percent, or tens of percent. Arguments for why this probability is actually so high are usually brittle pieces of mathematical philosophy (e.g., many specific individual claims by Eliezer Yudkowsky) or brittle use of proxies with a lot of variables obviously missing from the reasoning (e.g., the report by Ajeya Cotra). Actual disagreements about probabilities are often in fact grounded in black-box intuitions about esoteric mathematical concepts. It is relatively easy to come up with brittle pieces of philosophy arguing in the opposite direction: why this number is low. In fact, my actual, action-guiding estimate is not based on an argument conveyable in a few paragraphs, but more on something like "the feeling you get after working on this over years". What I can offer others is something like "an argument from testimony", and I don't think it's that great.

II. Longtermism is a positive word, pointing toward the fact that the future could be large and nice. X-risk is the opposite.

Similar: AI safety vs. AI alignment. My guess is the "AI safety" framing is by default more controversial and gets more pushback (e.g., a "safety department" is usually not the most loved part of an organisation, with connotations like "safety people want to prevent us from doing what we want").


Off Road: Interviews with EA College Dropouts

The title EA Dropouts seems a bit confusing, because it can naturally be interpreted as referring to people who dropped out of EA.

What we tried

I had little influence over the 1st wave; credit goes elsewhere.

What happened in subsequent waves is complicated. The one-sentence version is that Czechia changed its minister of health four times, only some of them were reasonably oriented, and how interested they were in external advice varied a lot over time.

Note that the "death tolls per capita in the world" stats are misleading, due to differences in reporting. Czechia had average or even slightly lower-than-average mortality compared to the "Eastern Europe" reference class, but much better reporting. For more reliable data, see https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)02796-3/fulltext
