Former AI safety research engineer, now PhD student in philosophy of ML at Cambridge. I'm originally from New Zealand but have lived in the UK for 6 years, where I did my undergrad and master's degrees (in Computer Science, Philosophy, and Machine Learning). Blog:


EA Archives Reading List

Suggested norms about financial aid for EAG(x)

In hindsight I should have elaborated on the "cooperativeness" part more; I've edited the post to do so. The key point is made in this post about how donating only to whatever seems to you like the most neglected priority is partially a form of free-riding, because it means that others who have different values need to spend their resources on things that you both care about. So in order to have healthier relationships with other altruists, you should both agree to partially cover shared priorities, even when that is a less effective use of money in the short term.

Now, you might have stronger or weaker intuitions about how important this type of cooperation is. I think my intuition is that we should aim for cooperative norms that are strong enough that we can cooperate even across large value differences. But cooperative norms which are this strong will then weigh heavily in favour of cooperation between altruists with much smaller value differences, like CEA and EAG attendees (especially because CEA and/or big EA funders have thought about this and decided that the benefits of having people pay for their own tickets by default are more important, from their perspective, than downsides like tax inefficiency).

It also seems reasonable to disagree with this; it's something of a judgement call. But I claim that this is the right judgement call to be making.

Forecasting transformative AI: what's the burden of proof?

Thanks for the response, that all makes sense. I missed some of the parts where you disambiguated those two concepts; apologies for that. I suspect I still see the disparity between "extraordinarily important century" and "most important century" as greater than you do, though, perhaps because I consider value lock-in this century less likely than you do - I haven't seen particularly persuasive arguments for it in general (as opposed to in specific scenarios, like AGIs with explicit utility functions or the scenario in your digital people post). And relatedly, I'm pretty uncertain about how far away technological completion is - I can imagine transitions to post-human futures in this century which still leave a huge amount of room for progress in subsequent centuries.

I agree that "extraordinarily important century" and "transformative century" don't have the same emotional impact as "most important century". I wonder if you could help address this by clarifying that you're talking about "more change this century than since X" (for X = a millennium ago, or since agriculture, or since cavemen, or since we diverged from chimpanzees). "Change" also seems like a slightly more intuitive unit than "importance", especially for non-EAs for whom "importance" is less strongly associated with "our ability to exert influence".

Forecasting transformative AI: what's the burden of proof?

I very much like how careful you are in looking at this question of the burden of proof when discussing transformative AI. One thing I'm uncertain about, though: is the "most important century" framing the best one to use when discussing this? It seems to me like "transformative AI is coming this century" and "this century is the most important century" are very different claims which you tend to conflate in this sequence.

One way of thinking about this: suppose that, this century, there's an AI revolution at least as big as the industrial revolution. How many more similarly-sized revolutions are plausible before reaching a stable galactic civilisation? The answer to this question could change our estimate of P(this is the most important century) by an order of magnitude (or perhaps two, if we have good reasons to think that future revolutions will be more important than this century's TAI), but has a relatively small effect on what actions we should take now.

More generally, I think that claims which depend on the specifics of our long-term trajectory after transformative AI are much easier to dismiss as being speculative (especially given how much pushback claims about reaching TAI already receive for being speculative). So I'd much rather people focus on the claim that "AI will be really, really big" than "AI will be bigger than anything else which comes afterwards". But it seems like framing this sequence of posts as the "most important century" sequence pushes towards the latter.

Oh, also, depending on how you define "important", it may be the case that past centuries were more important because they contained the best opportunities to influence TAI - e.g. when the west became dominant, or during WW1 and WW2, or the cold war. Again, that's not very action-guiding, but it does make the "most important century" claim even more speculative.


AGI safety from first principles

Ah, I like the multiagent example. So to summarise: I agree that we have some intuitive notion of what cognitive processes we think of as intelligent, and it would be useful to have a definition of intelligence phrased in terms of those. I also agree that Legg's behavioural definition might diverge from our implicit cognitive definition in non-trivial ways.

I guess the reason why I've been pushing back on your point is that I think that possible divergences between the two aren't the main thing going on here. Even if it turned out that the behavioural definition and the cognitive definition ranked all possible agents the same, I think the latter would be much more insightful and much more valuable for helping us think about AGI.

But this is probably not an important disagreement.

AGI safety from first principles

Ah, I see. I thought you meant "situations" as in "individual environments", but it seems like you meant "situations" as in "possible ways that all environments could be".

In that case, I think you're right, but I don't consider it a problem. Why might it be the case that adding more compute, or more memory, or something like that, would be net negative across all environments? It seems like either we'd have to define the set of environments in a very gerrymandered way, or else there's something about the change we made that lands us in a valley of bad thinking. In the former case, we should use a wider set of environments; in the latter case, it seems easier to bite the bullet and say "Yeah, turns out that adding more of this usually-valuable trait makes agents less intelligent."

AGI safety from first principles

One thing I'm confused about is whether Legg's definition (or your rephrasing) allows for situations where it's in principle possible that being smarter is ex ante worse for an agent (obviously ex post it's possible to follow the correct decision procedure and be unlucky).

There definitely are such cases - e.g. Omega penalises all smart agents. Or environments where there are several crucial considerations which you're able to identify at different levels of intelligence, so that your success rises and falls as your intelligence increases.

But in general I agree with your complaint about Legg's definition being defined in behavioural terms, and how it'd be better to have a good definition of intelligence in terms of the cognitive processes involved (e.g. planning, abstraction, etc). I do think that starting off in behaviourist terms was a good move, back when people were much more allergic to talking about AGI/superintelligence. But now that we're past that point, I think we can do better. (I don't think I've written about this yet in much detail, but it's quite high on my list of priorities.)

AGI safety from first principles

I intended mine to be a slight rephrasing of Legg and Hutter's definition to make it more accessible to people without RL backgrounds. One thing that's not obvious from the way they use "environments" is that the goal is actually built into the environment via a reward function, so describing each environment as a "task" seems accurate.

A second non-obvious thing is that the body the agent uses is also defined as part of the environment, so that the agent only performs the abstract task of sending instructions to that body. A naive reading of Legg and Hutter's definition would interpret a stronger agent as being more intelligent. Adding "cognitive" I think rules this out, while also remaining true to the spirit of the original definition.
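(For anyone reading along without the RL background: the definition I'm paraphrasing is, writing it from memory so the notation may differ slightly from their paper,

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

where \(\pi\) is the agent's policy, \(E\) is the set of computable environments, \(K(\mu)\) is the Kolmogorov complexity of environment \(\mu\) (so simpler environments get exponentially more weight), and \(V^{\pi}_{\mu}\) is the expected cumulative reward the policy earns in \(\mu\). Both the reward function and the agent's "body" are folded into \(\mu\), which is the point I'm making above: the policy only maps observations to instructions.)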

Curious if you still disagree, and if so why - I don't really see what you're pointing at with the Raven's Matrices example.

On the limits of idealized values

Fantastic post. A few scattered thoughts inspired by it:

If you aren’t trying to conform to some standard, then how can you truly, and non-arbitrarily, choose?

Why does our choice need to be non-arbitrary? If we take certain intuitions/desires/instincts as primitives, they may be fundamentally arbitrary, but that's because we are unavoidably arbitrary. Yet this arbitrary initial state is all we have to work from.

What’s needed, here, is a type of choice that is creating, rather than trying to conform — and which hence, in a sense, is “infallible.”

It feels like infallible is the wrong type of description here, for the same reason that it would be odd to say that my taste in food is infallible. At a certain level the predicate "correct" will stop making sense. (Maybe that level isn't the level of choices, though; maybe it's instincts, or desires, or intuitions, or tastes - things that we don't see ourselves as having control over.)

richard_ngo's Shortform

There's an old EA forum post called Effective Altruism is a question (not an ideology) by Helen Toner, which I think has been pretty influential.*

But I was recently thinking about how the post rings false for me personally. I know that many people in EA are strongly motivated by the idea of doing the most good. But I was personally first attracted to an underlying worldview composed of stories about humanity's origins, the rapid progress we've made, the potential for the world to be much better, and the power of individuals to contribute to that; from there, given potentially astronomical stakes, altruism is a natural corollary.

I think that leaders in EA organisations are more likely to belong to the former category: people inspired by EA as a question. But as I discussed in this post, there can be a tradeoff between interest in EA itself versus interest in the things EA deems important. Personally I prioritise making others care about the worldview more than making them care about the question: caring about the question pushes you to do the right thing in the abstract, but caring about the worldview seems better at pushing you towards its most productive frontiers. This seems analogous to how the best scientists are more obsessed with the thing they're studying than the downstream effects of their research.

Anyway, take all this with a grain of salt; it's not a particularly firm opinion, just one personal perspective. But one longstanding EA I was talking to recently found it surprising, so I thought it'd be worth sharing in case others do too. 

* As one datapoint: since the EA forum has been getting more users over time, a given karma score is more impressive the older a post is. Helen's post is twice as old as any other post with comparable or higher karma, making it a strong outlier.

Why should we *not* put effort into AI safety research?

Drexler's CAIS framework attacks several of the premises underlying standard AI risk arguments (although iirc he also argues that CAIS-specific safety work would be valuable). Since his original report is rather long, here are two summaries.
