Toby Tremlett🔹

Content Strategist @ CEA
8870 karma · Working (0-5 years) · Oxford, UK

Bio

Participation
2

Hello! I'm Toby. I'm a Content Strategist at CEA. I work with the Online Team to make sure the Forum is a great place to discuss doing the most good we can. You'll see me posting a lot, authoring the EA Newsletter and curating Forum Digests, making moderator comments and decisions, and more. 

Before working at CEA, I studied Philosophy at the University of Warwick and worked for a couple of years on a range of writing and editing projects within the EA space. Recently, I helped run the Amplify Creative Grants program, which encouraged more impactful podcasting and YouTube projects. You can find a bit of my own creative output on my blog and my podcast feed.

How I can help others

Reach out to me if you're worried about your first post, want to double-check Forum norms, or are confused or curious about anything relating to the EA Forum.

Sequences
4

Your most valuable posts of 2025
Best of: Career Conversations Week 2025
Best of: Existential Choices Week
Existential Choices: Reading List

Comments
708

Topic contributions
124

Thanks for your contributions to the discussion, @Hannah McKay🔸, @Jo_🔸, @Lee Wall, and @Alistair Stewart!

I have to head off at 7, but you are welcome to keep commenting, as is anyone else who sees this comment.

If you had to allocate a marginal $500,000, would you put it towards animal-specific alignment work (like the ideas in this list) or general alignment work?

- animal-specific alignment
- general alignment

Yep, fair; that's what I mean by "power concentration and corrigibility". AGI being constrained by some values makes it at least minimally democratic (values are shaped by everyone who contributes to a language, especially in the case of LLMs).

Thanks! That's clarifying. 

I wonder, though: would that kind of world, where humans are empowered but don't experience intense (and perhaps moderate) suffering, be one where humans cared about animal welfare? I can see the intuition going either way. Either:

a) Extrapolating beyond person-to-person morality is (often) a luxury pursuit and more of it will happen in a post-scarcity world.

b) Caring about animal suffering in the food system and in nature requires compassion, and compassion is rooted in being able to imagine the states of the sufferer. If humans all live minimal-suffering lives, they won't be able to do so.

Oh yes, but I made the above comment more to represent a view I've seen in some AI x Animals work: that we should be working on aligning AGI to pro-animal values, through things like AnimalHarmBench, etc.

I think that in the long run I'd be more confident that corrigible AI would lead to good futures than AI aligned to specific values (besides perhaps some side-constraints). This is mainly because I'm pretty clueless and think our current values are likely to be wrong, and I'd rather we had more time to improve them.

I haven't thought enough about the relationship between power concentration and corrigibility, though; I expect that could change my mind.

Can you say a bit more about what "AGI goes well for humans" means under your worldview? I hadn't heard of painism. 

AGI, whether rogue or human-aligned, may not decide to keep other planets free of biological animals (though it seems like a bigger risk for human-aligned AGI)

This is a really interesting point that I hadn't thought of before. 

Very lightly held counterargument to your conclusion:

P1: The more capable an AGI system is, the harder it is to align.

P2: Terraforming other planets requires AGI at the very top of the capability distribution.

P3: The pool of systems capable of terraforming is therefore drawn disproportionately from the capability range where misalignment is most likely.

Conclusion: Most worlds containing planet-terraforming AGI are probably rogue-AGI worlds. So the "spreading wild animal suffering to new planets" scenario may be more associated with alignment failure than alignment success.

Corollary: If you agree, you should be mildly agree-voting.

For example, I think a crux might be the tractability of animal-specific alignment work: can we align AI to specific values, or (just) make it corrigible to our preferences and commands? I don't know, but this would massively affect my estimate of the tractability here.
