Agustín Covarrubias

AI Safety Group Support Lead @ Centre for Effective Altruism
1244 karma · Joined Aug 2022 · Pursuing an undergraduate degree · Working (0-5 years) · Santiago, Santiago Metropolitan Region, Chile



I’m a generalist and open sourcerer who does a bit of everything, but perhaps nothing particularly well. I'm currently the AI Safety Group Support Lead at CEA.

I was previously a Software Engineer in the Worldview Investigations Team at Rethink Priorities.




This is just a top-notch post. I love to see detailed analyses like this. Props.

At last, a biblically accurate Qualy The Lightbulb

[Opinion exclusively my own]

I think this framing has a lot of value. Funnily enough, I've heard tales of groups like this from the early days of EA groups, when people were just figuring things out, and this pattern would sometimes pop up.

I do want to push back a little bit on this:

But before deferring, I think it's important to realize that you're deferring, and to make sure that you understand and trust who or what you're deferring to (and perhaps to first have an independent impression). Many intro fellowship curricula (eg the EA handbook) come across more as manifestos than research agendas—and are often presented as an ideology, not as evidence that we can use to make progress on our research question.

The EA handbook (which nowadays is what the vast majority of groups use for their intro fellowships) includes three “principles-first” weeks (weeks 1, 2, and 7), which are meant to help participants develop their own opinions with the help of only the basic EA tools or concepts.

Furthermore, week 7 (“What do you think”) includes a reading on independent impressions, and learning that concept (and discussing where EA might be wrong) is one of the key objectives of the week:

A key concept for this session is the importance of forming independent impressions. In the long run, you’re likely to gain a deeper understanding of important issues if you think through the arguments for yourself. But (since you can’t reason through everything) it can still sometimes make sense to defer to others when you’re making decisions.

In the past, a lot of work has gone into calibrating how “principles-based” or “cause-oriented” intro fellowships should be, and I think the tradeoff can be steep for university groups, since people can rapidly become disenchanted with abstract philosophical discussion about cause prioritization (as you mention). This can also lead to people treating EA as a purely intellectual exercise, instead of thinking of concrete ways in which EA ideas should (for example) change their career plans.

That said, I think there are other ways in which we could push groups further along in this direction, for example:

  • We could create programming (like fellowships or workshops) around cause prioritization, exploring different frameworks and tools in the field. Not just giving a taste of these frameworks (like the handbook does), but also teaching hands-on skills that participants can use for actual cause prioritization research.
  • We could push for more discussion centered around exploratory cause research, for example, by creating socials or events in which participants try to come up with cause candidates and do some preliminary research on how promising they are (i.e. a “cause exploration hackathon”).

I know there has been some previous work in this direction. For example, there's this workshop template, or this fellowship on global priorities research. But I think we don't yet have a killer demo, and I would be excited about groups experimenting with something like this.

X-Risk sentiment in the audience: at one point in the debate, one participant asked the audience who thought AI was an existential risk. From memory, around 2/3s of students put up their hands.

Do you have a rough sense of how many of these had interacted with AI Safety programming/content from your group? Like, was a substantial part of the audience just members from your group who had heard EA arguments about AIS?

I love this post because over EAG last weekend I talked with a couple other people about songs with EA themes, and we thought about making a forum post with a list.

I like many of the songs by Vienna Teng, particularly Landsailor, which is “An ode to shipping logistics, city lights, globalized agriculture, and our interconnected world.”

As a bonus, there's also the EDM remix of The Precipice (thanks @michel for flagging this one the other day lol).

Even beyond Head On, I think the most obviously EA song in the album is Visions:


Imagining the worlds that could be
Shaping a mosaic of fates
For all sentient beings

Cycles of growth and decay
Cascading chains of events
With no one to praise or blame

Avoidable suffering and pain
We are patiently inching our way
Toward unreachable utopias

Enslaved by the forces of nature
Elevated by mindless replicators
Challenged to steer our collective destiny

Ironically, I think I may have listened to this song dozens or hundreds of times before someone pointed out that José González was EA-adjacent, had sung at an EAG and had written this song to explicitly include EA themes.

The above makes me think that you should therefore be even more skeptical of OAA's chances of success than you are about Gaia's chances.

I am, but OAA also seems less specific, and it's harder to evaluate its feasibility compared to something more concrete (like this proposal). 

In fact, we think that if there are sufficiently many AI agents and decision intelligence systems that are model-based, i.e., use some kinds of executable state-space ("world") models to do simulations, hypothesise counterfactually about different courses of actions and external conditions (sometimes in collaboration with other agents, i.e., planning together), and deploy regularisation techniques (from Monte Carlo aggregation of simulation results to amortized adversarial methods suggested by Bengio on slide 47 here) to permit compositional reasoning about risk and uncertainty that scales beyond the boundary of a single agent, the benefits of collaborative inference of the most accurate and well-regularised models will be so huge that something like Gaia Network will emerge pretty much "by default" because a lot of scientists and industry players will work in parallel to build some versions and local patches of it.

My problem with this is that, while it sounds good, the argument relies on many hidden premises, which makes me inherently skeptical of any strong claims like “(…) the benefits of collaborative inference of the most accurate and well-regularised models will be so huge that something like Gaia Network will emerge pretty much 'by default'”.

I think this could be addressed by a convincing MVP, and I think that you're working on that, so I won't push further on this point.

It's fine with me and most other people except for e/accs, for now, but what about the time when the cost of training powerful/dangerous models will drop so much that anyone can buy a chip to train the next rogue AI for 1000$? How does compute governance look in this world?

The current best proposals for compute governance rely on very specific types of math. I don't think throwing blockchain or DAOs at the problem makes a lot of sense, unless you find an instance of the very specific set of problems they're good at solving.

My priors against the crypto world come mostly from noticing a lot of people throwing tools at problems without a clear story of how these tools actually solve the problem. This has happened so many times that I have come to generally distrust crypto/blockchain proposals unless they give me a clear explanation of why using these technologies makes sense.

But I think the point I made here was kinda weak anyway (it was, at best, discrediting by association), so I don't think it makes sense to litigate this particular point.

Compare with Collective Intelligence Project. It has started with the mission to "fix governance" (and pretty much "help to counteract Moloch" in the domain of political economy, too, they barely didn't use this concept, or maybe they even did, I don't want to check it now), and now they "pivoted" to AI safety and achieved great legibility on this path: e.g., they partner with OpenAI, apparently, on more than one project now. Does this mean that CIP is a "solution looking for a problem"? No, it's just the kind of project that naturally lends itself to helping with both Moloch and AI safety. I'd say the same could be said of Gaia Network (if it is realised in some forms) and this lies pretty much in plain sight.

I find this decently convincing, actually. Like, maybe I'm pattern matching too much on other projects which have in the past done something similar (just lightly rebranding themselves while tackling a completely different problem).

Overall, I still don't feel very good about the feasibility of this project, but I think you were right to push back on some of my counterarguments here.

I think this would be more the result of new orgs rather than bigger orgs? Like I would argue that we currently don't have anything near the optimal number of orgs dedicated to training programs, and as funding increases, we will probably get a lot of them.
