

Arb is a research consultancy. Specialties: AI, forecasting, pandemics, metascience, jaundiced literature reviews. 


Hey! Sorry, it went to spam, I replied now.

No form, just email for now.


Strong upvote from us.

Two natural places to ask are Bountied Rationality and the EA twitter group.

Answer by Arb

Arb is a new research consultancy led by Misha Yagudin and Gavin Leech.

In our first 6 months we've worked on forecasting, vaccine strategy, AI risk, economics, cause prioritisation, grantmaking, and large-scale data collection. We're also working with Emergent Ventures and Schmidt Futures on their AI talent programme.

Consulting is reactive, but we have lots of ideas of our own which you can help shape.

We're looking for researchers with some background in ML, forecasting, technical writing, blogging, or some other hard thing. We only take work we think is important. Get in touch!


Language models for detecting bad scholarship 

Epistemic institutions

Anyone who has done desk research carefully knows that many citations don't support the claim they're cited for - usually in a subtle way, but sometimes as a total non sequitur. Here's a fun list of 13 features we need to protect ourselves.

This seems to be a side effect of academia scaling so much in recent decades - it's not that scientists are more dishonest than other groups, it's that they don't have time to carefully read everything in their sub-sub-field (... while maintaining their current arms-race publication tempo). 

Take some claim P that is not obvious enough to stand without a citation.

It seems relatively easy, given current tech, to answer: (1) "Does the cited article say P?" This question is closely related to document summarisation - not a solved task, but the state of the art is workable. Having a reliable estimate of even this weak kind of citation quality would make reading research much easier - but under the above assumption of unread sources, it would also stop many bad citations from being written in the first place.
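As a rough illustration of question (1), here is a minimal sketch of a purely lexical baseline: it finds the sentence in a cited article that shares the most words with the claim. This is not a language model and not our proposed method; a real system would use a summarisation or entailment model instead. The function names, the stopword list and the example texts are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real baseline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "that", "for", "on"}


def tokens(text):
    """Return the set of lowercase word tokens in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def best_supporting_sentence(claim, article):
    """Return (score, sentence) for the article sentence overlapping most with the claim.

    The score is the fraction of claim tokens that appear in the sentence (0.0 to 1.0).
    High overlap only suggests the article mentions P, not that it supports P.
    """
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0, ""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    best = (0.0, "")
    for sentence in sentences:
        score = len(claim_tokens & tokens(sentence)) / len(claim_tokens)
        if score > best[0]:
            best = (score, sentence)
    return best


if __name__ == "__main__":
    claim = "Vaccination reduces hospital admissions"
    article = (
        "We studied 500 patients. "
        "Vaccination reduces hospital admissions by half. "
        "Funding details are listed below."
    )
    print(best_supporting_sentence(claim, article))
```

A low score flags citations worth a human look; a high score says nothing about negation or hedging in the source, which is why the real task needs a model that reads for meaning.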

It is very hard to answer (2) "Is the cited article strong evidence for P?", mostly because of the lack of a ground-truth dataset.

We elaborate on this here

(Thanks to Jungwon Byun and Gwern Branwen for comments.)

Just emailed Good Judgment Inc about it.


On malevolence: How exactly does power corrupt?

Artificial Intelligence / Values and Reflective Processes

How does it happen, if it happens? Some plausible stories:

  • Backwards causation: People who are “corrupted” by power always had a lust for power but deluded others and maybe even themselves about their integrity;
  • Being a good ruler (of any sort) is hard and at times very unpleasant. Even the nicest people will try to cover up their faults, covering up causes more problems... and at some point it is very hard to admit that you were an incompetent ruler all along.
  • Power changes your incentives so much that it corrupts all but the strongest. The difference from the previous story is that value drift is almost immediate upon getting power.
  • A mix of the last two would be: you get more and more adverse incentives with every rise in power.
  • It might also be the case that most idealistic people come into power under very stressful circumstances, which force them to make decisions favouring consolidation of power (kinda instrumental convergence).
  • See also this on the personalities of US presidents and their darknesses.

Evaluating large foundations

Effective Altruism

GiveWell looks at actors: object-level charities, people who do stuff. But logically, it's even more worth scrutinising megadonors (assuming that they care about impact or public opinion about their operations, and thus that our analysis could actually have some effect on them).

For instance, we've seen claims that the Global Fund, who spend $4B per year, meet a 2x GiveDirectly bar but not a GiveWell Top Charity bar.

This matters because most charity - and even most good charity - is still not by EAs or run on EA lines. Also, even big cautious foundations can risk waste / harm, as arguably happened with the Gates Foundation and IHME - it's important to understand the base rate of conservative giving failing, so that we can compare it with hits-based giving. And you only have to persuade a couple of people in a foundation before you're redirecting massive amounts.


More Insight Timelines

In 2018, the Median Group produced an impressive timeline of all of the insights required for current AI, stretching back to China's Han Dynasty(!).

The obvious extension is to alignment insights. Along with some judgment calls about relative importance, this would help any effort to estimate / forecast progress, and things like the importance of academia and non-EAs to AI alignment. (See our past work for an example of something in dire need of an exhaustive weighted insight list.)

Another set in need of collection are more general breakthroughs - engineering records broken, paradigms invented, art styles inaugurated - to help us answer giant vague questions about e.g. slowdowns in physics, perverse equilibria in academic production, and "Are Ideas Getting Harder to Find?"


Our World in Base Rates

Epistemic Institutions

Our World In Data are excellent; they provide world-class data and analysis on a bunch of subjects. Their COVID coverage made it obvious that this is a very great public good. 

So far, they haven't included data on base rates; but from Tetlock we know that base rates are the king of judgmental forecasting (EAs generally agree). Making them easily available can thus help people think better about the future. Here's a cool corporate example. 


“85% of big data projects fail”;
“10% of people refuse to be vaccinated because of fearing needles (pre-COVID so you can compare to the COVID hesitancy)”;
“11% of ballot initiatives pass”;
“7% of Emergent Ventures applications are granted”;
“50% of applicants get 80k advice”;
“x% of applicants get to the 3rd round of OpenPhil hiring”, “which takes y months”;
“x% of graduates from country [y] start a business”.


  • come up with hundreds of base rates relevant to EA causes
  • scrape Wikidata for them, or
  • recurse: get people to forecast the true value, or later value (put them in a private competition on Foretold,  index them on

Later, QURI-style innovations: add methods to combine multiple estimates and do proper Bayesian inference on them. If we go the crowdsourcing route, we could use the infrastructure used for graphclasses (voting on edits). Prominently mark the age of the estimate.
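To make "combine multiple estimates" and "Bayesian inference" concrete, here is a minimal sketch under stated assumptions. It is not QURI's method: the pooling rule (a weighted mean of log-odds) and the Beta-binomial update are standard textbook choices we picked for illustration, and the function names and all the numbers are hypothetical.

```python
import math


def pool_probabilities(probs, weights=None):
    """Pool probability estimates via a weighted mean of their log-odds.

    Equivalent to a weighted geometric mean of odds. Every estimate must lie
    strictly between 0 and 1, since the log-odds of 0 or 1 are infinite.
    """
    if not probs:
        raise ValueError("need at least one estimate")
    if weights is None:
        weights = [1.0] * len(probs)
    if len(weights) != len(probs):
        raise ValueError("probs and weights must have the same length")
    if any(not 0.0 < p < 1.0 for p in probs):
        raise ValueError("estimates must be strictly between 0 and 1")
    total = sum(weights)
    mean_logit = sum(w * math.log(p / (1.0 - p)) for p, w in zip(probs, weights)) / total
    return 1.0 / (1.0 + math.exp(-mean_logit))


def beta_posterior_mean(successes, trials, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a base rate under a Beta(prior_a, prior_b) prior.

    After observing `successes` out of `trials`, the posterior is
    Beta(prior_a + successes, prior_b + trials - successes).
    """
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return (prior_a + successes) / (prior_a + prior_b + trials)


if __name__ == "__main__":
    # Hypothetical: three forecasters estimate the same base rate; the newest gets more weight.
    print(pool_probabilities([0.08, 0.11, 0.15], weights=[1.0, 1.0, 2.0]))
    # Hypothetical counts: 11 of 100 ballot initiatives passed, uniform prior.
    print(beta_posterior_mean(11, 100))
```

Weights are one simple way to act on the age of an estimate: older estimates can be down-weighted rather than dropped.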


PS: We already sympathise with the many people who critique base rates for personal probability.
