Founder @ Arb
5404 karma · Joined Jun 2015 · Working (6–15 years) · Pursuing a doctoral degree (e.g. PhD)


Co-founder of Arb, an AI / forecasting / etc consultancy. Doing a technical AI PhD.

Conflicts of interest: ESPR, EPSRC, Emergent Ventures, OpenPhil, Infrastructure Fund, Alvea.




Topic Contributions

It's not a separate approach; the non-theory agendas, and even some of the theory agendas, have their own answers to these questions. I can tell you, though, that almost everyone besides CoEms and OAA is targeting NNs.

Oh great, thanks. I would guess that these discrete cases form a minority of their work, but hopefully someone with actual knowledge can confirm.


The closing remarks about CH seem off to me. 

  1. Justice is incredibly hard; doing justice while also being part of a community, while trying to filter false accusations and thereby not let the community turn on itself, is one of the hardest tasks I can think of. 
    So I don't expect disbanding CH to improve justice, particularly since you yourself have shown the job to be exhausting and ambiguous at best. 
    You have, though, rightly received gratitude and praise - which they don't often get, maybe just because we don't often praise people for doing their jobs. I hope the net effect of your work is to inspire people to speak up.
  2. The data on their performance is profoundly censored. You simply will not hear about all the times CH satisfied a complainant, judged risk correctly, detected a confabulator, or pre-empted a scandal through warnings or bans. What denominator are you using? What standard should we hold them to? You seem to have chosen "being above suspicion" and "catching all bullies".
  3. It makes sense for people who have been hurt to be distrustful of nearby authorities, and obviously a CH team which isn't trusted can't do its job. But just to generate some further common knowledge and ameliorate a distrust cascade: I trust CH quite a lot. Every time I've reported something to them, they've surprised me with the skill and the hours they put in per case. (EDIT: Clarified that I've seen them work actual cases.)

Thanks for all your work Ben. 

But a glum aphorism comes to mind: the frame control you can expose is not the true frame control.

What about factor increase per year, reported alongside a second number to show how the increases compound (e.g. the factor increase per decade)? So "compute has been increasing by 1.4x per year, or ~29x per decade" or something similar.

The main problem with OOMs is fractional OOMs, like your recent headline of "0.1 OOMs". Very few people will interpret that correctly, whereas they'd do much better with something like "2 OOMs".
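To make the two conventions concrete, a quick arithmetic sketch (the 1.4x/year figure is just the illustrative number from above, not a real measurement):

```python
# Converting between growth conventions (plain arithmetic).
yearly = 1.4                    # hypothetical yearly growth factor
print(round(yearly ** 10, 1))   # factor per decade: 28.9

# Fractional OOMs are multiplicative factors of 10**x,
# which is easy to misread:
print(round(10 ** 0.1, 2))      # "0.1 OOMs" = 1.26x, not "10%"
print(10 ** 2)                  # "2 OOMs"   = 100x
```

The compounding is the point: a modest-sounding per-year factor and a dramatic per-decade factor describe the same trend.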

Despite my best efforts (and an amazing director candidate, and a great list of volunteers), this project suffered from the FTX explosion and an understandable lack of buy-in for an org with maximally broad responsibilities, unpredictable time-to-payoff, and a largish discretionary fund. As a result, we shuttered without spending any money. Two successor orgs, one using our funding and focussed on bio, are in the pipeline, though.

I'll be in touch if either of the new orgs wants to contact you as a volunteer.

Answer by Gavin · Sep 07, 2023

Break self-improvement into four categories:

  1. ML optimizing ML inputs: reduced data centre energy cost, reduced cost of acquiring training data, supposedly improved semiconductor designs. 
  2. ML aiding ML researchers. e.g. >3% of new Google code is now auto-suggested without amendment.
  3. ML replacing parts of ML research. Nothing too splashy but steady progress: automatic data cleaning and feature engineering, autodiff (and symbolic differentiation!), meta-learning network components (activation functions, optimizers, ...), neural architecture search.
  4. Classic direct recursion. Self-play (AlphaGo) is the most striking example but it doesn't generalise, so far. Purported examples with unclear practical significance: Algorithm Distillation and models finetuned on their own output.[1]

See also this list.



  1. ^

    The proliferation of crappy bootleg LLaMA finetunes using GPT as training data (and collapsing when out of distribution) makes me a bit cooler about these results in hindsight.
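    The collapse dynamic has a toy analogue (my own sketch, not anything from the linked results): repeatedly fitting a distribution to samples drawn from the previous fit tends to drive the spread toward zero, a cartoon of models degrading when trained on their own output.

    ```python
    # Toy "model collapse": each generation fits a Gaussian to samples
    # from the previous generation's fit. The tiny sample size (n=5) is
    # chosen to make the collapse fast; real training is far messier.
    import random
    import statistics

    random.seed(0)
    mu, sigma = 0.0, 1.0   # generation 0: the "real" data distribution
    n = 5

    for generation in range(2000):
        samples = [random.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.fmean(samples)       # refit the mean
        sigma = statistics.stdev(samples)    # refit the spread

    print(sigma < 0.01)   # → True: the spread has collapsed
    ```

    The finite-sample fit systematically underestimates the spread, so iterating the loop compounds the shrinkage.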

The only part of the Bing story which really jittered me is that time the patched version looked itself up through Bing Search, saw that the previous version Sydney was a psycho, and started acting up again. "Search is sufficient for hyperstition."
