ex-Director @ Center for Effective Aid Policy

I strongly upvoted this post because I'd really like to see it get more attention and, hopefully, a rebuttal. I think this is extremely important to get to the bottom of!

At first glance your critiques seem pretty damning, but I would have to put a bunch of time into understanding ACE's evaluations before I could conclude whether I agree with your critiques (I can spend a weekend day doing this and writing up my own thoughts in a new post if there is interest).

My expectation is that if I were to do this I would come out feeling less confident than you seem to be. I'm a bit concerned that you haven't made an attempt at explaining why ACE might have constructed their analyses this way.

But I'm pretty confused too. It's hard to think of much justification for the choice of numbers in the 'Impact Potential Score', and estimating the impact of a book from the average of all books doesn't seem like the best way to approach things.

EDIT: Someone on LessWrong linked a great report by Epoch which tries to answer exactly this.

With the release of OpenAI o1, I want to ask a question I've been wondering about for a few months.

Like the Chinchilla paper, which estimated the compute-optimal ratio of training data to model size, are there any similar estimates for the optimal split of compute between inference and training?

In the release they show this chart:

The chart partly gets at what I want to know, but doesn't answer it completely. How much additional inference compute would a 1e25 FLOP o1-like model need to perform as well as a 1e26 FLOP model answering in a single attempt?
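One way to frame this is as an exchange rate between training and inference compute. The sketch below assumes, purely for illustration, a toy model in which score rises log-linearly in both training compute and per-query inference compute, with hypothetical slopes `a` and `b` (points per 10x of compute). Neither the model nor the slopes are fitted to o1 or to the chart above; `inference_multiplier` is a made-up helper name.

```python
import math


def inference_multiplier(train_small: float, train_large: float, a: float, b: float) -> float:
    """Factor by which per-query inference compute must grow so the smaller
    model matches the larger one, under the HYPOTHETICAL toy model
    score = a*log10(C_train) + b*log10(C_inf)."""
    # Score gap created by the extra training compute.
    gap = a * (math.log10(train_large) - math.log10(train_small))
    # Each 10x of inference compute adds b points, so we need gap/b decades of it.
    return 10 ** (gap / b)


# Hypothetical slopes: equal returns per decade of training and inference compute.
print(inference_multiplier(1e25, 1e26, a=1.0, b=1.0))
# Hypothetical slopes: inference returns half as steep as training returns.
print(inference_multiplier(1e25, 1e26, a=1.0, b=0.5))
```

Under these assumptions the answer is 10^(a/b) per 10x of training compute, so everything hinges on the ratio of the two slopes, which is exactly the empirical quantity the question is asking about.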

Additionally, for a given number of queries x, what is the optimal ratio of compute to spend on training versus inference? How does that change for different values of x?
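As a rough sketch of how the optimisation could be set up, assume the same kind of hypothetical toy model: score = a·log10(C_train) + b·log10(C_inf), where C_inf is per-query compute, and we want to hit a fixed target score while minimising C_train + x·C_inf. The slopes, the target of 30, and the helper names are all illustrative assumptions. The Lagrange condition gives C_train / (x·C_inf) = a / b, and a brute-force scan over training compute checks the closed form.

```python
import math


def optimal_split(x: float, target: float, a: float, b: float):
    """Minimise C_train + x * C_inf subject to the HYPOTHETICAL constraint
    a*log10(C_train) + b*log10(C_inf) = target.
    Returns (C_train, total inference compute x * C_inf)."""
    # Lagrange condition for this model: C_train / (x * C_inf) = a / b.
    c_inf = 10 ** ((target - a * math.log10(a * x / b)) / (a + b))
    c_train = (a / b) * x * c_inf
    return c_train, x * c_inf


def brute_force_total(x: float, target: float, a: float, b: float, steps: int = 20001) -> float:
    """Numerically check the closed form by scanning log10(C_train)."""
    best = math.inf
    for i in range(steps):
        log_ct = 15 + 20 * i / (steps - 1)  # scan C_train from 1e15 to 1e35
        log_ci = (target - a * log_ct) / b  # per-query compute that hits the target
        best = min(best, 10 ** log_ct + x * 10 ** log_ci)
    return best


for x in (1e6, 1e9, 1e12):
    c_train, c_inf_total = optimal_split(x, target=30.0, a=1.0, b=0.5)
    share = c_train / (c_train + c_inf_total)
    print(f"x={x:.0e}: train={c_train:.2e}, inference={c_inf_total:.2e}, train share={share:.3f}")
```

In this toy model the training share comes out to a/(a+b) for every x; only the absolute budgets grow with query volume. That constant share is an artefact of assuming pure log-linear returns in both inputs. Real scaling curves that saturate or bend would likely make the optimal split depend on x, which is part of what an empirical estimate would need to pin down.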

Are there any public attempts at estimating this stuff? If so, where can I read about it?

Laughed out loud for a good minute after reading this!