Ryan Greenblatt

103 karma · Joined Jul 2021


Do you see most of the expected value of the forum coming from value/year similar to the current value/year but continuing into the future (maintaining what you currently have)? Or is most of the value coming from the possibility of producing much more value per year (going big)?

If you imagine going big as most of expected value, then how do you anticipate going big?

if this grant made even a 0.001 percent contribution to speeding up that race, which seems plausible, then the grant could still be strongly net negative.

Suppose the grant made the race 0.001% faster overall, but made OpenAI 5% more focused on alignment. That seems like an amazingly good trade to me.

This is quite sensitive to the exact quantitative details and I think the speed up is likely way, way more than 0.001%.

Even if we assume for the moment that the effect of the grant was net positive in increasing the safety of OpenAI itself, what if it accelerated their progress just a little and helped create this dangerous race we are in? When the head of Microsoft says "the race is on", basically referring to ChatGPT, if this grant made even a 0.001 percent contribution to speeding up that race, which seems plausible, then the grant could still be strongly net negative.

I don't have a problem with your positive opinion (although I strongly disagree), but think it is good to engage with the stronger counterpoints, rather than what I think is a bit of a strawman with the "implicit endorsement" negative.

Oh, I just think the effect of the 30 million dollars is way smaller than the total value of labor from EAs working at OpenAI such that the effect of the money is dominated by EAs being more likely to work there. I'm not confident in this, but the money seems pretty unimportant ex-post while the labor seems quite important.

I think the speed up in timelines from people with EA/longtermist motivations working at OpenAI is more like 6 months to 3 years (I tend to think this speed up is bigger than other people I talk to). The speed up from money seems relatively tiny.

Edit: It's worth noting that this grant is not counterfactually responsible for most of these people working at (or continuing to work at) OpenAI, but I do think that the human capital is likely a more important consideration than the literal financial capital here because of the total magnitude of human capital being bigger.

I think OpenPhil's grant to OpenAI is quite likely the best grant that OpenPhil has made in terms of counterfactual positive impact.

It's worth noting that OpenPhil's grant to OpenAI was done in order to acquire a board seat and generally to establish a relationship rather than being done because adding more money to OpenAI was a good use of funds at the margin.

See the grant write-up here, which discusses the motivation for the grant in detail.

Generally, I think influencing OpenAI was made notably easier via this grant (due to the board seat) and this influence seems quite good and has led to various good consequences (increased emphasis on AGI alignment for example).

The cost in dollars is quite cheap.

The main downside I can imagine is that this grant served as an implicit endorsement of OpenAI which resulted in a bunch of EAs working there which was then net-negative. My guess is that having these EAs work at OpenAI was probably good on net (due to a combination of acquiring influence and safety work - I don't currently think the capabilities work was good on its own).

Personally, I currently plan on never retiring prior to the singularity: I'd just keep working. Well after the singularity, I'm not confident, but it seems hard to plan.

Note: actually retiring is pretty cheap, so if committing to that is useful for motivation, it should be a good idea even putting aside self-interest. Personally, I find the idea of trying to do my best even when I don't have much to give motivating, at least aspirationally, so I'm planning on doing this.

If (prior to the singularity) I end up in a state where I'm unable to contribute value higher than my cost of living, I'll probably just do voluntary euthanasia? This is predicated on quite strong longtermist views where current lives aren't very important. But, maybe it makes sense to instead just move to a place with a very low cost of living and try to live very cheaply (e.g., on a few dollars a day or something) while being as happy as possible?

That said, my understanding is that in the US in the current policy regime, it would be good to stay alive to collect social security to donate (e.g., this value is higher than my cost of living, particularly when optimizing the cost of living down).

I'm not confident I'll remain this fanatically non-self-interested on further reflection, but this is my median guess for what my future views would look like after thinking about it for the rest of my life.

It's worth noting that I think that GPT5 (with finetuning and scaffolding, etc.) is perhaps around 2% likely to be AGI. Of course, you'd need serious robotic infrastructure and a much larger pool of GPUs to automate all labor.

My general view is 'if the compute is there, the AGI will come'. I'm going out on more of a limb with this exact plan and I'm much less confident in the plan than in this general principle.

Here are some example reasons why I think my high probabilities are plausible:

  • The training proposal I gave is pretty close to how models like GPT4 are trained. These models are pretty general and are quite strategic etc. Adding more FLOP makes a pretty big qualitative difference.
  • It doesn't seem to me like you have to generalize very far for this to succeed. I think existing data trains you to do basically everything humans can do. (See GPT4 and prompting)
  • Even if this proposal is massively inefficient, we're throwing an absurd amount of FLOP at it.
  • It seems like the story for why humans are intelligent looks reasonably similar to this story: have big, highly functional brains, learn to predict what you see, train to achieve various goals, generalize far. Perhaps you think human intelligence is very unlikely ex-ante (<0.04% likely).

This seems like a pretty good description of this prediction.

Your description misses needing a finishing step of doing some RL, prompting, and generally finetuning on the task of interest (similar to GPT4). But this isn't doing much of the work, so it's not a big deal. Additionally, this sort of finishing step wasn't really developed in 2013, so it seems less applicable to that version.

I'm also assuming some iteration on hyperparameters and data manipulation etc. in keeping with the techniques used in the respective time periods. So, 'first try' isn't doing that much work here because you'll be iterating a bit in the same way that people generally iterate a bit (but you won't be doing novel research).

My probabilities are for the 'first shot' but after you do some preliminary experiments to verify hyper-params etc. And with some iteration on the finetuning. There might be a non-trivial amount of work on the finetuning step also, I don't have a strong view here.

I'm not sure I buy '2013 algorithms are literally enough', but it does seem very likely to me that in practice you get AGI very quickly (<2 years) if you give out GPUs which have (say) 10^50 FLOPS. (These GPUs are physically impossible, but I'm just supposing this to make the hypothetical easier. In particular, 2013 algorithms don't parallelize very well and I'm just supposing this away.)

And, I think 2023 algorithms are literally enough with this amount of FLOP (perhaps with 90% probability).

For a concrete story of how this could happen, let's imagine training a model with around 10^50 FLOP to predict all human data ever produced (say represented as uncompressed bytes and doing next token prediction) and simultaneously training with RL to play every game ever. We'll use the largest model we can get with this FLOP budget, probably well over 10^25 parameters. Then, you RL on various tasks, prompt the AI, or finetune on some data (as needed).
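As a rough sense of scale for the numbers in this story, here is a minimal sketch of the arithmetic. It assumes the common rule of thumb that dense-model training costs roughly 6 FLOP per parameter per token; that constant, the function name, and the choice to ignore the RL portion of the budget are all assumptions for illustration, not something stated in the comment.

```python
def implied_training_tokens(flop_budget: float,
                            n_params: float,
                            flop_per_param_token: float = 6.0) -> float:
    """Tokens D implied by the approximation C ~= k * N * D.

    k = 6 is an assumed rule of thumb for dense models, not a figure
    from the comment; RL compute is ignored entirely.
    """
    return flop_budget / (flop_per_param_token * n_params)


# Figures taken from the comment: ~10^50 FLOP and ~10^25 parameters.
flop_budget = 1e50
n_params = 1e25

tokens = implied_training_tokens(flop_budget, n_params)
print(f"Implied tokens (or bytes) processed: {tokens:.2e}")
```

Under these assumptions, a 10^25-parameter model could make on the order of 10^24 token-level passes over data within the budget. If the available data is much smaller than that, the same budget would support either many epochs or a model well above 10^25 parameters, which is consistent with the "well over" in the comment.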

This can be done with either 2013 or 2023 algorithms. I'm not sure if it's enough with 2013 algorithms (in particular, I'd be worried that the AI would be extremely smart but the elicitation technology wasn't there to get the AI to do anything useful). I'd put success with 2013 algos and this exact plan at 50%. It seems likely enough with 2023 algorithms (perhaps 80% chance of success).

In 2013 this would look like training an LSTM. Deep RL was barely developed, but did exist.

In 2023 this looks similar to GPT4 but scaled way up and trained on all sources of data and trained to play games, etc.
