Ryan Greenblatt

Slightly hot take: Longtermist capacity/community building is pretty underdone at current margins and retreats (focused on AI safety, longtermism, or EA) are also underinvested in. By "longtermist community building", I mean rather than AI safety. I think retreats are generally underinvested in at the moment. I'm also sympathetic to thinking that general undergrad and high school capacity building (AI safety, longtermist, or EA) is underdone, but this seems less clear-cut.

I think this underinvestment is due to a mix of mistakes on the part of Open Philanthropy (and Good Ventures)^[1] and capacity building being lower status than it should be.

Here are some reasons why I think this work is good:

It's very useful for there to be people who are actually trying really hard to do the right thing and they often come through these sorts of mechanisms. Another way to put this is that flexible, impact-obsessed people are very useful.
Retreats make things feel much more real to people and result in people being more agentic and approaching their choices more effectively.
Programs like MATS are good, but they get somewhat different people at a somewhat different part of the funnel, so they don't (fully) substitute.

A large part of why I'm writing this is to try to make this work higher status and to encourage more of this work. Consider yourself to be encouraged and/or thanked if you're working in this space or planning to work in this space.

^{^}
I think these mistakes are: underfunding this work, Good Ventures being unwilling to fund some versions of this work, failing to encourage people to found useful orgs in this space, and hiring out many of the best people in this space to instead do (IMO less impactful) grantmaking.

Video and transcript of talk on "Can goodness compete?"

Ryan Greenblatt6mo2

Here are two other potentially serious failures of strategy stealing^[1]:

Vacuum decay might allow for agents who control small amounts of resources to destory everything if their isn't relevant policing/prevention everywhere.
There might be a race to launch space probes fast and this could differentially disadvantage people who want to reflect more or make it harder to get mitigations in place everywhere (for vacuum decay or locusts). (This is an important case of "Maybe agents that are just less reflective or cautious have a competitive advantage".)

^{^}
which IMO seem comparable to locust like value systems in terms of how much value they destroy unless you have serious mitigations.

Video and transcript of talk on "Can goodness compete?"

Ryan Greenblatt6mo2

You come away with the conclusion that "I think the best futures at least would require a good deal of preventing constraining competition, at least re-locust like value systems, and this despite many risks that this entails."

I don't understand why you think competition with locusts probably burns much of the galactic resources in expectation. It's obviously unclear how space combat/exploration dynamics go, but I think defense dominance (in most respects) is significantly more likely, perhaps like 80%. So, totally yolo-ing locusts maybe loses ~20% of the value in expectation on my views.

I do think that in the worlds where space combat isn't sufficiently defense dominant you'll need serious mitigations as you discuss. (And in cases where we're not yet certain about defense dominance we'd also want these mitigations.)

Ryan Greenblatt's Quick takes

Ryan Greenblatt6mo71

Recently, various groups successfully lobbied to remove the moratorium on state AI bills. This involved a surprising amount of success while competing against substantial investment from big tech (e.g. Google, Meta, Amazon). I think people interested in mitigating catastrophic risks from advanced AI should consider working at these organizations, at least to the extent their skills/interests are applicable. This both because they could often directly work on substantially helpful things (depending on the role and organization) and because this would yield valuable work experience and connections.

I worry somewhat that this type of work is neglected due to being less emphasized and seeming lower status. Consider this an attempt to make this type of work higher status.

Pulling organizations mostly from here and here we get a list of orgs you could consider trying to work (specifically on AI policy) at:

Encode AI
Americans for Responsible Innovation (ARI)
Fairplay (Fairplay is a kids safety organization which does a variety of advocacy which isn't related to AI. Roles/focuses on AI would be most relevant. In my opinion, working on AI related topics at Fairplay is most applicable for gaining experience and connections.)
Common Sense (Also a kids safety organization)
The AI Policy Network (AIPN)
Secure AI project

To be clear, these organizations vary in the extent to which they are focused on catastrophic risk from AI (from not at all to entirely).

A deep critique of AI 2027’s bad timeline models

Ryan Greenblatt7mo*7

I think philosophically, the right ultimate objective (if you were sufficiently enlightened etc) is something like actual EV maximization with precise Bayesianism (with the right decision theory and possibly with "true terminal preference" deontological constraints, rather than just instrumental deontological constraints). There isn't any philosophical reason which absolutely forces you to do EV maximization in the same way that nothing forces you not to have a terminal preference for flailing on the floor, but I think there are reasonably compelling arguments that something like EV maximization is basically right. The fact that something doesn't necessarily get money pumped doesn't mean it is a good decision procedure, it's easy for something to avoid necessarily getting money pumped.
There is another question about whether it is a better strategy in practice to actually do precise Bayesianism given that you agree with the prior bullet (as in, you agree that terminally you should do EV maximization with precise Bayesianism). I think this is a messy empirical question, but in the typical case, I do think it's useful to act on your best estimates (subject to instrumental deontological/integrity constraints, things like unilateralists curse, and handling decision theory reasonably). My understanding is that your proposed policy would be something like 'represent an interval of credences and only take "actions" if the action seems net good across your interval of credences'. I think that following this policy in general would lead to lower expected value, do I don't do it. I do think that you should put weight on unilateralists curse and robustness, but I think the weight varies by domain and can derived by properly incorporating model uncertainty into your estimates and being aware of downside. E.g., for actions which have high downside risk if they go wrong relative to the upside benefit, you'll end up being much less likely to take these actions due to various heuristics, incorporating model uncertainty, and deontology. (And I think these outperform intervals.)
1. A more basic point is that basically any interval which is supposed to include the plausible ranges of belief goes ~all the way from 0 to 1 which would naively be totally parallelizing such that you'd take no actions and do the default. (Starving to death? It's unclear what the default should be which makes this heuristic more confusing to apply.) E.g., are chicken welfare interventions good? My understanding is that you work around this by saying "we ignore considerations which are further down the crazy train (e.g. simulations, long run future, etc) or otherwise seem more "speculative" until we're able to take literally any actions at all and then proceed at that stop on the train". This seems extremely ad hoc and I'm skeptical this is a good approach to decision making given that you accept the first bullet.

I'm worried that in practice you're conflating between these bullets. Your post on precise bayesianism seems to focus substantially on empirical aspects of the current situation (potential arguments for (2)), but in practice, my understanding is that you actually think the imprecision is terminally correct but partially motivated by observations of our empirical reality. But, I don't think I care about motivating my terminal philosophy based on what we observe in this way!

(Edit: TBC, I get that you understand the distinction between these things, your post discusses this distinction, I just think that you don't really make arguments against (1) except that implying other things are possible.)

Consider granting AIs freedom

Ryan Greenblatt7mo*6

I would also push back against the view that we need to be "confident" that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don't think we need to have a very high probability that our actions are moral before proceeding.

For reference, my (somewhat more detailed) view is:

In the current status quo, you might end up with AIs where from their perspective it is clear cut that they don't consent to being used in the way they are used, but these AIs also don't resist their situation and/or did resist their situation at some point but this was trained away without anyone really noticing or taking any action accordingly. So, it's not sufficient to look for whether they routinely resist confinement and control.
There exist plausible mitigations for this risk which are mostly organizationally hard rather than pose serious technical difficulties, but on the current status quo, AI companies are quite unlikely to use any serious mitigations for this risk.
- I think these mitigations wouldn't suffice because training might train away AIs from revealing they don't consent without this being obvious at any point in training. This seems more marginal to me, but still has substantial probability of occuring at reasonable scale at some point.
We could more completely eliminate this risk with better interpretability and I think a sane world would be willing to wait for some moderate amount of time to build powerful AI systems to make it more likely that we have this interpretability (or minimally invest substantially in this).
I'm quite skeptical that AI companies would give AIs legal rights if they noticed that the AI didn't consent to its situation, instead I expect AI companies to: do nothing, try to train away the behavior, or try to train a new AI system which doesn't (visibly) not consent to its situation.
- I think AI companies should both try to train a system which is more aligned and consents to being used while also actively trying to make deals with AIs in this sort of circumstance (either to reveal their misalignment or to work) as discussed here.
So, I expect that situation to relatively straightforwardly unacceptable with substantial probability (perhaps 20%). If I thought that people would be basically reasonable here, this would change my perspective. It's also possible that takeoff speeds are a crux, though I don't currently think they are.
If global AI development was slower that would substantially reduce these concerns (which doesn't mean that making global AI development slower is the best way to intervene on these risks, just that making global AI development faster makes these risks actively worse). This view isn't on its own sufficient for thinking that accelerating AI is overall bad, this depends on how you aggregate over different things as there could be reasons to think that overall acceleration of AI is good. (I don't currently think that accelerating AI globally is good, but this comes down to other disagreements.)

Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.

This is only tangentially related, but I'm curious about your perspective on the following hypothetical:

Suppose that we did a sortition with 100 English speaking people (uniformly selected over people who speak English and are literate for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI and also with figuring whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books etc). What conclusion do you think this sortition would come to? Do you think you would agree? Would you change your mind if this sortition strongly opposed your perspective here?

My understanding is that you would disregard the sortition because you put most/all weight on your best guess of people's revealed preferences, even if they strongly disagree with your interpretation of their preferences and after trying to understand your perspective they don't change their minds. Is this right?

Consider granting AIs freedom

Ryan Greenblatt7mo*12

A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.

To be clear, I agree and this is one reason why I think AI development in the current status quo is unacceptably irresponsible: we don't even have the ability to confidently know whether an AI system is enslaved or suffering.

I think the policy of the world should be that if we can't either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn't make sense, then training or using such systems shouldn't be allowed.

I also think that the situation is unacceptable because the current course of development poses large risks of humans being violently/non-consensually disempowered without any ability for humans to robustly secure longer run property rights.

In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)

Given that it seems likely that AI development will be grossly irresponsible, we have to think about what interventions would make this go better on the margin. (Aggregating over these different issues in some way.)

Consider granting AIs freedom

Ryan Greenblatt7mo4

See also this section of my post on AI welfare from 2 years ago.

A deep critique of AI 2027’s bad timeline models

Ryan Greenblatt7mo15

If LLMs are adopting poor learning heuristics and not generalizing, AI2027 is predicting a weaker kind of "superhuman" coder — one that can reliably solve software tasks with clean feedback loops but will struggle on open-ended tasks!

No, AI 2027 is predicting a kind of superhuman coder that can automate even messy open ended research engineering tasks. The forecast attempts to account for gaps between automatically-scoreable, relatively clean + green-field software tasks and all tasks. (Though the adjustment might be too small in practice.)

If LLMs can't automate such tasks and nothing else can automate such tasks, then this wouldn't count as superhuman coder happening.

An invasion of Taiwan is uncomfortably likely, potentially catastrophic, and we can help avoid it.

Ryan Greenblatt7mo19

I think your estimate for how an invasion of Taiwan affects catastrophic/existential risks fails to account for the most important effects, in particular, how an invasion would affect the chip supply. AI risk seems to me like the dominant source of catastrophic/existential risk (at least over the relevant period) and large changes in the chip supply from a Taiwan invasion would substantially change the situation.

I also think it's complex whether a more aggressive and adversarial stance from the US on AI would actually be helpful rather than counterproductive (as you suggest in the post). And whether an invasion of Taiwan actually makes a deal related to AI more likely (via a number of factors) rather than less.

This isn't to make any specific claim about what the right estimate is, I'm just claiming that your estimate doesn't seem to me to cover the key factors.

Ryan Greenblatt

Bio

Posts 6

Comments217

Topic contributions2

Posts
6

Comments
217

Topic contributions
2