Effective Altruism Forum
All of Rohin Shah's Comments + Replies

Will we get automated alignment research before an AI Takeoff?

Or is this a stronger claim that safety work is inherently a more short-time horizon thing?

It is more like this stronger claim.

I might not use "inherently" here. A core safety question is whether an AI system is behaving well because it is aligned, or because it is pursuing convergent instrumental subgoals until it can take over. The "natural" test is to run the AI until it has enough power to easily take over, at which point you observe whether it takes over, which is extremely long-horizon. But obviously this was never an option for safety anyway, and many of the proxies that we think about are more short-horizon.

Will we get automated alignment research before an AI Takeoff?

Rohin Shah 17d 12

Oh sorry, I missed the weights on the factors, and thought you were taking an unweighted average.

Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?

All tasks in capabilities are ultimately trying to optimize the capability-cost frontier, which usually benefits from measuring capability.

If you have an AI that will do well at most tasks you give it that take (say) a week, then you have the problem that the naive way of evaluating the AI (run it on some difficult tasks an...

Sharmake

13d

For what it's worth, I think pre-training alone is probably enough to get us to about 1-3 month time horizons, based on a 7-month doubling time, but pre-training data will start to run out in the early 2030s, meaning that you no longer (in the absence of other benchmarks) have very good general proxies of capabilities improvements. The real issue isn't the difference between hours-long and months-long tasks, but the difference between months-long tasks and century-long tasks, which Steve Newman describes well here.
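The doubling-time arithmetic behind an estimate like this can be sketched as follows. This is only an illustration: the 7-month doubling time comes from the comment, while the starting horizon of 2 hours, the 167-hour working month, and the function names are assumptions made up for the example.

```python
import math

# Assumption (not from the comment): a "month-long task" means roughly
# 167 working hours (about 40 hours a week).
WORK_HOURS_PER_MONTH = 167.0


def horizon_after(start_hours: float, months_elapsed: float,
                  doubling_time_months: float = 7.0) -> float:
    """Task horizon in hours after `months_elapsed`, doubling at a fixed rate."""
    return start_hours * 2.0 ** (months_elapsed / doubling_time_months)


def months_until(start_hours: float, target_hours: float,
                 doubling_time_months: float = 7.0) -> float:
    """Calendar months needed for the horizon to grow from start to target."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_time_months


if __name__ == "__main__":
    start = 2.0  # hypothetical current horizon, in hours
    for target_months in (1, 3):
        target = target_months * WORK_HOURS_PER_MONTH
        print(f"{target_months}-month horizon: about "
              f"{months_until(start, target):.0f} months away")
```

Under a fixed doubling time, the gap between a 1-month and a 3-month horizon is log2(3) doublings, or roughly 11 calendar months, regardless of the starting point. The absolute dates depend heavily on the assumed starting horizon.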

Ben_West🔸

14d

Is this just a statement that there is more low-hanging fruit in safety research? I.e., you can in some sense learn an equal amount from a two-minute rollout for both capabilities and safety, but capabilities researchers have already learned most of what was possible and safety researchers haven't exhausted everything yet. Or is this a stronger claim that safety work is inherently a more short-time horizon thing?

Will we get automated alignment research before an AI Takeoff?

Rohin Shah 18d* 14

Great analysis of factors impacting automatability.

Looking at your numbers though, I feel like you didn't really need this; you could have just said "I think scheming risk is by far the most important factor in automatability of research areas, therefore capabilities will come first". EDIT: Overstated; I missed the fact that the scheming risk factor had a lower weight than the others.

I don't agree with that conclusion for two main reasons:

There's clearly some significant probability that the AIs aren't scheming; I would guess that even MIRI folks don't particular...

Jan Wehner🔸

17d

Thanks for the input!

On Scheming: I actually don't think scheming risk is the most important factor. Even removing it completely doesn't change my final conclusion. I agree that a bimodal distribution with scheming/non-scheming would be appropriate for a more sophisticated model. I just ended up lowering the weight I assign to the scheming factor (by half) to take into account that I am not sure whether scheming will/won't be an issue. In my analysis, the ability to get good feedback signals/success criteria is the factor that moves me the most to thinking that capabilities get sped up before safety.

On Task length: You have more visibility into this, so I'm happy to defer. But I'd love to hear more about why you think tasks in capabilities research have longer task lengths. Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?

Taking ethics seriously, and enjoying the process

Rohin Shah 4mo 11

I'm totally on board with "if the broader world thought more like EAs that would be good", which seems like the thrust of your comment. My claim was limited to the directional advice I would give EAs.

kuhanj

4mo

Yea, fair point. Maybe this is just reference class tennis, but my impression is that a majority of people who consider themselves EAs aren't significantly prioritizing impact in their career and donation decisions, but I agree that for the subset of EAs who do, "heroic responsibility"/going overboard can be fraught. Some things that come to mind include how often EAs seem to work long hours/on weekends; how willing EAs are to do higher impact work when salaries are lower, when it's less intellectually stimulating, more stressful, etc; how many EAs are willing to donate a large portion of their income; how many EAs think about prioritization and population ethics very rigorously; etc. I'm very appreciative of how much more I see these in EA world than outside it, and I realize the above are unreasonable to expect from people.

Taking ethics seriously, and enjoying the process

Rohin Shah 4mo 50

I don’t know how much the FTX collapse is responsible for our current culture. They did cause unbelievable damage, acting extremely unethically and unilaterally and recklessly in destructive ways. But they did have this world-scale ambition, and urgency, and proclivity to actually make things happen in the world, that I think central EA orgs and the broader EA community sorely lack in light of the problems we’re hoping to solve.

But this is exactly why I don't want to encourage heroic responsibility (despite the fact that I often take on that mindset ...

Benevolent_Rain

4mo

Perhaps mentioned elsewhere here, but if we look for precedent for people doing an enormous amount of good (I can only think of Stanislav Petrov and people making big steps in curing disease), these actually did not act recklessly I think. It seems more like they persistently applied themselves to a problem, not super forcing an outcome and aligning a lot with others (like those eradicating smallpox). So if one wants a hero mindset, it might be good to emulate actual heroes we both think did a lot of good and that also reduced the risk of their actions.

kuhanj 4mo 17

While I really like the HPMOR quote, I don't really resonate with heroic responsibility, and don't resonate with the "Everything is my fault" framing. Responsibility is a helpful social coordination tool, but it doesn't feel very "real" to me. I try to take the most helpful/impactful actions, even if they don't seem like "my responsibility" (while being cooperative and not unilateral and with reasonable constraints).

I'm sympathetic to taking on heroic responsibility causing harm in certain cases, but I don't see strong enough evidence that it causes ...

AI Safety’s Talent Pipeline is Over-optimised for Researchers

Rohin Shah 5mo 27

In fact, all of the top 7 most sought-after skills were related to management or communications.

"Leadership / strategy" and "government and policy expertise" are emphatically not management or communications. There's quite a lot of effort on building a talent pipeline for "government and policy expertise". There isn't one for "leadership / strategy" but I think that's mostly because no one knows how to do it well (broadly speaking, not just limited to EA).

If you want to view things through the lens of status (imo often a mistake), I think "leadership / str...

Chris Leong

5mo

Yeah, I also found this sentence somewhat surprising. I likely care about value alignment more than you do, but I expect that the main way for people to signal this is by participating in multiple activities over time rather than by engaging in any particular program. I do agree with the OP's larger point though: that it is easier for researchers to demonstrate value alignment given that there are more programs to participate in. I also acknowledge that there are circumstances where it might be valuable to be able to signal value alignment with relatively few words.

Chris Clay🔸 5mo 20

Thanks for this! You've changed my mind

Should we aim for flourishing over mere survival? The Better Futures series.

Rohin Shah 6mo* 2

I think that most of classic EA vs the rest of the world is a difference in preferences / values, rather than a difference in beliefs.

I somewhat disagree but I agree this is plausible. (That was more of a side point, maybe I shouldn't have included it.)

most people really really don't want to die in the next ten years

Is your claim that they really really don't want to die in the next ten years, but they are fine dying in the next hundred years? (Else I don't see how you're dismissing the anti-aging vs sports team example.)

So, for x-risk to be high, many peo...

JackM

6mo

Dying when you're young seems much worse than dying when you're old for various reasons:

* Quality of life is worse when you're old
* When you're old you will have done much more of what you wanted in life (e.g. have kids and grandkids)
* It's very normal/expected to die when old

Also, I'd imagine people don't want to fund anti-aging research for various (valid) reasons:

* Skepticism it is very cost-effective
* Public goods problem means under provision (everyone can benefit from the research even if you don't fund it yourself)
* From a governmental perspective living longer is actually a massive societal issue as it introduces serious fiscal challenges as you need to fund pensions etc. From an individual perspective living longer just means having to work longer to support yourself for longer. So does anyone see anti-aging as that great?
* People discount the future

Having said all this, I actually agree with you that x-risk could be fairly high due to a failure of rationality. Primarily because we've never gone extinct so people naturally think it's really unlikely, but x-risk is rising as we get more technologically powerful.

BUT, I agree with Will's core point that working towards the best possible future is almost certainly more neglected than reducing x-risk, partly because it's just so wacky. People think about good futures where we are very wealthy and have lots of time to do fun stuff, but do they think about futures where we create loads of digital minds that live maximally-flourishing lives? I doubt it.

Should we aim for flourishing over mere survival? The Better Futures series.

Rohin Shah 6mo 10

Most people really don’t want to die, or to be disempowered in their lifetimes. So, for existential risk to be high, there has to be some truly major failure of rationality going on.

... What is surprising about the world having a major failure of rationality? That's the default state of affairs for anything requiring a modicum of foresight. A fairly core premise of early EA was that there is a truly major failure of rationality going on in the project of trying to improve the world.

Are you surprised that ordinary people spend more money and time on, ...

William_MacAskill 6mo 13

I think that most of classic EA vs the rest of the world is a difference in preferences / values, rather than a difference in beliefs. Ditto for someone funding their local sports teams rather than anti-aging research. We're saying that people are failing in the project of rationally trying to improve the world by as much as possible - but few people really care much or at all about succeeding at that project. (If they cared more, GiveWell would be moving a lot more money than it is.)

In contrast, most people really really don't want to die in the nex...

Do short AI timelines make other cause areas useless?

Rohin Shah 6mo 6

If you think the following claim is true - 'non-AI projects are never undercut but always outweighed'

Of course I don't think this. AI definitely undercuts some non-AI projects. But "non-AI projects are almost always outweighed in importance" seems very plausible to me, and I don't see why anything in the piece is a strong reason to disbelieve that claim, since this piece is only responding to the undercutting argument. And if that claim is true, then the undercutting point doesn't matter.

Do short AI timelines make other cause areas useless?

Rohin Shah 7mo 38

We are disputing a general heuristic that privileges the AI cause area and writes off all the others.

I think the most important argument towards this conclusion is "AI is a big deal, so we should prioritize work that makes it go better". But it seems you have placed this argument out of scope:

[The claim we are interested in is] that the coming AI revolution undercuts the justification for doing work in other cause areas, rendering work in those areas useless, or nearly so (for now, and perhaps forever).
[...]
AI causes might be more cost-effective than...

Hayley Clatterbuck

6mo

We wanted to focus on a specific and somewhat manageable question related to AI vs. non-AI cause prioritization. You're right that it's not the only important question to ask. If you think the following claim is true - 'non-AI projects are never undercut but always outweighed' - then it doesn't seem like an important question at all. I doubt that claim holds generally, for reasons that were presented in the piece. When deciding what to prioritize, there are also broader strategic questions that matter - how is money and effort being allocated by other parties, what is your comparative advantage, etc. - that we don't touch at all here.

calebp's Quick takes

Rohin Shah 9mo 14

I agree with some of the points on point 1, though other than FTX, I don't think the downside risk of any of those examples is very large

Fwiw I find it pretty plausible that lots of political action and movement building for the sake of movement building has indeed had a large negative impact, such that I feel uncertain about whether I should shut it all down if I had the option to do so (if I set aside concerns like unilateralism). I also feel similarly about particular examples of AI safety research but definitely not for the field as a whole.

Agree that...

calebp's Quick takes

Rohin Shah 9mo 77

I'm not especially pro-criticism but this seems way overstated.

Almost all EA projects have low downside risk in absolute terms

I might agree with this on a technicality, in that depending on your bar or standard, I could imagine agreeing that almost all EA projects (at least for more speculative causes) have negligible impact in absolute terms.

But presumably you mean that almost all EA projects are such that their plausible good outcomes are way bigger in magnitude than their plausible bad outcomes, or something like that. This seems false, e.g.

FTX
Any kind...

calebp

9mo

I agree with some of the points on point 1, though other than FTX, I don't think the downside risk of any of those examples is very large. I'd walk back my claim to "the downside risk of most EA projects seems low" (but there are ofc exceptions).

Agree that criticisms of AI companies can be good, I don't really consider them EA projects but it wasn't clear that was what I was referring to in my post - my bad.

Responding quickly to some of the other ones.

* Concerns with Intentional Insights
  * This seems good, though it was a long time ago.
* It's hard to tell, but I'd guess Critiques of Prominent AI Safety Labs changed who applied to the critiqued organizations
  * Idk if these are "EA" projects. I think I'm much more pessimistic than you are that these posts made better things happen in the world. I'd guess that people overupdated on these somewhat. That said, I quite like these posts and the discussion in the comments.
* Gossip-based criticism of Leverage clearly mattered and imo it would have been better if it was more public
  * This also seems good, though it was a long time ago and I wasn't around when Leverage was a thing.
* Sharing Information About Nonlinear clearly mattered in the sense of having some impact, though the sign is unclear
  * Sign seems pretty negative to me.
* Same deal for Why did CEA buy Wytham Abbey?
  * Sign seems pretty negative to me. Like even the title is misleading and this generated a lot of drama.
* Back in the era when EA discussions happened mainly on Facebook there were all sorts of critiques and flame wars between protest-tactics and incremental-change-tactics for animal advocacy, I don't think this particularly changed what any given organization tried to do, but it surely changed views of individual people
  * Not familiar but maybe this is useful? Idk.
* Open Phil and RP both had pieces that were pretty critical of clean meat work iirc that were large updates for me. I don't think they were

Habryka [Deactivated]'s Quick takes

Rohin Shah 1y 18

Of course, it's true that they could ignore serious criticism if they wanted to, but my sense is that people actually quite often feel unable to ignore criticism.

As someone sympathetic to many of Habryka's positions, while also disagreeing with many of Habryka's positions, my immediate reaction to this was "well that seems like a bad thing", cf.

shallow criticism often gets valorized

I'd feel differently if you had said "people feel obliged to take criticism seriously if it points at a real problem" or something like that, but I agree with you that the mech...

Sarah Cheng 🔸

I appreciate you sharing your views on this! I agree that as a whole, this is suboptimal. I don't currently feel confident enough about the take that "shallow criticism often gets valorized" to prioritize tackling it, though I am spending some time thinking about moderation and managing user-generated content and I expect that the mod team (including myself) will discuss how we'd like to handle critical comments, so this will probably come up in our discussions. I'm kind of worried that there's not necessarily an objective truth to how shallow/low-quality any particular criticism is, and I personally would prefer to err on the side of allowing more criticism. So it's possible that not much changes in the public discourse, and any interventions we do may need to be behind the scenes (such as our team spending more time talking with people who get criticized).

Notes on risk compensation

Rohin Shah 1y 2

Tbc if the preferences are written in words like "expected value of the lightcone" I agree it would be relatively easy to tell which was which, mainly by identifying community shibboleths. My claim is that if you just have the input/output mapping of (safety level of AI, capabilities level of AI) --> utility, then it would be challenging. Even longtermists should be willing to accept some risk, just because AI can help with other existential risks (and of course many safety researchers -- probably the majority at this point -- are not longtermists).

Notes on risk compensation

Rohin Shah 1y 4

What you call the "lab's" utility function isn't really specific to the lab; it could just as well apply to safety researchers. One might assume that the parameters would be set in such a way as to make the lab more C-seeking (e.g. it takes less C to produce 1 util for the lab than for everyone else).

But at least in the case of AI safety, I don't think this is the case. I doubt I could easily distinguish a lab capabilities researcher (or lab leadership, or some "aggregate lab utility function") from an external safety researcher if you just gave me their u...

trammell

Okay great, good to know. Again, my hope here is to present the logic of risk compensation in a way that makes it easy to make up your mind about how you think it applies in some domain, not to argue that it does apply in any domain. (And certainly not to argue that a model stripped down to the point that the only effect going on is a risk compensation effect is a realistic model of any domain!)

As for the role of preference-differences in the AI risk case—if what you’re saying is that there’s no difference at all between capabilities researchers’ and safety researchers’ preferences (rather than just that the distributions overlap), that’s not my own intuition at all. I would think that if I learn

* that two people have similar transhumanist-y preferences except that one discounts the distant future (or future generations), and so cares primarily about achieving amazing outcomes in the next few decades for people alive today, whereas the other cares primarily about the “expected value of the lightcone”; and
* that one works on AI capabilities and the other works on AI safety,

my guess about who was who would be a fair bit better than random.

But I absolutely agree that epistemic disagreement is another reason, and could well be a bigger reason, why different people put different values on safety work relative to capabilities work. I say a few words about how this does / doesn’t change the basic logic of risk compensation in the section on "misperceptions": nothing much seems to change if the parties just disagree in a proportional way about the magnitude of the risk at any given levels of C and S--though this disagreement can change who prioritizes which kind of work, it doesn’t change how the risk compensation interaction plays out. What really changes things there is if the parties disagree about the effectiveness of marginal increases to S, or really, if they disagree about the extent to which increases to S decrease the extent to which increases to C low

EA "Worldviews" Need Rethinking

Rohin Shah 1y 6

I agree reductions in infant mortality likely have better long-run effects on capacity growth than equivalent levels of population growth while keeping infant mortality rates constant, which could mean that you still want to focus on infant mortality while not prioritizing increasing fertility.

I would just be surprised if the decision from the global capacity growth perspective ended up being "continue putting tons of resources into reducing infant mortality, but not much into increasing fertility" (which I understand to be the status quo for GHD), because...

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Rohin Shah 2y 9

?? It's the second bullet point in the cons list, and reemphasized in the third bullet?

If you're saying "obviously this is the key determinant of whether you should work at a leading AI company so there shouldn't even be a pros / cons table", then obviously 80K disagrees given they recommend some such roles (and many other people (e.g. me) also disagree so this isn't 80K ignoring expert consensus). In that case I think you should try to convince 80K on the object level rather than applying political pressure.

Habryka [Deactivated] 2y 15

This thread feels like a fine place for people to express their opinion as a stakeholder.

Like, I don't even know how to engage with 80k staff on this on the object level, and seems like the first thing to do is to just express my opinion (and like, they can then choose to respond with argument).

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)

Rohin Shah 2y 12

... That paragraph doesn't distinguish at all between OpenAI and, say, Anthropic. Surely you want to include some details specific to the OpenAI situation? (Or do your object-level views really not distinguish between them?)

Owen Cotton-Barratt

I was just disagreeing with Habryka's first paragraph. I'd definitely want to keep content along the lines of his third paragraph (which is pretty similar to what I initially drafted).

Updates on the EA catastrophic risk landscape

Rohin Shah 2y 7

There’s currently very little work going into issues that arise even if AI is aligned, including the deployment problem

The deployment problem (as described in that link) is a non-problem if you know that AI is aligned.

Analyzing the moral value of unaligned AIs

Rohin Shah 2y 2

In contrast, I think the fact that these AIs will be trained on human-generated data and deliberately shaped by humans to fulfill human-like functions and to be human-compatible should be given substantial weight.

... This seems to be saying that because we are aligning AI, they will be more utilitarian. But I thought we were discussing unaligned AI?

I agree that the fact we are aligning AI should make one more optimistic. Could you define what you mean by "unaligned AI"? It seems quite plausible that I will agree with your position, and think it amounts to ...

Matthew_Barnett

This overlooks my arguments in section 3, which were absolutely critical to forming my opinion here. My argument here can be summarized as follows:

* The utilitarian arguments for technical alignment research seem weak, because AIs are likely to be conscious like us, and also share human moral concepts.
* By contrast, technical alignment research seems clearly valuable if you care about humans who currently exist, since AIs will presumably be directly aligned to them.
* However, pausing AI for alignment reasons seems pretty bad for humans who currently exist (under plausible models of the tradeoff).
* I have sympathies to both utilitarianism and the view that current humans matter. The weak considerations favoring pausing AI on the utilitarian side don't outweigh the relatively much stronger and clearer arguments against pausing for currently existing humans.

The last bullet point is a statement about my values. It is not a thesis independently of my values. I feel this was pretty explicit in the post.

I'm not just saying "there are worlds in which alignment work is negative". I'm saying that it's fairly plausible. I'd say greater than 30% probability. Maybe higher than 40%. This seems perfectly sufficient to establish the position, which I argued explicitly, that the alternative position is "fairly weak". It would be different if I was saying "look out, there's a 10% chance you could be wrong". I'd agree that claim would be way less interesting.

I don't think what I said resembles a motte-and-bailey, and I suspect you just misunderstood me.

[ETA: Part of me feels like this statement is an acknowledgement that you fundamentally agree with me. You think the argument in favor of unaligned AIs being less utilitarian than humans is weak? Wasn't that my thesis? If you started at a prior of 50%, and then moved to 65% because of a weak argument, and then moved back to 60% because of my argument, then isn't that completely consistent with essentially every singl

Matthew_Barnett

Just a quick reply (I might reply more in-depth later but this is possibly the most important point): In my post I talked about the "default" alternative to doing lots of alignment research. Do you think that if AI alignment researchers quit tomorrow, engineers would stop doing RLHF etc. to their models? That they wouldn't train their AIs to exhibit human-like behaviors, or to be human-compatible? It's possible my language was misleading by giving an image of what unaligned AI looks like that isn't actually a realistic "default" in any scenario. But when I talk about unaligned AI, I'm simply talking about AI that doesn't share the preferences of humans (either its creator or the user). Crucially, humans are routinely misaligned in this sense. For example, employees don't share the exact preferences of their employer (otherwise they'd have no need for a significant wage). Yet employees are still typically docile, human-compatible, and assimilated to the overall culture. This is largely the picture I think we should imagine when we think about the "default" unaligned alternative, rather than imagining that humans will create something far more alien, far less docile, and therefore something with far less economic value. (As an aside, I thought this distinction wasn't worth making because I thought most readers would have already strongly internalized the idea that RLHF isn't "real alignment work". I suspect I was mistaken, and probably confused a ton of people.)

Analyzing the moral value of unaligned AIs

Rohin Shah 2y 6

This suggests affective empathy may not be strongly predictive of utilitarian motivations.

I can believe that if the population you are trying to predict for is just humans, almost all of whom have at least some affective empathy. But I'd feel pretty surprised if this were true in whatever distribution over unaligned AIs we're imagining. In particular, I think if there's no particular reason to expect affective empathy in unaligned AIs, then your prior on it being present should be near-zero (simply because there are lots of specific claims about unaligned ...

Matthew_Barnett

Here are a few (long, but high-level) comments I have before responding to a few specific points that I still disagree with:

* I agree there are some weak reasons to think that humans are likely to be more utilitarian on average than unaligned AIs, for basically the reasons you talk about in your comment (I won't express individual agreement with all the points you gave that I agree with, but you should know that I agree with many of them). However, I do not yet see any strong reasons supporting your view. (The main argument seems to be: AIs will be different than us. You label this argument as strong but I think it is weak.) More generally, I think that if you're making hugely consequential decisions on the basis of relatively weak intuitions (which is what I believe many effective altruists do in this context), you should be very cautious. The lack of robust evidence for your position seems sufficient, in my opinion, for the main thesis of my original post to hold. (I think I was pretty careful in my language not to overstate the main claims.)
* I suspect you may have an intuition that unaligned AIs will be very alien-like in certain crucial respects, but I predict this intuition will ultimately prove to be mistaken. In contrast, I think the fact that these AIs will be trained on human-generated data and deliberately shaped by humans to fulfill human-like functions and to be human-compatible should be given substantial weight. These factors make it quite likely, in my view, that the resulting AI systems will exhibit utilitarian tendencies to a significant degree, even if they do not share the preferences of either their users or their creators (for instance, I would guess that GPT-4 is already more utilitarian than the average human, in a meaningful sense). There is a strong selection pressure for AIs to display outward behaviors that are not overly alien-like. Indeed, the pressure seems to be for AIs to be inhumanly altruistic and kind in their

Analyzing the moral value of unaligned AIs

Rohin Shah 2y 4

Given my new understanding of the meaning of "contingent" here, I'd say my claims are:

I'm unsure about how contingent the development of utilitarianism in humans was. It seems quite plausible that it was not very historically contingent. I agree my toy model does not accurately capture my views on the contingency of total utilitarianism.
I'm also unsure how contingent it is for unaligned AI, but aggregating over my uncertainty suggests more contingent.

One way to think about this is to ask: why are any humans utilitarians? To the extent it's for reasons that...

Matthew_Barnett

Thanks for trying to better understand my views. I appreciate you clearly stating your reasoning in this comment, as it makes it easier for me to directly address your points and explain where I disagree.

You argued that feeling pleasure and pain, as well as having empathy, are important factors in explaining why some humans are utilitarians. You suggest that to the extent these reasons for being utilitarian don't apply to unaligned AIs, we should expect it to be less likely for them to be utilitarians compared to humans.

However, a key part of the first section of my original post was about whether unaligned AIs are likely to be conscious—which for the purpose of this discussion, seems roughly equivalent to whether they will feel pleasure and pain. I concluded that unaligned AIs are likely to be conscious for several reasons:

1. Consciousness seems to be a fairly convergent function of intelligence, as evidenced by the fact that octopuses are widely accepted to be conscious despite sharing almost no homologous neural structures with humans. This suggests consciousness arises somewhat robustly in sufficiently sophisticated cognitive systems.
2. Leading theories of consciousness from philosophy and cognitive science don't appear to predict that consciousness will be rare or unique to biological organisms. Instead, they tend to define consciousness in terms of information processing properties that AIs could plausibly share.
3. Unaligned AIs will likely be trained in environments quite similar to those that gave rise to human and animal consciousness—for instance, they will be trained on human cultural data and, in the case of robots, will interact with physical environments. The evolutionary and developmental pressures that gave rise to consciousness in biological organisms would thus plausibly apply to AIs as well.

So in short, I believe unaligned AIs are likely to feel pleasure and pain, for roughly the reasons I think humans and animals do. Their consciousn

Analyzing the moral value of unaligned AIs

Rohin Shah2y2

I agree it's clear that you claim that unaligned AIs are plausibly comparably utilitarian as humans, maybe more.

What I didn't find was discussion of how contingent utilitarianism is in humans.

Though actually rereading your comment (which I should have done in addition to reading the post) I realize I completely misunderstood what you meant by "contingent", which explains why I didn't find it in the post (I thought of it as meaning "historically contingent"). Sorry for the misunderstanding.

Let me backtrack like 5 comments and retry again.

Analyzing the moral value of unaligned AIs

Rohin Shah2y2

I was arguing that trying to preserve the present generation of humans looks good according to (2), not (1).

I was always thinking about (1), since that seems like the relevant thing. When I agreed with you that generational value drift seems worrying, that's because it seems bad by (1). I did not mean to imply that I should act to maximize (2). I agree that if you want to act to maximize (2) then you should probably focus on preserving the current generation.

In my post, I fairly explicitly argued that the rough level of utilitarian values exhibited by huma

... (read more)

Matthew_Barnett

I'm baffled by your statement here. What did you think I was arguing when I discussed whether "aligned AIs are more likely to have a preference for creating new conscious entities, furthering utilitarian objectives"? The conclusion of that section was that aligned AIs are plausibly not more likely to have such a preference, and therefore, human utilitarian preferences here are not "unusually high compared to other possibilities" (the relevant alternative possibility here being unaligned AI). This was a central part of my post that I discussed at length. The idea that unaligned AIs might be similarly utilitarian or even more so, compared to humans, was a crucial part of my argument. If indeed unaligned AIs are very likely to be less utilitarian than humans, then much of my argument in the first section collapses, which I explicitly acknowledged. I consider your statement here to be a valuable data point about how clear my writing was and how likely I am to get my ideas across to others who read the post. That said, I believe I discussed this point more-or-less thoroughly. ETA: Claude 3's summary of this argument in my post:

Analyzing the moral value of unaligned AIs

Rohin Shah2y4

Based on your toy model, my guess is that your underlying intuition is something like, "The fact that a tiny fraction of humans are utilitarian is contingent. If we re-rolled the dice, and sampled from the space of all possible human values again (i.e., the set of values consistent with high-level human moral concepts), it's very likely that <<1% of the world would be utilitarian, rather than the current (say) 1%."

No, this was purely to show why, from the perspective of someone with values, re-rolling those values would seem bad, as opposed to keepin... (read more)

Matthew_Barnett

I think there may have been a misunderstanding regarding the main point I was trying to convey. In my post, I fairly explicitly argued that the rough level of utilitarian values exhibited by humans is likely not very contingent, in the sense of being unusually high compared to other possibilities—and this was a crucial element of my thesis. This idea was particularly important for the section discussing whether unaligned AIs will be more or less utilitarian than humans. When you quoted me saying "humans are largely not utilitarians themselves," I intended this point to support the idea that our current rough level of utilitarianism is not contingent, rather than the opposite claim. In other words, I meant that the fact that humans are not highly utilitarian suggests that this level of utilitarianism is not unusual or contingent upon specific circumstances, and we might expect other intelligent beings, such as aliens or AIs, to exhibit similar, or even greater, levels of utilitarianism. Compare to the hypothetical argument: humans aren't very obsessed with building pyramids --> our current level of obsession with pyramid building is probably not unusual, in the sense that you might easily expect aliens/AIs to be similarly obsessed with building pyramids, or perhaps even more obsessed. (This argument is analogous because pyramids are simple structures that lots of different civilizations would likely stumble upon. Similarly, I think "try to create lots of good conscious experiences" is also a fairly simple directive, if indeed aliens/AIs/whatever are actually conscious themselves.) I think the question of whether utilitarianism is contingent or not matters significantly for our disagreement, particularly if you are challenging my post or the thesis I presented in the first section. If you are very uncertain about whether utilitarianism is contingent in the sense that is relevant to this discussion, then I believe that aligns with one of the main points I made in

Analyzing the moral value of unaligned AIs

Rohin Shah2y6

To the extent that future generations would have pretty different values than me, like "the only glory is in war and it is your duty to enslave your foes", along with the ability to enact their values on the reachable universe, in fact that would seem pretty bad to me.

However, I expect the correlation between my values and future generation values is higher than the correlation between my values and unaligned AI values, because I share a lot more background with future humans than with unaligned AI. (This doesn't require values to be innate, values can be ... (read more)

Matthew_Barnett

To clarify, I think it's a reasonable heuristic that, if you want to preserve the values of the present generation, you should try to minimize changes to the world and enforce some sort of stasis. This could include not building AI. However, I believe you may be glossing over the distinction between: (1) the values currently held by existing humans, and (2) a more cosmopolitan, utilitarian ethical value system.

We can imagine a wide variety of changes to the world that would result in vast changes to (1) without necessarily being bad according to (2). For example:

* We could start doing genetic engineering of humans.
* We could upload humans onto computers.
* A human-level, but conscious, alien species could immigrate to Earth via a portal.

In each scenario, I agree with your intuition that "the correlation between my values and future humans is higher than the correlation between my values and X-values, because I share much more background with future humans than with X", where X represents the forces at play in each scenario. However, I don't think it's clear that the resulting change to the world would be net negative from the perspective of an impartial, non-speciesist utilitarian framework. In other words, while you're introducing something less similar to us than future human generations in each scenario, it's far from obvious whether the outcome will be relatively worse according to utilitarianism.

Based on your toy model, my guess is that your underlying intuition is something like, "The fact that a tiny fraction of humans are utilitarian is contingent. If we re-rolled the dice, and sampled from the space of all possible human values again (i.e., the set of values consistent with high-level human moral concepts), it's very likely that <<1% of the world would be utilitarian, rather than the current (say) 1%." If this captures your view, my main response is that it seems to assume a much narrower and more fragile conception of "cosmopolitan utilitar

Analyzing the moral value of unaligned AIs

Rohin Shah2y9

Fwiw I had a similar reaction as Ryan.

My framing would be: it seems pretty wild to think that total utilitarian values would be better served by unaligned AIs (whose values we don't know) rather than humans (where we know some are total utilitarians). In your taxonomy this would be "humans are more likely to optimize for goodness".

Let's make a toy model compatible with your position:

A short summary of my position is that unaligned AIs could be even more utilitarian than humans are, and this doesn't seem particularly unlikely either given that (1) humans ar

... (read more)

Matthew_Barnett

I'm curious: Does your reaction here similarly apply to ordinary generational replacement as well? Let me try to explain what I'm asking.

We have a set of humans who exist right now. We know that some of them are utilitarians. At least one of them shares "Rohin's values". Similar to unaligned AIs, we don't know the values of the next generation of humans, although presumably they will continue to share our high-level moral concepts since they are human and will be raised in our culture. After the current generation of humans dies, the next generation could have different moral values.

As far as I can tell, the situation with regards to the next generation of humans is analogous to unaligned AI in the basic sense I've just laid out (mirroring the part of your comment I quoted). So, in light of that, would you similarly say that it's "pretty wild to think that total utilitarian values would be better served by a future generation of humans"?

One possible answer here: "I'm not very worried about generational replacement causing moral values to get worse since the next generation will still be human." But if this is your answer, then you seem to be positing that our moral values are genetic and innate, rather than cultural, which is pretty bold, and presumably merits a defense. This position is IMO largely empirically ungrounded, although it depends on what you mean by "moral values".

Another possible answer is: "No, I'm not worried about generational replacement because we've seen a lot of human generations already and we have lots of empirical data on how values change over time with humans. AI could be completely different." This would be a reasonable response, but as a matter of empirical fact, utilitarianism did not really culturally exist 500 or 1000 years ago. This indicates that it's plausibly quite fragile, in a similar way it might also be with AI. Of course, values drift more slowly with ordinary generational replacement compared to AI, but the phenomenon

EA "Worldviews" Need Rethinking

Rohin Shah2y4

Oh I see, sorry for misinterpreting you.

EA "Worldviews" Need Rethinking

Rohin Shah2y13

So I'm not really seeing anything "bad" here.

I didn't say your proposal was "bad", I said it wasn't "conservative".

My point is just that, if GHD were to reorient around "reliable global capacity growth", it would look very different, to the point where I think your proposal is better described as "stop GHD work, and instead do reliable global capacity growth work", rather than the current framing of "let's reconceptualize the existing bucket of work".

Richard Y Chappell🔸

I was replying to your sentence, "I'd guess most proponents of GHD would find (1) and (2) particularly bad."

EA "Worldviews" Need Rethinking

Rohin Shah2y84

I'll suggest a reconceptualization that may seem radical in theory but is conservative in practice.

It doesn't seem conservative in practice? Like Vasco, I'd be surprised if aiming for reliable global capacity growth would look like the current GHD portfolio. For example:

Given an inability to help everyone, you'd want to target interventions based on people's future ability to contribute. (E.g. you should probably stop any interventions that target people in extreme poverty.)
You'd either want to stop focusing on infant mortality, or start interventions to i

... (read more)

Linch

I'm not sure I buy this disjunctive claim. Many people over humanity's history have worked on reducing infant mortality (in technology, in policy, in direct aid, and in direct actions that prevent their own children/relatives' children from dying). While some people worked on this because they primarily intrinsically value reducing infant mortality, I think many others were inspired by the indirect effects. And taking the long view, reducing infant mortality clearly had long-run benefits that are different from (and likely better than) equivalent levels of population growth while keeping infant mortality rates constant.

David T2y15

I also think it misses the worldview bucket that's the main reason why many people fund global health and (some aspects of) development: intrinsic value attached to saving [human] lives. Potential positive flowthrough effects are a bonus on top of that, in most cases.

From an EA-ish hedonic utilitarianism perspective this dates right back to Singer's essay about saving a drowning child. Taking that thought experiment in a different direction, I don't think many people - EA or otherwise - would conclude that the decision on whether to save the child or not s... (read more)

Richard Y Chappell🔸

I guess I have (i) some different empirical assumptions, and (ii) some different moral assumptions (about what counts as a sufficiently modest revision to still count as "conservative", i.e. within the general spirit of GHD). To specifically address your three examples:

1. I'd guess that variance in cost (to save one life, or whatever) outweighs the variance in predictable ability to contribute. (iirc, Nick Beckstead's dissertation on longtermism made the point that all else equal, it would be better to save a life in a wealthy country for instrumental reasons, but that the cost difference is so great that it's still plausibly much better to focus on developing countries in practice.) Perhaps it would justify more of a shift towards the "D" side of "H&D", insofar as we could identify any good interventions for improving economic development. But the desire for lasting improvements seems commonsensical to many people anyway (compare all the rhetoric around "root causes", "teaching a man to fish", etc.) In general, extreme poverty might seem to have the most low-hanging fruit for improvement (including improvements to capacity-building). But there may be exceptions in cases of extreme societal dysfunction, in which case, again, I think it's pretty commonsensical that we shouldn't invest resources in places where they'd actually do less lasting good.
2. I don't understand at all why this would motivate less focus on infant mortality: fixing that is an extremely cheap way to improve human capacity! I think I already mentioned in the OP that increasing fertility could also be justified in principle, but I'm not aware of any proven cheap interventions that do this in practice. Adding some child benefit support (or whatever) into the mix doesn't strike me as unduly radical, in any case.
3. Greater support for education seems very commonsensical in principle (including from a broadly "global health & development" perspective), and iirc was an early f

Who's hiring? (Feb-May 2024)

Answer by Rohin ShahFeb 17, 202412

Research Scientist and Research Engineer roles in AI Safety and Alignment at Google DeepMind.

Location: Hybrid (3 days/week in the office) in San Francisco / Mountain View / London.

Application deadline: We don't have a final deadline yet, but will keep the roles open for at least another two weeks (i.e. until March 1, 2024), and likely longer.

For further details, see the roles linked above. You may also find my FAQ useful.

EV investigation into Owen and Community Health

Rohin Shah2y25

(Fyi, I probably won't engage more here, due to not wanting to spend too much time on this)

Jonas's comment is a high level assessment that is only useful insofar as you trust his judgment.

This is true, but I trust basically any random commenter a non-zero amount (unless their comment itself gives me reasons not to trust them). I agree you can get more trust if you know the person better. But even the amount of trust for "literally a random person I've never heard of" would be enough for the evidence to matter to me.

I'm only saying that I think large update

... (read more)

Elizabeth2y28

SBF was an EA leader in good standing for many years and had many highly placed friends. It's pretty notable to me that there weren't many comments like Jonas's for SBF, while there are for Owen.

I think these cases are too different for that comparison to hold.

One big difference is that SBF committed fraud, not sexual harassment. There's a long history of people minimizing sexual harassment, especially when it's as ambiguous. There's also a long history of ignoring fraud when you're benefiting from it, but by the time anyone had a chance to com... (read more)

EV investigation into Owen and Community Health

Rohin Shah2y59

The evidence Jonas provides is equally consistent with “Owen has a flaw he has healed” and “Owen is a skilled manipulator who charms men, and harasses women”.

Surely there are a lot of other hypotheses as well, and Jonas's evidence is relevant to updating on those?

More broadly, I don't think there's any obvious systemic error going on here. Someone who knows the person reasonably well, giving a model for what the causes of the behavior were, that makes predictions about future instances, clearly seems like evidence one should take into account.

(I do agree t... (read more)

Elizabeth2y22

Surely there are a lot of other hypotheses as well, and Jonas's evidence is relevant to updating on those?

There are of course infinite hypotheses. But I don't think Jonas's statement adds much to my estimates of how much harm Owen is likely to do in the future, and expect the same should be true for most people reading this.

To be clear I'm not saying I estimate more harm is likely- taking himself off the market seems likely to work, and this has been public enough I expect it to be easy for future victims to complain if something does happen. I'm onl... (read more)

AI Pause Will Likely Backfire

Rohin Shah2y12

Yeah, I don't think it's accurate to say that I see assistance games as mostly irrelevant to modern deep learning, and I especially don't think that it makes sense to cite my review of Human Compatible to support that claim.

The one quote that Daniel mentions about shifting the entire way we do AI is a paraphrase of something Stuart says, and is responding to the paradigm of writing down fixed, programmatic reward functions. And in fact, we have now changed that dramatically through the use of RLHF, for which a lot of early work was done at CHAI, so I think... (read more)

Downsides of Small Organizations in EA

Rohin Shah3y14

Fyi, the list you linked doesn't contain most of what I would consider the "small" orgs in AI, e.g. off the top of my head I'd name ARC, Redwood Research, Conjecture, Ought, FAR AI, Aligned AI, Apart, Apollo, Epoch, Center for AI Safety, Bluedot, Ashgro, AI Safety Support and Orthogonal. (Some of these aren't even that small.) Those are the ones I'd be thinking about if I were to talk about merging orgs.

Maybe the non-AI parts of that list are more comprehensive, but my guess is that it's just missing most of the tiny orgs that OP is talking about (e.g. OP'... (read more)

Angelina Li

Yeah, fair! It's frustratingly hard to get comprehensive lists of EA orgs (it's hard to be in the business of gatekeeping what 'EA-affiliated' is). I did a 5 min search for the best publicly available list and then gave up; sometimes I use the list of organizations with representatives at the last EAG for this use case. Maybe within AI specifically, someone could repeat this exercise with something like this list. If someone knows of a better public list of EA orgs, I'd love to know about it :)

Critiques of prominent AI safety labs: Conjecture

Rohin Shah3y6

:) I'm glad we got to agreement!

(Or at least significantly closer, I'm sure there are still some minor differences.)

Critiques of prominent AI safety labs: Conjecture

Rohin Shah3y32

On hits-based research: I certainly agree there are other factors to consider in making a funding decision. I'm just saying that you should talk about those directly instead of criticizing the OP for looking at whether their research was good or not.

(In your response to OP you talk about a positive case for the work on simulators, SVD, and sparse coding -- that's the sort of thing that I would want to see, so I'm glad to see that discussion starting.)

On VCs: Your position seems reasonable to me (though so does the OP's position).

On recommendations: Fwiw I ... (read more)

mariushobbhahn3y38

Hmm, yeah. I actually think you changed my mind on the recommendations. My new position is something like:
1. There should not be a higher burden on anti-recommendations than pro-recommendations.
2. Both pro- and anti-recommendations should come with caveats and conditionals whenever they make a difference to the target audience.
3. I'm now more convinced that the anti-recommendation of OP was appropriate.
4. I'd probably still phrase it differently than they did but my overall belief went from "this was unjustified" to "they should have used diffe... (read more)

Critiques of prominent AI safety labs: Conjecture

Rohin Shah3y139

I'm not very compelled by this response.

It seems to me you have two points on the content of this critique. The first point:

I think it's bad to criticize labs that do hits-based research approaches for their early output (I also think this applies to your critique of Redwood) because the entire point is that you don't find a lot until you hit.

I'm pretty confused here. How exactly do you propose that funding decisions get made? If some random person says they are pursuing a hits-based approach to research, should EA funders be obligated to fund them?

Presuma... (read more)

richard_ngo3y11

Good comment, consider cross-posting to LW?

mariushobbhahn

1. Meta: maybe my comment on the critique reads stronger than intended (see comment with clarifications) and I do agree with some of the criticisms and some of the statements you made. I'll reflect on where I should have phrased things differently and try to clarify below.
2. Hits-based research: Obviously results are one evaluation criterion for scientific research. However, especially for hits-based research, I think there are other factors that cannot be neglected. To give a concrete example, if I was asked whether I should give a unit under your supervision $10M in grant funding or not, I would obviously look back at your history of results but a lot of my judgment would be based on my belief in your ability to find meaningful research directions in the future. To a large extent, the funding would be a bet on you and the research process you introduce in a team and much less on previous results. Obviously, your prior research output is a result of your previous process but especially in early organizations this can diverge quite a bit. Therefore, I think it is fair to say that both a) the output of Conjecture so far has not been that impressive IMO and b) I think their updates to early results to iterate faster and look for more hits actually is positive evidence about their expected future output.
3. Of course, VCs are interested in making money. However, especially if they are angel investors instead of institutional VCs, ideological considerations often play a large role in their investments. In this case, the VCs I'm aware of (not all of which are mentioned in the post and I'm not sure I can share) actually seem fairly aligned by VC standards to me. Furthermore, the way I read the critique is something like "Connor didn't tell the VCs about the alignment plans or neglects them in conversation". However, my impression from conversation with (ex-) staff was that Connor was very direct about their motives to reduce x-risks. I think it's clear that product

Linch's Quick takes

Rohin Shah3y2

Wait, you think the reason we can't do brain improvement is because we can't change the weights of individual neurons?

That seems wrong to me. I think it's because we don't know how the neurons work.

Did you read the link to Cold Takes above? If so, where do you disagree with it?

(I agree that we'd be able to do even better if we knew how the neurons work.)

Similarly I'd be surprised if you thought that beings as intelligent as humans could recursively improve NNs. Cos currently we can't do that, right?

Humans can improve NNs? That's what AI capabilities resear... (read more)

Linch's Quick takes

Rohin Shah3y2

I think it's within the power of beings equally as intelligent as us (similarly as mentioned above I think recursive improvement in humans would accelerate if we had similar abilities).

Nathan Young

Wait, you think the reason we can't do brain improvement is because we can't change the weights of individual neurons? That seems wrong to me. I think it's because we don't know how the neurons work. Similarly I'd be surprised if you thought that beings as intelligent as humans could recursively improve NNs. Cos currently we can't do that, right?

A freshman year during the AI midgame: my approach to the next year

Rohin Shah3y6

I thought yes, but I'm a bit unhappy about that assumption (I forgot it was there). If you go by the intended spirit of the assumption (see the footnote) I'm probably on board, but it seems ripe for misinterpretation ("well if you had just deployed GPT-5 it really could have run an automated company, even though in practice we didn't do that because we were worried about safety and/or legal liability and/or we didn't know how to prompt it etc").

A freshman year during the AI midgame: my approach to the next year

Rohin Shah3y14

You could look at these older conversations. There's also Where I agree and disagree with Eliezer (see also my comment) though I suspect that won't be what you're looking for.

Mostly though I think you aren't going to get what you're looking for because it's a complicated question that doesn't have a simple answer.

(I think this regardless of whether you frame the question as "do we die?" or "do we live?", if you think the case for doom is straightforward I think you are mistaken. All the doom arguments I know of seem to me like they establish plausibility, ... (read more)

Ben_West🔸

Pedantic, but are you using the bio anchors definition? ("software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it)")

Greg_Colbourn ⏸️

Thanks. Regarding the conversations from 2019, I think we are in a different world now (post GPT-4 + AutoGPT/plugins).

[Paul Christiano] "Perhaps there's no problem at all" - saying this really doesn't help! I want to know why might that be the case! "concerted effort by longtermists could reduce it" - seems less likely now given shorter timelines. "finding out that the problem is impossible can help; it makes it more likely that we can all coordinate to not build dangerous AI systems" - this could be a way out, but again, little time. We need a Pause first to have time to firmly establish impossibility. However, "coordinate to not build dangerous AI systems" is not part of p(non-doom|AGI) [I'm interested in why people think there won't be doom, given we get AGI]. So far, Paul's section does basically nothing to update me on p(doom|AGI).

[Rohin Shah] "A likely crux is that I think that the ML community will actually solve the problems, as opposed to applying a bandaid fix that doesn't scale." - yes, this is a crux for me. How do the fixes scale, with 0 failure modes in the limit of superintelligence? You mention interpretability as a basis for scalable AI-assisted alignment above this, but progress in interpretability remains far behind the scaling of the models, so doesn't hold much hope imo. "I'm also less worried about race dynamics increasing accident risk"; "the Nash equilibrium is for all agents to be cautious" - I think this has been blown out of the water with the rush to connect GPT-4 to the internet and spread it far and wide as quickly as possible. As I said, we're in a different world now. "If I condition on discontinuous takeoff... I... get a lot more worried about AI risk" - this also seems cruxy (and I guess we've discussed a bit above). What do you think the likelihood is of a model trained with 100x more compute (affordable by Microsoft or Google) being able to do AI Research Engineering as well as the median AI Research Engineer? To me it seems pret

A freshman year during the AI midgame: my approach to the next year

Rohin Shah3y14

First off, let me say that I'm not accusing you specifically of "hype", except inasmuch as I'm saying that for any AI-risk-worrier who has ever argued for shorter timelines (a class which includes me), if you know nothing else about that person, there's a decent chance their claims are partly "hype". Let me also say that I don't believe you are deliberately benefiting yourself at others' expense.

That being said, accusations of "hype" usually mean an expectation that the claims are overstated due to bias. I don't really see why it matters if the bias is sur... (read more)

Greg_Colbourn ⏸️

I guess you're right that "hype" here could also come from being survival motivated. But surely the easier option is to just stop worrying so much? (I mean, it's not like stress doesn't have health effects). Read the best counter-arguments and reduce your p(doom) accordingly. Unfortunately, I haven't seen any convincing counterarguments. I'm with Richard Ngo here when he says:

What are the best counter-arguments you are aware of? I'm always a bit confused by people saying they have a p(doom|TAI) of 1-10%: like what is the mechanistic reason for expecting that the default, or bulk of the probability mass, is not doom? How is the (transformative) AI spontaneously becoming aligned enough to be safe!? It often reads to me as people (who understand the arguments for x-risk) wanting to sound respectable and not alarmist, rather than actually having a good reason to not worry so much.

GPT-5 or GPT-6 (1 or 2 further generations of large AI model development).

Yes, TAI, or PASTA, or AI that can do everything as well as the best humans (including AI Research Engineering).

Would you be willing to put this in numerical form (% chance) as a rough expectation?

Two contrasting models of “intelligence” and future growth

Rohin Shah3y4

I don't yet understand why you believe that hardware scaling would come to grow at much higher rates than it has in the past.

If we assume innovations decline, then it is primarily because future AI and robots will be able to automate far more tasks than current AI and robots (and we will get them quickly, not slowly).

Imagine that currently technology A that automates area X gains capabilities at a rate of 5% per year, which ends up leading to a growth rate of 10% per year.

Imagine technology B that also aims to automate area X gains capabilities at a rate o... (read more)

Two contrasting models of “intelligence” and future growth

Rohin Shah3y6

I don't disagree with any of the above (which is why I emphasized that I don't think the scaling argument is sufficient to justify a growth explosion). I'm confused why you think the rate of growth of robots is at all relevant, when (general-purpose) robotics seem mostly like a research technology right now. It feels kind of like looking at the current rate of growth of fusion plants as a prediction of the rate of growth of fusion plants after the point where fusion is cheaper than other sources of energy.

(If you were talking about the rate of growth of machines in general I'd find that more relevant.)

Magnus Vinding

By "I am confused by your argument against scaling", I thought you meant the argument I made here, since that was the main argument I made regarding scaling; the example with robots wasn't really central. I'm also a bit confused, because I read your arguments above as being arguments in favor of explosive economic growth rates from hardware scaling and increasing software efficiency. So I'm not sure whether you believe that the factors mentioned in your comment above are sufficient for causing explosive economic growth. Moreover, I don't yet understand why you believe that hardware scaling would come to grow at much higher rates than it has in the past.

Magnus Vinding

To be clear, I don't mean to claim that we should give special importance to current growth rates in robotics in particular. I just picked that as an example. But I do think it's a relevant example, primarily due to the gradual nature of the abilities that robots are surpassing, and the consequent gradual nature of their employment. Unlike fusion, which is singular in its relevant output (energy), robots produce a diversity of things, and robots cover a wide range of growth-relevant skills that are gradually getting surpassed already. It is this gradual nature of their growth-related abilities that makes them relevant, imo — because they are already doing a lot of work and already contributing a fair deal to the growth we're currently seeing. (To clarify, I mostly have in mind industrial robots, such as these, the future equivalents of which I also expect to be important to growth; I'd agree that it wouldn't be so relevant if we were only talking about some prototypes of robots that don't yet contribute meaningfully to the economy.)

Two contrasting models of “intelligence” and future growth

Rohin Shah3y9

I am confused by your argument against scaling.

My understanding of the scale-up argument is:

1. Currently humans are state-of-the-art at various tasks relevant to growth.
2. We are bottlenecked on scaling up humans by a variety of things (e.g. it takes ~20 years to train up a new human, you can't invest money into the creation of new humans with the hope of getting a return on it, humans only work ~8 hours a day).
3. At some point AI / robots will be able to match human performance at these tasks.
4. AI / robots will not be bottlenecked on those things.

In some sense I agre...

Magnus Vinding3y10

I agree with premise 3. Where I disagree more comes down to the scope of premise 1.

This relates to the diverse class of contributors and bottlenecks to growth under Model 2. So even though it's true to say that humans are currently "the state-of-the-art at various tasks relevant to growth", it's also true to say that computers and robots are currently "the state-of-the-art at various tasks relevant to growth". Indeed, machines/external tools have been (part of) the state-of-the-art at some tasks for millennia (e.g. in harvesting), and computers and robots ...

A freshman year during the AI midgame: my approach to the next year

Rohin Shah3y25

“Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias).

All of this seems to apply to AI-risk-worriers?

AI-risk-worriers are promoting a narrative that powerful AI will come soon
AI-risk-worriers are taken more seriously, have more job opportunities, get more status, get more of their policy proposals, etc, to the extent that this narrative is successful
My experience is that

...

Greg_Colbourn ⏸️

FWIW I am not seeking job opportunities or policy proposals that favour me financially. Rather, I want policy proposals that keep me, my family, and everyone else alive. My self-interest here is merely in staying alive (and wanting the rest of the planet to stay alive too). I'd rather this weren't an issue so I could just enjoy my retirement. I want to spend money on this (pay for people to work on Pause / global AGI moratorium / Shut It Down campaigns).

Status is a trickier thing to untangle. I'd be lying, as a human, if I said I didn't care about it. But I'm not exactly getting much here by being an "AI-risk-worrier". And I could probably get more doing something else. No one is likely to thank me if a disaster doesn't happen.

Re AI products being less impressive than the impression you get from AI-risk-worriers, what do you make of Connor Leahy's take that LLMs are basically "general cognition engines" and will scale to full AGI in a generation or two (and with the addition of various plugins etc to aid "System 2" type thinking, which are freely being offered by the AutoGPT crowd)?

Steven Byrnes

Hmm. Touché. I guess another thing on my mind is the mood of the hype-conveyer. My stereotypical mental image of “hype” involves Person X being positive & excited about the product they’re hyping, whereas the imminent-doom-ers that I’ve talked to seem to have a variety of moods including distraught, pissed, etc. (Maybe some are secretly excited too? I dunno; I’m not very involved in that community.)

Sharmake

This. I generally also agree with your 3 observations. The reason I was focusing on truth seeking is that my epistemic environment tends to reward worrying AI claims more than it probably should due to negativity bias, as well as looking at AI Twitter hype.

Two contrasting models of “intelligence” and future growth

Rohin Shah3y10

Thanks for this, it's helpful. I do agree that declining growth rates are significant evidence for your view.

I disagree with your other arguments:

For one, an AI-driven explosion of this kind would most likely involve a corresponding explosion in hardware (e.g. for reasons gestured at here and here), and there are both theoretical and empirical reasons to doubt that we will see such an explosion.

I don't have a strong take on whether we'll see an explosion in hardware efficiency; it's plausible to me that there won't be much change there (and also plausible t...

Magnus Vinding

Regarding explosive growth in the amount of hardware: I meant to include the scale aspect as well when speaking of a hardware explosion. I tried to outline one of the main reasons I'm skeptical of such an 'explosion via scaling' here. In short, in the absence of massive efficiency gains, it seems even less likely that we will see a scale-up explosion in the future.

That's right, but that's consistent with the per capita drop in innovation being a significant part of the reason why growth rates gradually declined since the 1960s. I didn't mean to deny that total population size has played a crucial role, as it obviously has and does. But if innovations per capita continue to decline, then even a significant increase in effective population size in the future may not be enough to cause a growth explosion. For example, if the number of employed robots continues to grow at current rates (roughly 12 percent per year), and if future robots eventually come to be the relevant economic population, then declining rates of innovation/economic productivity per capita would mean that the total economic growth rate still doesn't exceed 12 percent. I realize that you likely expect robot populations to grow much faster in such a future, but I still don't see what would drive such explosive growth in hardware (even if, in fact especially if, it primarily involves scaling-based growth).

That makes sense. On the other hand, it's perhaps worth noting that individual human thinking was increasingly extended by computers after ca. 1950, and yet the rate of innovation per capita still declined. So in that sense, the decline in progress could be seen as being somewhat understated by the graphs, in that the rate of innovation per dollar/scientific instrument/computation/etc. has declined considerably more.
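The 12 percent ceiling in the comment above can be checked with a small calculation. This is only a minimal sketch, assuming total output equals the number of robots times output per robot, and reading "declining" per-capita productivity as a growth rate of zero or below. Both are assumptions made here for illustration, not claims from the thread. The function name `total_growth` and the per-capita rates tried are hypothetical; only the 12 percent robot growth figure comes from the comment.

```python
def total_growth(population_growth: float, per_capita_growth: float) -> float:
    """Growth rate of total output when output = population * output per capita.

    Both arguments and the result are annual rates expressed as fractions
    (0.12 means 12 percent per year).
    """
    return (1 + population_growth) * (1 + per_capita_growth) - 1


ROBOT_GROWTH = 0.12  # robot population growth rate quoted in the comment

# Hypothetical per-capita productivity growth rates, all zero or negative.
for per_capita in (0.0, -0.01, -0.03):
    rate = total_growth(ROBOT_GROWTH, per_capita)
    print(f"per-capita growth {per_capita:+.0%} -> total growth {rate:.2%}")
```

Under these assumptions the total rate never exceeds the population rate, which is the bound the comment describes. If per-capita productivity growth were positive instead, the same formula would give a total above 12 percent, so the bound depends on that assumption.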

There are no coherence theorems

Rohin Shah3y3

I think it does [change the conclusion].

Upon rereading I realize I didn't state this explicitly, but my conclusion was the following:

If an agent has complete preferences, and it does not pursue dominated strategies, then it must be representable as maximizing expected utility.

Transitivity depending on completeness doesn't invalidate that conclusion.

Elliott Thornley (EJT)

Ah I see! Yep, agree with that.

There are no coherence theorems

Rohin Shah3y5

Okay, it seems like we agree on the object-level facts, and what's left is a disagreement about whether people have been making a major error. I'm less interested in that disagreement so probably won't get into a detailed discussion, but I'll briefly outline my position here.

The error is claiming that
There exist theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy.
I haven't seen anyone point out that that claim

...

Elliott Thornley (EJT)

I think that’s right.

Yep, I agree with all of this.

Often, but not in this case. If authors understood the above points and meant to refer to the Complete Class Theorem, they need only have said:

* If an agent has complete, transitive preferences, and it does not pursue dominated strategies, then it must be representable as maximizing expected utility.

(And they probably wouldn’t have mentioned Cox, Savage, etc.)

I think it does. If the money-pump for transitivity needs Completeness, and Completeness is doubtful, then the money-pump for transitivity is doubtful too.
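For readers unfamiliar with money-pumps, here is a toy illustration of the last point. It is a sketch under strong simplifying assumptions and not the argument from the post. The items, the fee, and the names `run_money_pump`, `CYCLIC` and `GAPPY` are all hypothetical. The agent modeled here accepts a trade only when it strictly prefers the offered item, which is just one possible policy for an agent with preference gaps.

```python
Preference = tuple[str, str]  # (x, y) means x is strictly preferred to y


def run_money_pump(
    prefers: set[Preference], start_item: str, offers: list[str], fee: float
) -> tuple[str, float]:
    """Offer items in sequence; the agent pays `fee` for each swap it strictly prefers."""
    item, paid = start_item, 0.0
    for offered in offers:
        if (offered, item) in prefers:
            item, paid = offered, paid + fee
    return item, paid


# Complete but cyclic preferences: A over B, B over C, C over A.
CYCLIC = {("A", "B"), ("B", "C"), ("C", "A")}

# Same as above, but with no preference either way between A and C (a gap),
# so Completeness fails. Transitivity also fails, since A over C is missing.
GAPPY = {("A", "B"), ("B", "C")}

pumped = run_money_pump(CYCLIC, "A", ["C", "B", "A"], fee=1.0)
not_pumped = run_money_pump(GAPPY, "A", ["C", "B", "A"], fee=1.0)
print(pumped, not_pumped)
```

In this toy setup the cyclic agent ends up holding its original item after paying three fees, while the agent with the gap declines the first trade and pays nothing. That is one way to see why a money-pump aimed at transitivity violations can lean on Completeness.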