Oh sorry, I missed the weights on the factors, and thought you were taking an unweighted average.
Is it because you have to run large evals or do pre-training runs? Do you think this argument applies to all areas of capabilities research?
All tasks in capabilities are ultimately trying to optimize the capability-cost frontier, which usually benefits from measuring capability.
If you have an AI that will do well at most tasks you give it that take (say) a week, then you have the problem that the naive way of evaluating the AI (run it on some difficult tasks an...
Great analysis of factors impacting automatability.
Looking at your numbers though, I feel like you didn't really need this; you could have just said "I think scheming risk is by far the most important factor in automatability of research areas, therefore capabilities will come first". EDIT: Overstated; I missed the fact that the scheming risk factor had a lower weight than the others.
I don't agree with that conclusion for two main reasons:
I don’t know how much the FTX collapse is responsible for our current culture. They did cause unbelievable damage, acting extremely unethically and unilaterally and recklessly in destructive ways. But they did have this world-scale ambition, and urgency, and proclivity to actually make things happen in the world, that I think central EA orgs and the broader EA community sorely lack in light of the problems we’re hoping to solve.
But this is exactly why I don't want to encourage heroic responsibility (despite the fact that I often take on that mindset ...
While I really like the HPMOR quote, I don't really resonate with heroic responsibility, and don't resonate with the "Everything is my fault" framing. Responsibility is a helpful social coordination tool, but it doesn't feel very "real" to me. I try to take the most helpful/impactful actions, even if they don't seem like "my responsibility" (while being cooperative and not unilateral and with reasonable constraints).
I'm sympathetic to taking on heroic responsibility causing harm in certain cases, but I don't see strong enough evidence that it causes ...
In fact, all of the top 7 most sought-after skills were related to management or communications.
"Leadership / strategy" and "government and policy expertise" are emphatically not management or communications. There's quite a lot of effort on building a talent pipeline for "government and policy expertise". There isn't one for "leadership / strategy" but I think that's mostly because no one knows how to do it well (broadly speaking, not just limited to EA).
If you want to view things through the lens of status (imo often a mistake), I think "leadership / str...
I think that most of classic EA vs the rest of the world is a difference in preferences / values, rather than a difference in beliefs.
I somewhat disagree but I agree this is plausible. (That was more of a side point, maybe I shouldn't have included it.)
most people really really don't want to die in the next ten years
Is your claim that they really really don't want to die in the next ten years, but they are fine dying in the next hundred years? (Else I don't see how you're dismissing the anti-aging vs sports team example.)
...So, for x-risk to be high, many peo
Most people really don’t want to die, or to be disempowered in their lifetimes. So, for existential risk to be high, there has to be some truly major failure of rationality going on.
... What is surprising about the world having a major failure of rationality? That's the default state of affairs for anything requiring a modicum of foresight. A fairly core premise of early EA was that there is a truly major failure of rationality going on in the project of trying to improve the world.
Are you surprised that ordinary people spend more money and time on, ...
I think that most of classic EA vs the rest of the world is a difference in preferences / values, rather than a difference in beliefs. Ditto for someone funding their local sports teams rather than anti-aging research. We're saying that people are failing in the project of rationally trying to improve the world by as much as possible - but few people really care much or at all about succeeding at that project. (If they cared more, GiveWell would be moving a lot more money than it is.)
In contrast, most people really really don't want to die in the nex...
If you think the following claim is true - 'non-AI projects are never undercut but always outweighed'
Of course I don't think this. AI definitely undercuts some non-AI projects. But "non-AI projects are almost always outweighed in importance" seems very plausible to me, and I don't see why anything in the piece is a strong reason to disbelieve that claim, since this piece is only responding to the undercutting argument. And if that claim is true, then the undercutting point doesn't matter.
We are disputing a general heuristic that privileges the AI cause area and writes off all the others.
I think the most important argument towards this conclusion is "AI is a big deal, so we should prioritize work that makes it go better". But it seems you have placed this argument out of scope:
...[The claim we are interested in is] that the coming AI revolution undercuts the justification for doing work in other cause areas, rendering work in those areas useless, or nearly so (for now, and perhaps forever).
[...]
AI causes might be more cost-effective than
I agree with some of the points under point 1, though other than FTX, I don't think the downside risk of any of those examples is very large.
Fwiw I find it pretty plausible that lots of political action and movement building for the sake of movement building has indeed had a large negative impact, such that I feel uncertain about whether I should shut it all down if I had the option to do so (if I set aside concerns like unilateralism). I also feel similarly about particular examples of AI safety research but definitely not for the field as a whole.
...Agree that
I'm not especially pro-criticism but this seems way overstated.
Almost all EA projects have low downside risk in absolute terms
I might agree with this on a technicality, in that depending on your bar or standard, I could imagine agreeing that almost all EA projects (at least for more speculative causes) have negligible impact in absolute terms.
But presumably you mean that almost all EA projects are such that their plausible good outcomes are way bigger in magnitude than their plausible bad outcomes, or something like that. This seems false, e.g.
Of course, it's true that they could ignore serious criticism if they wanted to, but my sense is that people actually quite often feel unable to ignore criticism.
As someone sympathetic to many of Habryka's positions, while also disagreeing with many of them, my immediate reaction to this was "well that seems like a bad thing", cf.
shallow criticism often gets valorized
I'd feel differently if you had said "people feel obliged to take criticism seriously if it points at a real problem" or something like that, but I agree with you that the mech...
Tbc if the preferences are written in words like "expected value of the lightcone" I agree it would be relatively easy to tell which was which, mainly by identifying community shibboleths. My claim is that if you just have the input/output mapping of (safety level of AI, capabilities level of AI) --> utility, then it would be challenging. Even longtermists should be willing to accept some risk, just because AI can help with other existential risks (and of course many safety researchers -- probably the majority at this point -- are not longtermists).
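To make the input/output framing concrete, here is a minimal sketch (the functional forms and weights below are made up purely for illustration, not taken from anyone's actual model): two utility functions over (safety level, capability level), one meant to stand in for a lab and one for an external safety researcher, whose outputs alone don't make it obvious which is which.

```python
# Minimal sketch with made-up functional forms: two utility functions over
# (safety level, capability level) pairs. One is meant to stand in for a lab
# and one for an external safety researcher, but from the outputs alone it is
# not obvious which label goes with which.
import numpy as np

def utility_a(safety, capability):
    # Weights capability a bit more, but still heavily penalizes unsafe capability.
    return 0.6 * capability + 0.4 * safety - 2.0 * capability * (1 - safety) ** 2

def utility_b(safety, capability):
    # Weights safety a bit more, but still wants capability (e.g. because AI can
    # help with other existential risks).
    return 0.4 * capability + 0.6 * safety - 2.0 * capability * (1 - safety) ** 2

if __name__ == "__main__":
    grid = [(s, c) for s in np.linspace(0, 1, 5) for c in np.linspace(0, 1, 5)]
    for s, c in grid:
        print(f"safety={s:.2f} capability={c:.2f} "
              f"A={utility_a(s, c):+.2f} B={utility_b(s, c):+.2f}")
```

Both toy functions accept some capability risk and both penalize unsafe capability, which is the sense in which I doubt I could tell which mapping belongs to whom from the numbers alone.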
What you call the "lab's" utility function isn't really specific to the lab; it could just as well apply to safety researchers. One might assume that the parameters would be set in such a way as to make the lab more C-seeking (e.g. it takes less C to produce 1 util for the lab than for everyone else).
But at least in the case of AI safety, I don't think this is the case. I doubt I could easily distinguish a lab capabilities researcher (or lab leadership, or some "aggregate lab utility function") from an external safety researcher if you just gave me their u...
I agree that reductions in infant mortality likely have better long-run effects on capacity growth than equivalent levels of population growth with infant mortality rates held constant, which could mean that you still want to focus on infant mortality while not prioritizing increasing fertility.
I would just be surprised if the decision from the global capacity growth perspective ended up being "continue putting tons of resources into reducing infant mortality, but not much into increasing fertility" (which I understand to be the status quo for GHD), because...
?? It's the second bullet point in the cons list, and reemphasized in the third bullet?
If you're saying "obviously this is the key determinant of whether you should work at a leading AI company so there shouldn't even be a pros / cons table", then obviously 80K disagrees given they recommend some such roles (and many other people (e.g. me) also disagree so this isn't 80K ignoring expert consensus). In that case I think you should try to convince 80K on the object level rather than applying political pressure.
There’s currently very little work going into issues that arise even if AI is aligned, including the deployment problem
The deployment problem (as described in that link) is a non-problem if you know that AI is aligned.
In contrast, I think the fact that these AIs will be trained on human-generated data and deliberately shaped by humans to fulfill human-like functions and to be human-compatible should be given substantial weight.
... This seems to be saying that because we are aligning AI, they will be more utilitarian. But I thought we were discussing unaligned AI?
I agree that the fact we are aligning AI should make one more optimistic. Could you define what you mean by "unaligned AI"? It seems quite plausible that I will agree with your position, and think it amounts to ...
This suggests affective empathy may not be strongly predictive of utilitarian motivations.
I can believe that if the population you are trying to predict for is just humans, almost all of whom have at least some affective empathy. But I'd feel pretty surprised if this were true in whatever distribution over unaligned AIs we're imagining. In particular, I think if there's no particular reason to expect affective empathy in unaligned AIs, then your prior on it being present should be near-zero (simply because there are lots of specific claims about unaligned ...
Given my new understanding of the meaning of "contingent" here, I'd say my claims are:
One way to think about this is to ask: why are any humans utilitarians? To the extent it's for reasons that...
I agree it's clear that you claim that unaligned AIs are plausibly comparably utilitarian as humans, maybe more.
What I didn't find was discussion of how contingent utilitarianism is in humans.
Though actually rereading your comment (which I should have done in addition to reading the post) I realize I completely misunderstood what you meant by "contingent", which explains why I didn't find it in the post (I thought of it as meaning "historically contingent"). Sorry for the misunderstanding.
Let me backtrack like 5 comments and try again.
I was arguing that trying to preserve the present generation of humans looks good according to (2), not (1).
I was always thinking about (1), since that seems like the relevant thing. When I agreed with you that generational value drift seems worrying, that's because it seems bad by (1). I did not mean to imply that I should act to maximize (2). I agree that if you want to act to maximize (2) then you should probably focus on preserving the current generation.
...In my post, I fairly explicitly argued that the rough level of utilitarian values exhibited by huma
Based on your toy model, my guess is that your underlying intuition is something like, "The fact that a tiny fraction of humans are utilitarian is contingent. If we re-rolled the dice, and sampled from the space of all possible human values again (i.e., the set of values consistent with high-level human moral concepts), it's very likely that <<1% of the world would be utilitarian, rather than the current (say) 1%."
No, this was purely to show why, from the perspective of someone with values, re-rolling those values would seem bad, as opposed to keepin...
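For concreteness, here is a minimal sketch of that point, with an entirely made-up value space: values are random unit vectors, and "how good the future looks to me" is the dot product between my values and the values that end up steering the future.

```python
# Minimal sketch with a made-up value space: values are random unit vectors,
# and "how good the future looks to me" is the dot product between my values
# and the values that end up steering the future. Keeping current values
# scores ~1; re-rolling scores ~0 in expectation.
import numpy as np

rng = np.random.default_rng(0)
DIM = 50          # dimensionality of the toy value space
N_SAMPLES = 10_000

def random_values():
    v = rng.normal(size=DIM)
    return v / np.linalg.norm(v)

my_values = random_values()

keep_score = float(my_values @ my_values)   # 1.0, since it's a unit vector
reroll_scores = [float(my_values @ random_values()) for _ in range(N_SAMPLES)]

print(f"keep current values: {keep_score:.3f}")
print(f"re-roll values (mean over {N_SAMPLES} draws): {np.mean(reroll_scores):+.3f}")
```

Keeping current values scores about 1 while a re-roll scores about 0 in expectation, which is all the toy model was meant to show; it says nothing about how contingent utilitarianism actually is.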
To the extent that future generations would have pretty different values than me, like "the only glory is in war and it is your duty to enslave your foes", along with the ability to enact their values on the reachable universe, in fact that would seem pretty bad to me.
However, I expect the correlation between my values and future generation values is higher than the correlation between my values and unaligned AI values, because I share a lot more background with future humans than with unaligned AI. (This doesn't require values to be innate, values can be ...
Fwiw I had a similar reaction as Ryan.
My framing would be: it seems pretty wild to think that total utilitarian values would be better served by unaligned AIs (whose values we don't know) rather than humans (where we know some are total utilitarians). In your taxonomy this would be "humans are more likely to optimize for goodness".
Let's make a toy model compatible with your position:
...A short summary of my position is that unaligned AIs could be even more utilitarian than humans are, and this doesn't seem particularly unlikely either given that (1) humans ar
So I'm not really seeing anything "bad" here.
I didn't say your proposal was "bad", I said it wasn't "conservative".
My point is just that, if GHD were to reorient around "reliable global capacity growth", it would look very different, to the point where I think your proposal is better described as "stop GHD work, and instead do reliable global capacity growth work", rather than the current framing of "let's reconceptualize the existing bucket of work".
I'll suggest a reconceptualization that may seem radical in theory but is conservative in practice.
It doesn't seem conservative in practice? Like Vasco, I'd be surprised if aiming for reliable global capacity growth would look like the current GHD portfolio. For example:
I also think it misses the worldview bucket that's the main reason why many people fund global health and (some aspects of) development: intrinsic value attached to saving [human] lives. Potential positive flowthrough effects are a bonus on top of that, in most cases.
From an EA-ish hedonic utilitarianism perspective this dates right back to Singer's essay about saving a drowning child. Taking that thought experiment in a different direction, I don't think many people - EA or otherwise - would conclude that the decision on whether to save the child or not s...
Research Scientist and Research Engineer roles in AI Safety and Alignment at Google DeepMind.
Location: Hybrid (3 days/week in the office) in San Francisco / Mountain View / London.
Application deadline: We don't have a final deadline yet, but will keep the roles open for at least another two weeks (i.e. until March 1, 2024), and likely longer.
For further details, see the roles linked above. You may also find my FAQ useful.
(Fyi, I probably won't engage more here, due to not wanting to spend too much time on this)
Jonas's comment is a high level assessment that is only useful insofar as you trust his judgment.
This is true, but I trust basically any random commenter a non-zero amount (unless their comment itself gives me reasons not to trust them). I agree you can get more trust if you know the person better. But even the amount of trust for "literally a random person I've never heard of" would be enough for the evidence to matter to me.
...I'm only saying that I think large update
SBF was an EA leader in good standing for many years and had many highly placed friends. It's pretty notable to me that there weren't many comments like Jonas's for SBF, while there are for Owen.
I think these cases are too different for that comparison to hold.
One big difference is that SBF committed fraud, not sexual harassment. There's a long history of people minimizing sexual harassment, especially when it's as ambiguous as this. There's also a long history of ignoring fraud when you're benefiting from it, but by the time anyone had a chance to com...
The evidence Jonas provides is equally consistent with “Owen has a flaw he has healed” and “Owen is a skilled manipulator who charms men, and harasses women”.
Surely there are a lot of other hypotheses as well, and Jonas's evidence is relevant to updating on those?
More broadly, I don't think there's any obvious systemic error going on here. Someone who knows the person reasonably well, giving a model for what the causes of the behavior were, that makes predictions about future instances, clearly seems like evidence one should take into account.
(I do agree t...
Surely there are a lot of other hypotheses as well, and Jonas's evidence is relevant to updating on those?
There are of course infinite hypotheses. But I don't think Jonas's statement adds much to my estimates of how much harm Owen is likely to do in the future, and expect the same should be true for most people reading this.
To be clear I'm not saying I estimate more harm is likely; taking himself off the market seems likely to work, and this has been public enough that I expect it to be easy for future victims to complain if something does happen. I'm onl...
Yeah, I don't think it's accurate to say that I see assistance games as mostly irrelevant to modern deep learning, and I especially don't think that it makes sense to cite my review of Human Compatible to support that claim.
The one quote that Daniel mentions about shifting the entire way we do AI is a paraphrase of something Stuart says, and is responding to the paradigm of writing down fixed, programmatic reward functions. And in fact, we have now changed that dramatically through the use of RLHF, for which a lot of early work was done at CHAI, so I think...
Fyi, the list you linked doesn't contain most of what I would consider the "small" orgs in AI, e.g. off the top of my head I'd name ARC, Redwood Research, Conjecture, Ought, FAR AI, Aligned AI, Apart, Apollo, Epoch, Center for AI Safety, Bluedot, Ashgro, AI Safety Support and Orthogonal. (Some of these aren't even that small.) Those are the ones I'd be thinking about if I were to talk about merging orgs.
Maybe the non-AI parts of that list are more comprehensive, but my guess is that it's just missing most of the tiny orgs that OP is talking about (e.g. OP'...
On hits-based research: I certainly agree there are other factors to consider in making a funding decision. I'm just saying that you should talk about those directly instead of criticizing the OP for looking at whether their research was good or not.
(In your response to OP you talk about a positive case for the work on simulators, SVD, and sparse coding -- that's the sort of thing that I would want to see, so I'm glad to see that discussion starting.)
On VCs: Your position seems reasonable to me (though so does the OP's position).
On recommendations: Fwiw I ...
Hmm, yeah. I actually think you changed my mind on the recommendations. My new position is something like:
1. There should not be a higher burden on anti-recommendations than pro-recommendations.
2. Both pro- and anti-recommendations should come with caveats and conditionals whenever they make a difference to the target audience.
3. I'm now more convinced that the anti-recommendation of OP was appropriate.
4. I'd probably still phrase it differently than they did but my overall belief went from "this was unjustified" to "they should have used diffe...
I'm not very compelled by this response.
It seems to me you have two points on the content of this critique. The first point:
I think it's bad to criticize labs that do hits-based research approaches for their early output (I also think this applies to your critique of Redwood) because the entire point is that you don't find a lot until you hit.
I'm pretty confused here. How exactly do you propose that funding decisions get made? If some random person says they are pursuing a hits-based approach to research, should EA funders be obligated to fund them?
Presuma...
Wait, you think the reason we can't do brain improvement is because we can't change the weights of individual neurons?
That seems wrong to me. I think it's because we don't know how the neurons work.
Did you read the link to Cold Takes above? If so, where do you disagree with it?
(I agree that we'd be able to do even better if we knew how the neurons work.)
Similarly I'd be surprised if you thought that beings as intelligent as humans could recursively improve NNs. Cos currently we can't do that, right?
Humans can improve NNs? That's what AI capabilities resear...
I thought yes, but I'm a bit unhappy about that assumption (I forgot it was there). If you go by the intended spirit of the assumption (see the footnote) I'm probably on board, but it seems ripe for misinterpretation ("well if you had just deployed GPT-5 it really could have run an automated company, even though in practice we didn't do that because we were worried about safety and/or legal liability and/or we didn't know how to prompt it etc").
You could look at these older conversations. There's also Where I agree and disagree with Eliezer (see also my comment) though I suspect that won't be what you're looking for.
Mostly though I think you aren't going to get what you're looking for because it's a complicated question that doesn't have a simple answer.
(I think this regardless of whether you frame the question as "do we die?" or "do we live?", if you think the case for doom is straightforward I think you are mistaken. All the doom arguments I know of seem to me like they establish plausibility, ...
First off, let me say that I'm not accusing you specifically of "hype", except inasmuch as I'm saying that for any AI-risk-worrier who has ever argued for shorter timelines (a class which includes me), if you know nothing else about that person, there's a decent chance their claims are partly "hype". Let me also say that I don't believe you are deliberately benefiting yourself at others' expense.
That being said, accusations of "hype" usually mean an expectation that the claims are overstated due to bias. I don't really see why it matters if the bias is sur...
I don't yet understand why you believe that hardware scaling would come to grow at much higher rates than it has in the past.
If we assume innovations decline, then it is primarily because future AI and robots will be able to automate far more tasks than current AI and robots (and we will get them quickly, not slowly).
Imagine that currently technology A that automates area X gains capabilities at a rate of 5% per year, which ends up leading to a growth rate of 10% per year.
Imagine technology B that also aims to automate area X gains capabilities at a rate o...
I don't disagree with any of the above (which is why I emphasized that I don't think the scaling argument is sufficient to justify a growth explosion). I'm confused why you think the rate of growth of robots is at all relevant, when (general-purpose) robotics seems mostly like a research technology right now. It feels kind of like looking at the current rate of growth of fusion plants as a prediction of the rate of growth of fusion plants after the point where fusion is cheaper than other sources of energy.
(If you were talking about the rate of growth of machines in general I'd find that more relevant.)
I am confused by your argument against scaling.
My understanding of the scale-up argument is:
In some sense I agre...
I agree with premise 3. Where I disagree more comes down to the scope of premise 1.
This relates to the diverse class of contributors and bottlenecks to growth under Model 2. So even though it's true to say that humans are currently "the state-of-the-art at various tasks relevant to growth", it's also true to say that computers and robots are currently "the state-of-the-art at various tasks relevant to growth". Indeed, machines/external tools have been (part of) the state-of-the-art at some tasks for millennia (e.g. in harvesting), and computers and robots ...
“Hype” typically means Person X is promoting a product, that they benefit from the success of that product, and that they are probably exaggerating the impressiveness of that product in bad faith (or at least, with a self-serving bias).
All of this seems to apply to AI-risk-worriers?
Thanks for this, it's helpful. I do agree that declining growth rates is significant evidence for your view.
I disagree with your other arguments:
For one, an AI-driven explosion of this kind would most likely involve a corresponding explosion in hardware (e.g. for reasons gestured at here and here), and there are both theoretical and empirical reasons to doubt that we will see such an explosion.
I don't have a strong take on whether we'll see an explosion in hardware efficiency; it's plausible to me that there won't be much change there (and also plausible t...
I think it does [change the conclusion].
Upon rereading I realize I didn't state this explicitly, but my conclusion was the following:
If an agent has complete preferences, and it does not pursue dominated strategies, then it must be representable as maximizing expected utility.
Transitivity depending on completeness doesn't invalidate that conclusion.
Okay, it seems like we agree on the object-level facts, and what's left is a disagreement about whether people have been making a major error. I'm less interested in that disagreement so probably won't get into a detailed discussion, but I'll briefly outline my position here.
...The error is claiming that
- There exist theorems which state that, unless an agent can be represented as maximizing expected utility, that agent is liable to pursue strategies that are dominated by some other available strategy.
I haven't seen anyone point out that that claim
It is more like this stronger claim.
I might not use "inherently" here. A core safety question is whether an AI system is behaving well because it is aligned, or because it is pursuing convergent instrumental subgoals until it can takeover. The "natural" test is to run the AI until it has enough power to easily take over, at which point you observe whether it takes over, which is extremely long-horizon. But obviously this was never an option for safety anyway, and many of the proxies that we think about are more short horizon.