It seems like most of the work is being done here:
If you think that AI won’t be smarter than humans but agree that we cannot perfectly control AI in the same way that we cannot perfectly control humans
If I were adopting my skeptic-hat, I don't think I would buy that assumption. (Or like, sure, we can't perfectly control AI, but your argument assumes that we are at least as unable to control AI as we are unable to control humans, which I wouldn't buy.) AI systems are programs; programs are (kind of) determined entirely by their source code, which we p... (read more)
I agree with that, and that's what I meant by this statement above:
Note that general arguments can motivate you to learn more about the problem to develop more specific arguments, which you can then solve.
Thanks, I'd be glad to see these fixed! I don't remember where exactly (10) happened unfortunately.
(Not sure where to provide feedback on the conference, writing here because it mentions SwapCard, and the stuff I say here probably affects virtual attendees too, even though I'm an in-person attendee)
EDIT: Other people reporting many of these, including possibly a huge vulnerability where you can read anyone's "private" messages if they are part of an organization
I really dislike SwapCard. It has some ridiculously stupid and annoying things, some of which really should have been fixed after the literal first time anyone ever used it for a conference:
FWIW, I found the Swapcard app to be a net improvement to my EAG experience. I found it easier to schedule meetings than my default approach of Google Sheets + Calendly links + emails. I wonder if part of it is that people seem more responsive on the app than via email? Not trying to detract from Rohin's experience. Just piping up in case it's helpful. I also ran into a number of the issues that Rohin had, but just sighed and worked around them. Disclaimer: I work for 80,000 Hours, which is fiscally sponsored by CEA, which runs EA Global.
I think there are a number of concrete changes, like optimizing for the user’s deliberative retrospective judgment, developing natural language interfaces, or exposing recommender system internals for researchers to study, which are likely to be hugely positive across most worlds, including ones where there's no "problem" attributable to recommender systems per se.
Some illustrative hypotheticals of how these could go poorly:
it seems worth separating motivation ("why should I care?") and action ("if I do care, what should I do?")
Imagine Alice, an existing AI safety researcher, having such a conversation with Bob, who doesn't currently care about AI safety:
Alice: AGI is decently likely to be built in the next century, and if it is it will have a huge impact on the world, so it's really important to deal with it now.
Bob: Huh, okay. It does seem like it's pretty important to make sure that AGI doesn't discriminate against people of color. And we better make sure that AGI isn't us... (read more)
Related: Regulatory Markets for AI Safety
Another resource: https://ai-alignment.com/sympathizing-with-ai-e11a4bf5ef6e
Finally, I personally think that the strongest case that we can currently make for the longtermist importance of shaping AI development is fairly general - something along the lines of the most important century series - and yet this doesn't seem to be the "default" argument (i.e. the one presented in key EA content/fellowships/etc. when discussing AI).
I agree that the general argument is the strongest one, in the sense that it is most likely to be correct / robust.
The problem with general arguments is that they tell you very little about how to solve the ... (read more)
I do not think it is crunch time. I think people in the reference class you're describing should go with some "normal" plan such as getting into the best AI PhD program you can get into, learning how to do AI research, and then working on AI safety.
(There are a number of reasons you might do something different. Maybe you think academia is terrible and PhDs don't teach you anything, and so instead you immediately start to work independently on AI safety. That all seems fine. I'm just saying that you shouldn't make a change like this because of a supposed "... (read more)
I do think it is crunch time probably, but I agree with what Rohin said here about what you should do for now (and about my minority status). Skilling up (not just in technical specialist stuff, also in your understanding of the problem we face, the literature, etc.) is what you should be doing. For what I think should be done by the community as a whole, see this comment.
Planned summary for the Alignment Newsletter:
This post presents a list of research questions around existential risk from AI that can be tackled by social scientists. The author is looking for collaborators to expand the list and tackle some of the questions on it, and is aiming to provide some mentorship for people getting involved.
It’s so easy to collapse into the arms of “if there’s even a small chance X will make a very good future more likely …” As with consequentialism, I totally buy the logic of this! The issue is that it’s incredibly easy to hide motivated reasoning in this framework. Figuring out what’s best to do is really hard, and this line of thinking conveniently ends the inquiry (for people who want that).
I have seen something like this happen, so I'm not claiming it doesn't, but it feels pretty confusing to me. The logic pretty clearly doesn't hold up. Even if you acce... (read more)
Yeah I'm surprised by this as well. Both classical utilitarianism (in the extreme version, "everything that is not morally obligatory is forbidden") and longtermism just seem to have many fewer degrees of freedom than other commonly espoused ethical systems, so it would naively be surprising if these worldviews could justify a broader range of actions than close alternatives.
Yeah, I agree that would also count (and as you might expect I also agree that it seems quite hard to do).
Basically with (b) I want to get at "the model does something above and beyond what we already had with verbal arguments"; if it substantially affects the beliefs of people most familiar with the field that seems like it meets that criterion.
The obvious response here is that I don't think longtermist questions are more amenable to explicit quantitative modeling than global poverty, but I'm even more suspicious of other methodologies here.
Yeah, I'm just way, way more suspicious of quantitative modeling relative to other methodologies for most longtermist questions.
I think we might just be arguing about different things here?
Makes sense, I'm happy to ignore those sorts of methods for the purposes of this discussion.
Medicine is less amenable to empirical testing than physics, but that doesn't mea
Replied to Linch -- TL;DR: I agree this is true compared to global poverty or animal welfare, and I would defend this as simply the correct way to respond to actual differences in the questions asked in longtermism vs. those asked in global poverty or animal welfare.
You could move me by building an explicit quantitative model for a popular question of interest in longtermism that (a) didn't previously have models (so e.g. patient philanthropy or AI racing doesn't count), (b) has an upshot that we didn't previously know via verbal arguments, (c) doesn't involve subjective personal guesses or averages thereof for important parameters, and (d) I couldn't immediately tear a ton of holes in that would call the upshot into question.
My guess is that longtermist EAs (like almost all humans) have never been that close to purely quantitative models guiding decisions
I agree with the literal meaning of that, because it is generally a terrible idea to just do what a purely quantitative model tells you (and I'll note that even GiveWell isn't doing this). But imagining the spirit of what you meant, I suspect I disagree.
I don't think you should collapse it into the single dimension of "how much do you use quantitative models in your decisions". It also matters how amenable the decisions are t... (read more)
Overall great post, and I broadly agree with the thesis. (I'm not sure the evidence you present is all that strong though, since it too is subject to a lot of selection bias.) One nitpick:
Most of the posts’ comments were critical, but they didn’t positively argue against EV calculations being bad for longtermism. Instead they completely disputed that EV calculations were used in longtermism at all!
I think you're (unintentionally) running a motte-and-bailey here.
Motte: Longtermists don't think you should build explicit quantitative models, take their best g... (read more)
In that example, Alice has ~5 min of time to give feedback to Bob; in Toby's case the senior researchers are (in aggregate) spending at least multiple hours providing feedback (where "Bob spent 15 min talking to Alice and seeing what she got excited about" counts as 15 min of feedback from Alice). That's the major difference.
I guess one way you could interpret Toby's advice is to simply get a project idea from a senior person, and then go work on it yourself without feedback from that senior person -- I would disagree with that particular advice. I think it's important to have iterative / continual feedback from senior people.
I agree substituting the question would be bad, and sometimes there aren't any relevant experts in which case you shouldn't defer to people. (Though even then I'd consider doing research in an unrelated area for a couple of years, and then coming back to work on the question of interest.)
I admit I don't really understand how people manage to have a "driving question" overwritten -- I can't really imagine that happening to me and I am confused about how it happens to other people.
(I think sometimes it is justified, e.g. you realize that your question was co... (read more)
so it's e.g. the mesa-optimizers paper or multiple LW posts by John Wentworth. As far as I can tell, none of these seems to be following the proposed 'formula for successful early-career research'.
I think the mesa optimizers paper fits the formula pretty well? My understanding is that the junior authors on that paper interacted a lot with researchers at MIRI (and elsewhere) while writing that paper.
I don't know John Wentworth's history. I think it's plausible that if I did, I wouldn't have thought of him as a junior researcher (even before seei... (read more)
My impression from talking to friends working in ML is that usually faculty have ideas that they'd be excited to see their senior grad students work on, senior grad students have research ideas that they'd love for junior grad students to implement, and so forth.
I think this is true if the senior person can supervise the junior person doing the implementation (which is time-expensive). I have lots of project ideas that I expect I could supervise. I have ~no project ideas where I expect I could spend an hour talking to someone, have them go off for... (read more)
I'm considering three types of advice:
When you said
But to steelman (steel-alien?) his view a little, I worry that EA is overinvested in outside-view/forecasting types (like myself?), rather than people with strong and true convictions/extremely high-quality initial research taste, which (quality-weighted) may be making up the majority of revolutionary progress. And if we tell the future Geoffrey Hintons (and Eliezer Yudkowskys) of the
What % do you think this is true for, quality-weighted?
Weighted by quality after graduating? Still > 50%, probably > 80%, but it's really just a lot harder to tell (I don't have enough data). I'd guess that the best people still had "bad ideas" when they were starting out.
(I think a lot of what makes a junior researcher's idea "bad" is that the researcher doesn't know about existing work, or has misinterpreted the goal of the field, or lacks intuitions gained from hands-on experience, etc. It is really hard to compensate for a lack of knowledg... (read more)
Thanks for the link to your FAQ, I'm excited to read it further now!
Re: the rest of your comment, I think you're reading more into my comment than I said or meant. I do not think researchers should generally be deferential; I think they should have strong beliefs that may in fact go against expert consensus. I just don't think this is the right attitude while you are junior.
To be clear, I think Geoffrey Hinton's advice was targeted at very junior people. In context, the interview was conducted for Andrew Ng's online deep learning course, which for many peo... (read more)
I'm not going to go into much detail here, but I disagree with all of these caveats. I think this would be a worse post if it included the first and third caveats (less sure about the second).
First caveat: I think > 95% of incoming PhD students in AI at Berkeley have bad ideas (in the way this post uses the phrase). I predict that if you did a survey of people who have finished their PhD in AI at Berkeley, over 80% of them would think their initial ideas were significantly worse than their later ideas. (Note also that AI @ Berkeley is a very selective p... (read more)
Let's start with the third caveat: maybe the real crux is what we think are the best outputs; what I consider some of the best outputs by young researchers of AI alignment is easier to point at via examples - so it's e.g. the mesa-optimizers paper or multiple LW posts by John Wentworth. As far as I can tell, none of these seems to be following the proposed 'formula for successful early-career research'. My impression is PhD students in AI in Berkeley need to optimise, and actually optimise a lot for success in an established field (ML/AI),... (read more)
I think > 95% of incoming PhD students in AI at Berkeley have bad ideas (in the way this post uses the phrase). [...] (Note also that AI @ Berkeley is a very selective program.)
What % do you think this is true for, quality-weighted? I remember an interview with Geoffrey Hinton where (paraphrased) Hinton was basically like "just trust your intuitions man. Either your intuitions are good or they're bad. If they are good you should mostly trust your intuitions regardless of what other people say, and if they're bad, well, you aren't going to be a good r... (read more)
I'm not objecting to providing the information (I think that is good), I'm objecting to calling it a "conflict of interest".
I'd be much more keen on something like this (source):
For transparency, note that the reports for the latter three rows are all Open Philanthropy analyses, and I am co-CEO of Open Philanthropy.
I sometimes see people arguing for people to work in area A, and declaring a conflict of interest that they are personally working on area A.
If they already were working in area A for unrelated reasons, and then they produced these arguments, it seems reasonable to be worried about motivated reasoning.
On the other hand, if because of these arguments they switched to working in area A, this is in some sense a signal of sincerity ("I'm putting my career where my mouth is").
I don't like the norm of declaring your career as a "conflict of interest", because it... (read more)
He asserts that "numerous people have come forward, both publicly and privately, over the past few years with stories of being intimidated, silenced, or 'canceled.'" This doesn't match my experience.
I also have not had this experience, though that doesn't mean it didn't happen, and I'd want to take this seriously if it did happen.
However, Phil Torres has demonstrated that he isn't above bending the truth in service of his goals, so I'm inclined not to believe him. See previous discussion here. Example from the new article:
It’s not difficult to see ho
Many thanks for this, Rohin. Indeed, your understanding is correct. Here is my own screenshot of my private announcement on this matter.
This is far from the first time that Phil Torres references my work in a way that is set up to give the misleading impression that I share his anti-longtermism view. He and I had extensive communication about this in 2020, but he showed no sympathy for my complaints.
This is my best attempt at summarizing a reasonable outsider's view of the current state of affairs. Before publication, I had this sanity checked (though not necessarily endorsed) by an EA researcher with more context. Apologies in advance if it misrepresents the actual state of affairs, but that's precisely the thing I'm trying to clarify for myself and others.
I just want to note that I think this question is great and does not misrepresent the actual state of affairs.
I do think there's hope for some quantitative estimates even in the speculative cases; ... (read more)
Unfortunately I don't really have the time to do this well, and I think it would be a pretty bad post if I wrote the version that would be ~2 hours of effort or less.
The next Alignment Newsletter will include two articles on recommender systems that mostly disagree with the "recommender systems are driving polarization" position; you might be interested in those. (In fact, I did this shallow dive because I wanted to make sure I wasn't neglecting arguments pointing in the opposite direction.)
EDIT: To be clear, I'd be excited for someone else to develop this... (read more)
The result is software that is extremely addictive, with a host of hard-to-measure side effects on users and society including harm to relationships, reduced cognitive capacity, and political radicalization.
As far as I can tell, this is all the evidence given in this post that there is in fact a problem. Two of the four links are news articles, which I ignore on the principle that news articles are roughly uncorrelated with the truth. (On radicalization I've seen specific arguments arguing against the claim.) One seems to be a paper studying what users bel... (read more)
Should we expect AI companies to reduce risk through self-governance? This post investigates six historical cases, of which the two most successful were the Asilomar conference on recombinant DNA, and the actions of Leo Szilard and other physicists in 1939 (around the development of the atomic bomb). It is hard to make any confident conclusions, but the author identifies the following five factors that make self-governance more likely:
1. The risks are salient.
2. If self-governance doesn’t happen, then the govern
Nice find, thanks!
(For others: note that the linked blog post also considers things like "maybe they just uploaded the wrong data" to be a plausible explanation.)
(See response to rory_greig above)
you can attempt a deep RL project, realise you are hopelessly out of your depth, then you know you'd better go through Spinning Up in Deep RL before you can continue.
Tbc, I do generally like the idea of just in time learning. But:
I think too many people feel held back from doing a project like thing on their own.
Absolutely. Also, too many people don't feel held back enough (e.g. maybe it really would have been beneficial to, say, go through Spinning Up in Deep RL before attempting a deep RL project). How do you tell which group you're in?
(This comment inspired by Reversing Advice)
If we change the y-axis to display a linear relationship, this tells a different story. In fact, we see a plateauing of the relationship between income and experience wellbeing, just as found in Kahneman and Deaton (2010), but just at a later point — about $200,000 per year.
Uhh... that shouldn't happen from just re-plotting the same data. In fact, how is it that in the original graph, there is an increase from $400,000 to $620,000, but in the new linear axis graph, there is a decrease?
A doubling of income is associated with about a 1-point increase on a 0–
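To make the re-plotting point concrete, here is a minimal matplotlib sketch (the income and well-being numbers are made up for illustration, not taken from the study): plotting identical data with a log-scale versus a linear-scale income axis changes the spacing of the points, but it cannot turn an increase between two income levels into a decrease.

```python
import matplotlib.pyplot as plt

# Hypothetical data points, purely for illustration.
incomes = [25_000, 50_000, 100_000, 200_000, 400_000, 620_000]
wellbeing = [6.0, 6.3, 6.6, 6.9, 7.1, 7.2]

fig, (ax_log, ax_lin) = plt.subplots(1, 2, figsize=(10, 4))
for ax, scale in [(ax_log, "log"), (ax_lin, "linear")]:
    ax.plot(incomes, wellbeing, marker="o")
    ax.set_xscale(scale)  # only the axis scale differs between the two panels
    ax.set_xlabel(f"household income ({scale} axis)")
    ax.set_ylabel("experienced well-being (0-10)")

# The y-value at $620k is higher than at $400k in both panels; changing the
# axis scale cannot flip that ordering, which is the point made above.
plt.tight_layout()
plt.show()
```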
I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.
This post considers the applicability of formal verification techniques to AI alignment. Now in order to “verify” a property, you need a specification of that property against which to verify. The author considers three possibilities:
1. **Formally specifiable safety:** we can write down a specification for safe AI, _and_ we’ll be able to find a computational description or implementation
2. **Informally specifiable safety:** we can write down a specification for safe AI mathematically or philosophically, but we w
I just think there's a much greater chance that we look back on it and realize, too late, that we were focused on entirely the wrong things.
If you mean like 10x greater chance, I think that's plausible (though larger than I would say). If you mean 1000x greater chance, that doesn't seem defensible.
In both fields you basically ~can't experiment with the actual thing you care about (you can't just build a superintelligent AI and check whether it is aligned; you mostly can't run an intervention on the entire world and check whether world GDP went up). Y... (read more)
I've been perceiving a lot of EA/XR folks to be in (3) but maybe you're saying they're more in (2)?
Maybe it turns out that most folks in each community are between (1) and (2) toward the other. That is, we're just disagreeing on relative priority and neglectedness.
That's what I would say.
I can't see it as literally the only thing worth spending any marginal resources on (which is where some XR folks have landed).
If you have opportunity A where you get a benefit of 200 per $ invested, and opportunity B where you get a benefit of 50 per $ invested, you w... (read more)
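A toy sketch of the marginal-allocation arithmetic in that comment (the 200-per-dollar and 50-per-dollar figures come from the comment; the assumption of constant per-dollar benefits is mine, for illustration): if each dollar to A does more good than each dollar to B, the benefit-maximizing allocation of a marginal budget is the corner solution where everything goes to A.

```python
# Constant per-dollar benefits, as in the comment's example (made-up framing).
def total_benefit(dollars_to_A, budget=100):
    dollars_to_B = budget - dollars_to_A
    return 200 * dollars_to_A + 50 * dollars_to_B

best = max(range(0, 101), key=total_benefit)
print(best, total_benefit(best))  # -> 100 20000: every marginal dollar goes to A
```

(With diminishing returns the split could eventually change, but under this framing that is the logic behind spending all marginal resources on the single best opportunity.)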
I kinda sorta answered Q2 above (I don't really have anything to add to it).
Q3: I'm not too clear on this myself. I'm just an object-level AI alignment researcher :P
Q4: I broadly agree this is a problem, though I think this:
Before PS and EA/XR even resolve our debate, the car might be run off the road—either as an accident caused by fighting groups, or on purpose.
seems pretty unlikely to me, where I'm interpreting it as "civilization stops making any progress and regresses to the lower quality of life from the past, and this is a permanent effect".
I ... (read more)
If XR weighs so strongly (1e15 future lives!) that you are, in practice, willing to accept any cost (no matter how large) in order to reduce it by any expected amount (no matter how small), then you are at risk of a Pascal's Mugging.
Sure. I think most longtermists wouldn't endorse this (though a small minority probably would).
But when the proposal becomes: “we should not actually study progress or try to accelerate it”, I get lost.
I don't think this is negative, I think there are better opportunities to affect the future (along the lines of Ben's comment).... (read more)
OK, so maybe there are a few potential attitudes towards progress studies:
Flipping it around, PS folks could have a similar (1) positive / (2) neutral / (3) negative attitude towards XR efforts. My view is not settled, but right now I'm somewhere between (1) and (2)... (read more)
But EA/XR folks don't seem to be primarily advocating for specific safety measures. Instead, what I hear (or think I'm hearing) is a kind of generalized fear of progress. Again, that's where I get lost. I think that (1) progress is too obviously valuable and (2) our ability to actually predict and control future risks is too low.
I think there's a fear of progress in specific areas (e.g. AGI and certain kinds of bio) but not a general one? At least I'm in favor of progress generally and against progress in some specific areas where we have good object-level... (read more)
If you're willing to accept GCR in order to slightly reduce XR, then OK—but it feels to me that you've fallen for a Pascal's Mugging.
Eliezer has specifically said that he doesn't accept Pascal's Mugging arguments in the x-risk context
I wouldn't agree that this is a Pascal's Mugging. In fact, in a comment on the post you quote, Eliezer says:
If an asteroid were genuinely en route, large enough to wipe out humanity, possibly stoppable, and nobody was doing anything about this 10% probability, I would still be working on FAI but I would be screaming pretty lou
Results are in this post.
A lot of longtermists do pay attention to this sort of stuff, they just tend not to post on the EA Forum / LessWrong. I personally heard about the report from many different people after it was published, and also from a couple of people even before it was published (when there was a chance to provide input on it).
In general I expect that for any sufficiently large object-level thing, the discourse on the EA Forum will lag pretty far behind the discourse of people actively working on that thing (whether that discourse is public or not). I read the EA... (read more)
Yeah, that makes sense.
It still seems to me like this is a sufficiently important and interesting report that it'd be better if there was a little more mention of it on the Forum, for the sake of "the general longtermist public", since (a) the Forum seems arguably the main, central hub for EA discourse in general, and (b) there is a bunch of other AI governance type stuff here, so having that without things like this report could give a distorted picture.
But it also doesn't seem like a horrible or shocking error has been committed. And it does make sense that these things would be first, and mostly, discussed in more specialised sub-communities and venues.
If AGI doom were likely, what additional evidence would we expect to see?
I think that at least 80% of the AI safety researchers at MIRI, FHI, CHAI, OpenAI, and DeepMind would currently assign a >10% probability to this claim: "The research community will fail to solve one or more technical AI safety problems, and as a consequence there will be a permanent and drastic reduction in the amount of value in our future."
If you're still making this claim now, want to bet on it? (We'd first have to operationalize who counts as an "AI safety researcher".)
I also think it wasn't true in Sep 2017, but I'm less confident about that, and it's not as easy to bet on.
(Am e-mailing with Rohin, will report back e.g. if we check this with a survey.)