All of hb574's Comments + Replies

Against longtermism

Concern about the threat of human extinction is not longtermism (see Scott Alexander's well-known forum post about this), which I think is the point the OP is making.

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

The rough shape of the argument is that I think a PASTA system requires roughly human-level general intelligence, which implies some capabilities that HFDT, as described in this post, cannot learn. Using Karnofsky's original PASTA post, let's look at some of the requirements:

  1. Consume research papers.
  2. Identify research gaps and open problems.
  3. Choose an open problem to focus on.
  4. Generate hypotheses to test within a problem to potentially solve it.
  5. Generate experiment ideas to test the hypotheses.
  6. Judge how good the experiment ideas are and
... (read more)

I'm pretty unconvinced that your "suggests a significant number of fundamental breakthroughs remain to achieve PASTA" is strong enough to justify odds of "approximately 0," especially when the evidence is mostly an expectation that these tasks will stay hard as we scale (something that seems hard to predict and easy to get wrong). Though innovation in some domains may involve long episode lengths and inaccurate human evaluation, innovation in other fields (e.g., math) could easily avoid this problem (i.e., in cases where verifying is much easier than solving).
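To make that last point concrete, here's a toy sketch (my own hypothetical example, not from the post) of the verify-vs-solve asymmetry: finding a factor takes a long search, while checking a proposed factor is a single operation. The same asymmetry is why domains with cheap verification (proof checkers, unit tests) may still give a usable training signal even when generation is hard.

```python
# Toy illustration of "verifying is much easier than solving" (hypothetical example).

def solve(n):
    """Find a nontrivial factor of n by trial division: O(sqrt(n)) work."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None  # n is prime, no nontrivial factor

def verify(n, d):
    """Check a proposed factor: a single modulo operation."""
    return d is not None and 1 < d < n and n % d == 0

n = 999_983 * 1_000_003   # composite with two large prime factors
factor = solve(n)         # slow: roughly a million trial divisions
print(verify(n, factor))  # fast check -> True
```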

If you're unhappy, consider leaving

People who are not perfectly satisfied with EA are more likely to have some disagreements with what they might perceive as EA consensus. Therefore, recommending that they leave directly decreases the diversity of ideas in EA and makes it more homogeneous. This seems likely to lead to a worse version of EA.

3 · shinybeetle · 25d
I don't think that the first premise is necessarily correct
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

I'm an ML researcher, and I would put the probability of baseline HFDT leading to a PASTA set of capabilities at approximately 0; my impression is that this is the view of the majority of ML researchers.

Baseline HFDT seems to be the single most straightforward vision that could plausibly work to train transformative AI very soon. From informal conversations, I get the impression that many ML researchers would bet on something like this working in broadly the way I described in this post, and multiple major AI companies are actively trying to sca

... (read more)

Can you say more about why you think this? Both why you think there's 0 chance of HFDT leading to a system that can evaluate whether ideas are good and generate creative new ideas, and why you think this is what the majority of ML researchers think?

(I've literally never met an ML researcher with your view before to my knowledge, though I haven't exactly gone around asking everyone I know & my environment is of course selected against people with your view since I'm at OpenAI.)

1 · rogersbacon · 11mo
what don't you understand, seems pretty clear to me
How Could AI Governance Go Wrong?

Good post! I'm curious whether you have any thoughts on the potential conflicts or contradictions between the "AI ethics" community, which focuses on narrow AI and harms from current AI systems (members of this community include Gebru and Whittaker), and the AI governance community that has sprung out of the AI safety/alignment community (e.g., GovAI)? In my view, these two groups are quite opposed in priorities and in ways of thinking about AI (take a look at Timnit Gebru's Twitter feed for a very stark example), and trying to put them under one banner doesn't really... (read more)

7 · HaydnBelfield · 3mo
Hi, yes, good question, and one that has been much discussed - here are three papers on the topic. I'm personally of the view that there shouldn't really be much conflict/contradiction: we're all pushing for the safe, beneficial and responsible development and deployment of AI, and there's lots of common ground.

  - Bridging near- and long-term concerns about AI [https://www.repository.cam.ac.uk/handle/1810/293033]
  - Bridging the Gap: the case for an Incompletely Theorized Agreement on AI policy [https://arxiv.org/abs/2101.06110]
  - Reconciliation between Factions Focused on Near-Term and Long-Term Artificial Intelligence [https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2976444]
Transcripts of interviews with AI researchers

This is great work, I think it's really valuable to get a better sense of what AI researchers think of AI safety.

Often when I ask people in AI safety what they think AI researchers think of AGI and alignment arguments, they don't have a clear idea and just default to some variation on "I'm not sure they've thought about it much". Yet as these transcripts show, many AI researchers are well aware of AI risk arguments (in my anecdotal experience, many have read at least part of Superintelligence) and have more nuanced views. So I'm worried that AI safety is ... (read more)

6 · Vael Gates · 3mo
Indeed! I've actually found that in most of my interviews people haven't thought about the 50+ year future much or heard of AI alignment, given that my large sample is researchers who had papers at NeurIPS or ICML. (The five researchers who were individually selected here had thought about AI alignment uncommonly much, which didn't particularly surprise me given how they were selected.)

Yes, with the note that the arguments brought forth are generally less carefully thought-through than the ones shown in the individually-selected population, due to the larger population. But you can get a sense for some of the types of arguments in the six transcripts from NeurIPS / ICML researchers, though I wouldn't say it's fully representative.
The AI Messiah

Just my anecdotal experience, but when I ask a lot of EAs working in or interested in AGI risk why they think it's a hugely important x-risk, one of the first arguments that comes to people's minds is some variation on "a lot of smart people [working on AGI risk] are very worried about it". My model of many people in EA interested in AI safety is that they use this heuristic as a dominant factor in their reasoning — which is perfectly understandable! After all, formulating a view of the magnitude of risk from transformative AI without relying on any such heuristics is extremely hard. But I think this post is a valuable reminder that it's not particularly good epistemics for lots of people to think like this.

2 · Linch · 3mo
Can I ask roughly what work they're doing? Again I think it makes more sense if you're earning-to-give or doing engineering work, and less if you're doing conceptual or strategic research. It also makes sense if you're interested in it as an avenue to learn more.

The title of this post is a general claim about the long-term future, and yet nowhere in your post do you mention any x-risks other than AI. Why should we not expect other x-risks to outweigh these AGI considerations, since they may not fit into this framework of extinction, ok outcome, utopian outcome? I am not necessarily convinced that pulling the utopia handle on actions related to AGI (like the four you suggest) has a greater effect on P(utopia) than some set of non-AGI-related interventions.
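One rough way to formalize this objection (my framing with placeholder symbols, not anything stated in the post): under the extinction / ok outcome / utopia framework, an intervention a matters through

```latex
% Hypothetical formalization, not the OP's; V_utopia and V_ok are placeholder
% values attached to the "utopian outcome" and "ok outcome" buckets.
\Delta(a) = \big[ P(\text{utopia} \mid a) - P(\text{utopia} \mid \neg a) \big] V_{\text{utopia}}
          + \big[ P(\text{ok} \mid a) - P(\text{ok} \mid \neg a) \big] V_{\text{ok}}
```

and the question is whether this quantity is actually larger for the four AGI-related actions than for the best available non-AGI interventions.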

Replicating and extending the grabby aliens model

Looks like great work! Do you plan to publish this in a similar venue to previous papers on this topic, such as in an astrophysics journal? I would be very happy to see more EA work published in mainstream academic venues.

4 · Tristan Cook · 4mo
Thanks! I've considered it but have not decided whether I will. I'm unsure whether the decision-relevant parts (which I see as most important) or the weirder stuff (like simulations) would need to be cut.
FLI launches Worldbuilding Contest with $100,000 in prizes

Isn't "Technology is advancing rapidly and AI is transforming the world sector by sector" perfectly consistent with a singularity? Perhaps it would be a rather large understatement, but still basically true.

2 · Zach Stein-Perlman · 7mo
Not really (but the quote is consistent with no singularity; see Rohin's comment). I expect technological progress will be very slow soon after a singularity, because science is essentially solved and almost all technology is discovered during or immediately after the singularity. Additionally, the suggestion that there's an 'international power equilibrium', and generally that the world is recognizable (e.g., with a prosaic global political power balance, with AI merely 'solving problems' and 'reshaping the economy') rather than totally transformed, is not what I expect years after a singularity.
A case for the effectiveness of protest

There's a lot of good work here and I don't have time to analyse it in detail, but I had a look at some of your estimates, and I think they depend a bit too heavily on subjective guesses about the counterfactual impact of XR to be all that useful. I can imagine that if you vary the parameter for how much XR might have brought forward net zero or the chance that it directly caused net zero pledges to be taken, then you end up with very large bounds on your ultimate effectiveness numbers. Personally, I don't think it's all that reasonable to suggest that, fo... (read more)
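To make the sensitivity point concrete, here is a minimal sketch with made-up placeholder numbers (not figures from the report) of how varying the subjective inputs propagates into very wide bounds on the cost-effectiveness estimate.

```python
import itertools

# Minimal sensitivity sketch with PLACEHOLDER numbers, purely illustrative:
# vary two subjective inputs and see how wide the cost-effectiveness range gets.

total_spend_gbp = 5_000_000            # hypothetical spending attributed to XR
annual_uk_emissions_tonnes = 350e6     # rough order of magnitude, illustrative only

years_brought_forward = [0.01, 0.1, 1.0]      # how much earlier net zero arrives
share_attributable_to_xr = [0.01, 0.1, 0.5]   # counterfactual credit given to XR

results = []
for years, share in itertools.product(years_brought_forward, share_attributable_to_xr):
    tonnes_averted = annual_uk_emissions_tonnes * years * share
    results.append(total_spend_gbp / tonnes_averted)

print(f"cost per tonne ranges from £{min(results):.2f} to £{max(results):.2f}")
# The best and worst cases here differ by a factor of roughly 5000, driven
# entirely by the two subjective parameters: the "very large bounds" point.
```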

6 · James Ozden · 8mo
Hey Herbie - good questions/points, so thank you for this response. A few points:

  1. On "they depend a bit too heavily on subjective guesses to be useful": I think this massively depends on what you think "useful" is! Do I think this evidence is rigorous enough to allocate tens of millions towards nonviolent protest groups? Definitely not, so it's not useful in that regard. But do I think this is an update on the very little quantitative information we had previously? Yes, definitely useful in that case! At least for me, before I did this project, my uncertainty around the impact of XR ranged from negative overall to roughly the cost-effectiveness of CATF. So in many ways, this project definitely updated my beliefs on the possible upper bounds of cost-effectiveness and that XR was probably net good overall. Also I think it was useful because it showed that even in my pessimistic scenarios, the cost-effectiveness of XR was still not that far off CATF, which I think is one of the strongest reasons to look into it further. In essence, the purpose of me doing this preliminary research was to make a case that we should do further research, as this still isn't very conclusive (as you've pointed out). So again, I think it was definitely useful in that regard, as it is now happening. The next steps for this research would definitely be to try to reduce the reliance on subjective values for these estimates and reduce the confidence intervals, more in line with what you're talking about. But overall, I think this initial research has narrowed the range of uncertainty / at least put some numbers on the potential cost-effectiveness of nonviolent protest, whereas we didn't really have that at all before, both inside the EA community and out. The next steps would be to keep narrowing down the large bounds to make it more and more actionable for funders/people generally, so maybe that will be more useful in that regard if it succeeds!
  2. On the net-zero pledges I agree, I