K

Kit

1378 karmaJoined Jan 2016

Comments
91

The commonsense meaning of 'I predicted X' is that I used some other information to assess that X was likely. 'I saw the announcement of X before it was published' is not that. I agree that it wasn't literally false. It just gave a false impression. Hence 'pretty clickbaity/misleading'.

'Predicted' in the title is pretty clickbaity/misleading given that the market was created and driven by insider traders. 'Knew About' or 'Leaked Information About' seem much more accurate.

Otherwise, I found this very interesting. I hadn't heard of this market before, and appreciate the analysis of what seems like it might be a very important case study, both for how to handle the leaking of embargoed or otherwise sensitive information and for what to do about insider trading.

I like that you can interact with this. It makes understanding models so much easier.

Playing with the calculator, I see that the result is driven to a surprising degree by the likelihood that "Compute needed by AGI, relative to a human brain (1e20-1e21 FLOPS)" is <1/1,000x (i.e. the bottom two options).[1]

I think this shows that your conclusion is driven substantially by your choice to hardcode "1e20-1e21 FLOPS" specifically, and then to treat this figure as a reasonable proxy for what computation an AGI would need. (That is, you suggest ~~1x as the midpoint for "Compute needed by AGI, relative to... 1e20-1e21 FLOPS").

I think it's also a bit of an issue to call the variable "relative to a human brain (1e20-1e21 FLOPS)". Most users will read it as "relative to a human brain" while it's really "relative to 1e20-1e21 FLOPS", which is quite a specific take on what a human brain is achieving.

I value the fact that you argue for choosing this figure here. However, it seems like you're hardcoding in confidence that isn't warranted. Even from your own perspective, I'd guess that including your uncertainty over this figure would bump up the probability by a factor of 2-3, while it looks like other commenters have pointed out that programs seem to use much less computation than we'd predict with a similar methodology applied to tasks computers already do.

  1. ^

    This is assuming a distribution on computation centred on ballpark ~100x as efficient in the future (just naively based on recent trends). If putting all weight on ~100x, nothing above 1/1,000x relative compute requirement matters. If putting some weight on ~1,000x, nothing above 1/100x relative compute requirement matters.

Did you intend to refer to page 83 rather than 82?

Kit
2y23
0
0

It seems extremely clear that working with the existing field is necessary to have any idea what to do about nuclear risk. That said, being a field specialist seems like a surprisingly small factor in forecasting accuracy, so I’m surprised by that being the focus of criticism.

I was interested in the criticism (32:02), so I transcribed it here:

Jeffrey Lewis: By the way, we have a second problem that arises, which I think Wizards really helps explain: this is why our field can’t get any money.

Aaron Stein: That’s true.

Jeffrey Lewis: Because it’s extremely hard to explain to people who are not already deep in this field how these deterrence concepts work, because they don’t get it. Like if you look at the work that the effective altruism community does on nuclear risk, it’s as misguided as SAC’s original, you know, approach to nuclear weapons, and you would need an entire RAND-sized outreach effort. And there are some people who’ve tried to do this. Peter Scoblic, who is fundamentally a member of that community, wrote a really nice piece responding to some of the like not great effective altruism assessments of nuclear risk in Ukraine. So I don’t want to, you know, criticise the entire community, but… I experience this at a cocktail party. Once I start talking to someone about nuclear weapons and deterrence… if they don’t do this stuff full-time, the popular ideas they have about this are… (a) they might be super bored, but if they are willing to listen, the popular ideas they have about it are so misguided, that it becomes impossible to make enough progress in a reasonable time. And that’s death when you’re asking someone to make you a big cheque. That’s much harder than ‘hi, I want to buy some mosquito nets to prevent malaria deaths’. That’s really straightforward. This… this is complex.

It’s a shame that this doesn’t identify any specific errors, although that is consistent with Lewis’ view that the errors can’t be explained in minutes, perhaps even in years.

Speaking for myself, I agree with Lewis that popular ideas about nuclear weapons can be wildly, bizarrely wrong. That said, I’m surprised he highlights effective altruism as a community he’s pessimistic about being able to teach. The normal ‘cocktail party’ level of discourse includes alluring claims like ‘XYZ policy is totally obvious; we just have to implement it’, and the effective altruism people I’ve spoken to on nuclear issues are generally way less credulous than this, and hence more interested in understanding how things actually work.

During the calls I've offered to people considering applying to work as a grantmaker, I've sent this post to about 10 people already. Thanks for writing it!

I agree that this is the most common misconception about grantmaking. To be clear, (as we've discussed) I think there are some ways to make a difference in grantmaking which are exceptions to the general rule explained here, but think this approach is the right one for most people.

This was the single most valuable piece on the Forum to me personally. It provides the only end-to-end model of risks from nuclear winter that I've seen and gave me an understanding of key mechanisms of risks from nuclear weapons. I endorse it as the best starting point I know of for thinking seriously about such mechanisms. I wrote what impressed me most here and my main criticism of the original model here (taken into account in the current version).

This piece is part of a series. I found most articles in the series highly informative, but this particular piece did the most excellent job of improving my understanding of risks from nuclear weapons.

 

Details that I didn’t cover elsewhere, based on recommended topics for reviewers:

How did this post affect you, your thinking, and your actions?

  • It was a key part of what caused me to believe that civilisation collapsing everywhere solely due to nuclear weapons is extremely unlikely without a large increase in the number of such weapons. (The model in the post is consistent with meaningful existential risk from nuclear weapons in other ways.)
  • This has various implications for prioritisation between existential risks and prioritisation within the nuclear weapons space.

Does it make accurate claims? Does it carve reality at the joints? How do you know?

  • I spent about 2 days going through the 5 posts the author published around that time, comparing them to much rougher models I had made and looking into various details. I was very impressed.
  • The work that went into the post did the heavy lifting and pointed a way to a better understanding of nuclear risk. The model in the original version of the post was exceptionally concrete and with a low error rate, such that reviewers were able to engage with it to identify the key errors in the original version of the post.

Thanks for this.

Without having the data, it seems the controversy graph could be driven substantially by posts which get exactly zero downvotes.

Almost all posts get at least one vote (magnitude >= 1), and balance>=0, so magnitude^balance >=1. Since the controversy graph goes below 1, I assume you are including the handling which sets controversy to zero if there are zero downvotes, per the Reddit code you linked to.

e.g. if a post has 50 upvotes:
0 downvotes --> controversy 0 (not 1.00)
1 downvote --> controversy 1.08
2 downvotes --> controversy 1.17
10 downvotes --> controversy 2.27
so a lot of the action is in whether a post gets 0 downvotes or at least 1, and we know a lot of posts get 0 downvotes because the graph is often below 1.

If this is a major contributor, the spikes would look different if you run the same calculation without the handling (or, equivalently, with the override being to 1 instead of 0). This discontinuity also makes me suspect that Reddit uses this calculation for ordering only, not as a cardinal measure -- or that zero downvotes is an edge case on Reddit!

Load more