AD

Anthony DiGiovanni

1140 karmaJoined

Bio

I'm Anthony DiGiovanni, a suffering-focused AI safety researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.

Comments
160

It's becoming increasingly apparent to me how strong an objection to longtermist interventions this comment is. I'd be very keen to see more engagement with this model.

My own current take: I hold out some hope that our ability to forecast long-term effects, at least under some contingencies within our lifetimes, will be not-terrible enough. And I'm more sympathetic to straightforward EV maximization than you are. But the probability of systematically having a positive long-term impact by choosing any given A over B seems much smaller than longtermists act as if is the case — in particular, it does seem to be in Pascal's mugging territory.

My understanding is that:

  1. Spite (as a preference we might want to reduce in AIs) has just been relatively well-studied compared to other malevolent preferences. If this subfield of AI safety were more mature there might be less emphasis on spite in particular.
  2. (Less confident, haven't thought that much about this:) It seems conceptually more straightforward what sorts of training environments are conducive to spite, compared to fanaticism (or fussiness or little-to-lose, for that matter).

Thanks for asking — you can read more about these two sources of s-risk in Section 3.2 of our new intro to s-risks article. (We also discuss "near miss" there, but our current best guess is that such scenarios are significantly less likely than other s-risks of comparable scale.)

I've found this super useful over the past several months—thanks!

Given that you can just keep doing better and better essentially indefinitely, and that GPT is not anywhere near the upper limit, talking about the difficulty of the task isn't super meaningful.

I don't understand this claim. Why would the difficulty of the task not be super meaningful when training to performance that isn't near the upper limit?

In "Against neutrality...," he notes that he's not arguing for a moral duty to create happy people, and it's just good "others things equal." But, given that the moral question under opportunity costs is what practically matters, what are his thoughts on this view?: "Even if creating happy lives is good in some (say) aesthetic sense, relieving suffering has moral priority when you have to choose between these." E.g., does he have any sympathy for the intuition that, if you could either press a button that treats someone's migraine for a day or one that creates a virtual world with happy people, you should press the first one?

(I could try to shorten this if necessary, but worry about the message being lost from editorializing.)

I am (clearly) not Tobias, but I'd expect many people familiar with EA and LW would get something new out of Ch 2, 4, 5, and 7-11. Of these, seems like the latter half of 5, 9, and 11 would be especially novel if you're already familiar with the basics of s-risks along the lines of the intro resources that CRS and CLR have published. I think the content of 7 and 10 is sufficiently crucial that it's probably worth reading even if you've checked out those older resources, despite some overlap.

Anecdote: My grad school personal statement mentioned "Concrete Problems in AI Safety" and Superintelligence, though at a fairly vague level about the risks of distributional shift or the like. I got into some pretty respectable programs. I wouldn't take this as strong evidence, of course.

I'm fine with other phrasings and am also concerned about value lock-in and s-risks though I think these can be thought of as a class of x-risks

I'm not keen on classifying s-risks as x-risks because, for better or worse, most people really just seem to mean "extinction or permanent human disempowerment" when they talk about "x-risks." I worry that a motte-and-bailey can happen here, where (1) people include s-risks within x-risks when trying to get people on board with focusing on x-risks, but then (2) their further discussion of x-risks basically equates them with non-s-x-risks. The fact that the "dictionary definition" of x-risks would include s-risks doesn't solve this problem.

Load more