Researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.

As nicely discussed in this comment, the key ideas of UDT and LDT seem to have been predated by, respectively, "resolute choice" and Spohn's variant of CDT. (It's not entirely clear to me how UDT or LDT are formally specified, though, and in my experience people seem to equivocate between different senses of "UDT".)

It seems to me that you need to weight the probability functions in your set according to some intuitive measure of your plausibility, according to your own priors.

The concern motivating the use of imprecise probabilities is that you don't always have a unique prior you're justified in using to compare the plausibility of these distributions. In some cases you'll find that any choice of unique prior, or unique higher-order distribution for aggregating priors, involves an arbitrary choice. (E.g., arbitrary weights assigned to conflicting intuitions about plausibility.)
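A toy sketch of this point, with entirely made-up numbers: two candidate probability functions (`p1`, `p2`) over three outcomes, and two hypothetical actions `A` and `B`. None of these names or values come from the discussion; they only illustrate how the ranking of actions can hinge on the weight given to each prior.

```python
# Hypothetical credal set: two probability functions over three outcomes.
credal_set = {
    "p1": [0.7, 0.2, 0.1],
    "p2": [0.1, 0.2, 0.7],
}

# Hypothetical payoffs of each action in each of the three outcomes.
payoffs = {
    "A": [10.0, 0.0, -10.0],
    "B": [0.0, 0.0, 0.0],
}


def expected_value(probs, values):
    """Expected value of a payoff vector under one probability function."""
    return sum(p * v for p, v in zip(probs, values))


def ev_range(action):
    """Lowest and highest expected value of an action across the credal set."""
    evs = [expected_value(p, payoffs[action]) for p in credal_set.values()]
    return min(evs), max(evs)


def preferred_under_weights(w1):
    """Mix p1 and p2 with weight w1 on p1, then pick the EV-maximizing action."""
    mix = [
        w1 * a + (1 - w1) * b
        for a, b in zip(credal_set["p1"], credal_set["p2"])
    ]
    return max(payoffs, key=lambda act: expected_value(mix, payoffs[act]))


if __name__ == "__main__":
    print(ev_range("A"))
    print(preferred_under_weights(0.6))
    print(preferred_under_weights(0.4))
```

Here the expected value of `A` spans an interval that straddles the expected value of `B`, and which action comes out ahead flips depending on whether `p1` gets weight 0.6 or 0.4. If nothing privileges one of those weights, the ranking produced by any particular weighting inherits that arbitrariness.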

It's becoming increasingly apparent to me how strong an objection to longtermist interventions this comment is. I'd be very keen to see more engagement with this model.

My own current take: I hold out some hope that our ability to forecast long-term effects, at least under some contingencies within our lifetimes, will be not-terrible enough. And I'm more sympathetic to straightforward EV maximization than you are. But the probability of systematically having a positive long-term impact by choosing any given A over B seems much smaller than longtermists act as if it is. In particular, it does seem to be in Pascal's mugging territory.

My understanding is that:

- Spite (as a preference we might want to reduce in AIs) has just been relatively well-studied compared to other malevolent preferences. If this subfield of AI safety were more mature, there might be less emphasis on spite in particular.
- (Less confident, haven't thought that much about this:) It seems conceptually more straightforward what sorts of training environments are conducive to spite, compared to fanaticism (or fussiness or little-to-lose, for that matter).

Thanks for asking — you can read more about these two sources of s-risk in Section 3.2 of our new intro to s-risks article. (We also discuss "near miss" there, but our current best guess is that such scenarios are significantly less likely than other s-risks of comparable scale.)

(This post was coauthored by Jesse Clifton — crossposting from LW doesn't seem to show this, unfortunately.)