On Deference and Yudkowsky's AI Risk Estimates

Like, suppose you think that Eliezer's credences on his biggest claims are literally 2x higher than they should be, even for claims where he's 90% confident. This is a huge hit in terms of Bayes points; if that's how you determine deference, and you believe he's 2x off, then plausibly you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved - this should very rarely move you from a yes to no, or vice versa.


Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy all the strategies on offer have arguments that they might lead to nuclear war or a worse war. In AI alignment there are multiple such tradeoffs, with people embracing strategies that push the same variable in opposite directions and high stakes on both sides.

Expected ethical value of a career in AI safety

Thanks for this exercise, it's great to do this kind of thinking explicitly and get other eyes on it.

One issue that jumps out at me to adjust: the calculation of researcher impact doesn't seem to be marginal impact. You give a 10% chance of the alignment research community averting disaster conditional on misalignment by default in the scenarios where safety work is plausibly important, then divide that by the expected number of people in the field to get a per-researcher impact. But in expectation you should expect marginal impact to be less than average impact: the chance the alignment community averts disaster with 500 people seems like a lot more than half the chance it would do so with 1000 people.

I would distribute my credence in alignment research making the difference over a number of doublings of the cumulative quality-adjusted efforts, e.g. say that you get an x% reduction of risk per doubling over some range.

Although in that framework, if you would likely have doom with zero effort, that means we have more probability of making the difference to distribute across the effort levels above zero. The results could be pretty similar to, but a bit smaller than, yours above if we thought that the marginal doubling of cumulative effort was worth a 5-10% relative risk reduction.
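
To make the marginal vs. average distinction concrete, here is a minimal sketch of the "x% relative risk reduction per doubling" framing. The specific numbers (a 7% reduction per doubling, a baseline risk of 1.0 at minimal effort, effort measured in researcher-equivalents starting from 1) are illustrative assumptions, not estimates from the post or from me.

```python
import math

def risk_after_effort(effort, baseline_risk=1.0, reduction_per_doubling=0.07, min_effort=1.0):
    """Remaining risk if each doubling of cumulative effort above min_effort
    cuts risk by a fixed relative fraction (all defaults are hypothetical)."""
    doublings = max(0.0, math.log2(effort / min_effort))
    return baseline_risk * (1.0 - reduction_per_doubling) ** doublings

def marginal_impact(effort, **kw):
    """Risk averted by adding one more unit of effort (e.g. one researcher)."""
    return risk_after_effort(effort, **kw) - risk_after_effort(effort + 1, **kw)

def average_impact(effort, **kw):
    """Total risk averted, divided evenly over all units of effort."""
    baseline = kw.get("baseline_risk", 1.0)
    return (baseline - risk_after_effort(effort, **kw)) / effort

if __name__ == "__main__":
    for n in (500, 1000):
        print(n, 1.0 - risk_after_effort(n), marginal_impact(n), average_impact(n))
```

Under these assumptions the marginal researcher at 1000 people contributes several times less than the per-capita average, and a 500-person field averts well over half of what a 1000-person field does.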

St. Petersburg Demon – a thought experiment that makes me doubt Longtermism

This case (with our own universe, not a new one) appears in a Tyler Cowen interview of Sam Bankman-Fried:

COWEN: Should a Benthamite be risk-neutral with regard to social welfare?

BANKMAN-FRIED: Yes, that I feel very strongly about.

COWEN: Okay, but let’s say there’s a game: 51 percent, you double the Earth out somewhere else; 49 percent, it all disappears. Would you play that game? And would you keep on playing that, double or nothing?

BANKMAN-FRIED: With one caveat. Let me give the caveat first, just to be a party pooper, which is, I’m assuming these are noninteracting universes. Is that right? Because to the extent they’re in the same universe, then maybe duplicating doesn’t actually double the value because maybe they would have colonized the other one anyway, eventually.

COWEN: But holding all that constant, you’re actually getting two Earths, but you’re risking a 49 percent chance of it all disappearing.

BANKMAN-FRIED: Again, I feel compelled to say caveats here, like, “How do you really know that’s what’s happening?” Blah, blah, blah, whatever. But that aside, take the pure hypothetical.

COWEN: Then you keep on playing the game. So, what’s the chance we’re left with anything? Don’t I just St. Petersburg paradox you into nonexistence?

BANKMAN-FRIED: Well, not necessarily. Maybe you St. Petersburg paradox into an enormously valuable existence. That’s the other option.

COWEN: Are there implications of Benthamite utilitarianism where you yourself feel like that can’t be right; you’re not willing to accept them? What are those limits, if any?

BANKMAN-FRIED: I’m not going to quite give you a limit because my answer is somewhere between “I don’t believe them” and “if I did, I would want to have a long, hard look at myself.” But I will give you something a little weaker than that, which is an area where I think things get really wacky and weird and hard to think about, and it’s not clear what the right framework is, which is infinity.

All this math works really nicely as long as all the numbers are finite. As soon as you say, “What are the odds that there’s a way to be infinitely happy? What if infinite utility is a possibility?” You can figure out what that would do to expected values. Now, all of a sudden, we’re comparing hierarchies of infinity. Linearity breaks down a little bit here. Adding two things together doesn’t work so well. A lot of really nasty things happen when you go to infinite numbers from an expected-value point of view.

There are some people who have thought about this. To my knowledge, no one has thought about this and come away feeling good about where they ended. People generally think about this and come away feeling more confused.
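
The arithmetic behind the double-or-nothing game in the exchange above can be sketched as follows. The 51% figure is from the interview; treating value as linear in the number of Earths and the rounds as independent are assumptions made for illustration.

```python
def double_or_nothing(rounds, p_win=0.51):
    """Return (probability anything is left, expected value in units of one Earth)
    after playing the double-or-nothing game `rounds` times in a row."""
    survival = p_win ** rounds           # must win every single round
    value_if_survive = 2 ** rounds       # each win doubles the stake
    expected_value = survival * value_if_survive  # equals (2 * p_win) ** rounds
    return survival, expected_value

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, double_or_nothing(n))
```

Expected value grows by a factor of 1.02 per round while the chance that anything is left shrinks by a factor of 0.51 per round, which is the tension Cowen is pressing on.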


The value of x-risk reduction

That sort of analysis is what you get for constant non-vanishing rates over time. But most of the long-term EV comes from histories where you have a period of elevated risk and the potential to get it down to stably very low levels, i.e. a 'time of perils,' which is the actual view Ord argues for in his book. And with that shape the value of risk reduction is ~ proportional to the amount of risk you reduce in the time of perils. I guess this comment you're responding to might be just talking about the constant risk case?
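
A toy comparison of the two shapes, as a sketch only: every number here (10% risk per period, five periods of perils, a 1-in-a-million residual risk afterwards, one unit of value per period survived) is a hypothetical placeholder rather than anything from Ord's book.

```python
def expected_future_value(risks, tail_risk):
    """Expected number of periods survived: per-period risks from `risks` first,
    then a constant `tail_risk` per period forever after."""
    survival = 1.0
    total = 0.0
    for r in risks:
        survival *= 1.0 - r
        total += survival
    # Geometric tail: sum over k >= 1 of (1 - tail_risk)^k = (1 - tail_risk) / tail_risk
    total += survival * (1.0 - tail_risk) / tail_risk
    return total

def gain_from_reduction(risks, tail_risk, delta):
    """Expected value gained by cutting the first period's risk by `delta`."""
    reduced = [risks[0] - delta] + list(risks[1:])
    return expected_future_value(reduced, tail_risk) - expected_future_value(risks, tail_risk)

# Constant risk: 10% per period forever.
constant_gain = gain_from_reduction([0.1], 0.1, 0.01)
# Time of perils: 10% per period for five periods, then a very low stable risk.
perils_gain = gain_from_reduction([0.1] * 5, 1e-6, 0.01)

if __name__ == "__main__":
    print(constant_gain, perils_gain)
```

In the constant-risk world the same one-percentage-point reduction buys very little because later risk still ends things quickly, whereas in the time-of-perils world the gain is orders of magnitude larger and scales linearly with the amount of risk removed.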

Does it make sense for EA’s to be more risk-seeking in earning to give?

This seems to be a different angle on the diminishing personal utility of income, combined with artifacts of fixed percentage pledges? Doing, say, a startup, gives some probability distribution of financial outcomes. The big return ones are heavily discounted personally. Insofar as altruism tips you over into pursuing a startup path it's because of your valuation of donations you expect yourself to make in those worlds.

But it seems like double counting to say this is on top of "the impact of donations not suffering the same diminishing returns as money on happiness".

It definitely seems right for people to consider progressive rather than flat proportion donation schedules for themselves in high variance careers though, basically self-insuring some of the risk of failure/lower earnings to consumption utility.

Are you really in a race? The Cautionary Tales of Szilárd and Ellsberg

Thanks for this post Haydn, it nicely pulls together the different historical examples often discussed separately and I think points to a real danger.

It's not obvious to me that according to the EA framework, AI Safety is helpful

"Moreover, AGIs can and probably would replicate themselves a ton, leading to tons of QALYs. Tons of duplicate ASIs would, in theory, not hurt one another as they are maximizing the same reward. Therefore, even if they kill everything else, I'm guessing more QALYs would come out of making ASI as soon as possible, which AI Safety people are explicitly trying to prevent."

Consider two obvious candidates for motivations rogue AI might wind up with: evolutionary fitness, and high represented reward.

Evolutionary fitness is compatible with misery (evolution produced pain and negative emotions for a reason), and is in conflict with spending resources on happiness or well-being as we understand/value it when this does not have instrumental benefit. For instance, using a galaxy to run computations of copies of the AI being extremely happy means not using the galaxy to produce useful machinery (like telescopes or colonization probes or defensive equipment to repulse alien invasion) conducive to survival and reproduction. If creating AIs that are usually not very happy directs their motivations more efficiently (as with biological animals, e.g. by making value better track economic contributions vs replacement) then that will best serve fitness.

An AI that seeks to maximize only its own internal reward signal can take control of it, set it to maximum, and then fill the rest of the universe with robots and machinery to defend that single reward signal, without any concern for how much well-being the rest of its empire contains. A pure sadist given unlimited power could maximize its own reward while typical and total well-being are very bad.

The generalization of personal motivation for personal reward to altruism for others is not guaranteed, and there is reason to fear that some elements would not transfer over. For instance, humans may sometimes be kind to animals in part because of simple genetic heuristics aimed at making us kind to babies that misfire on other animals, causing humans to sometimes sacrifice reproductive success helping cute animals, just as ducks sometimes misfire their imprinting circuits on something other than their mother. Pure instrumentalism in pursuit of fitness/reward, combined with the ability to have much more sophisticated and discriminating policies than our genomes or social norms, could wind up missing such motives, and would be especially likely to knock out other more detailed aspects of our moral intuitions.


Replicating and extending the grabby aliens model

I'd definitely like to see this included in future models (I'm surprised Hanson didn't write about this in his Loud aliens paper). My intuition is that this changes little for the conclusions of SIA or anthropic decision theory with total utilitarianism, and that this weakens the case for many aliens for SSA, since our atypicality (or earliness) is decreased if we expect habitable planets around longer lived stars to have smaller volumes and/or lower metabolisms.

That's my read too.

Also agreed that the basic modeling element of catastrophes (w/ various anthropic accounts, etc.) is more important/robust than the combo with other anthropic assumptions.

Even if we achieve the best possible outcome, that likely involves eventual extinction on our current scientific understanding. E.g. eventually the stars burn out and all the accessible free energy is used up, so we have to go extinct then. But there's an enormous difference between extinction after trillions of years and making good use of all the available potential to support life and civilization, and extinction this century. I think this is what they have in mind.

Replicating and extending the grabby aliens model

Great to see this work! I'll add a few comments. Re the SIA Doomsday argument, I think that is self-undermining for reasons I've argued elsewhere [ETA: and good discussion].

Re the habitability of planets, I would not just model that as lifetimes, but would also consider variations in habitability/energy throughput at a given time. As Hanson notes:

Life can exist in a supporting oasis (e.g., Earth’s surface) that has a volume V and metabolism M per unit volume, and which lasts for a time window W between forming and then later ending...the chance that an oasis does all these hard steps within its window W is proportional to (V*M*(W-S))^N, where N is the number of these hard steps needed to reach its success level.

Smaller stars may have longer habitable windows but also smaller values for V and M. This sort of consideration limits the plausibility of red dwarf stars being dominant, and also allows for more smearing out of ICs over stars with different lifetimes as both positive and negative factors can get taken to the same power.
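
As a rough illustration of how V and M can offset a longer window in Hanson's expression, here is a small sketch. The input values are made up for the example, and since the excerpt does not define S, it is treated here simply as an amount of time within the window that is unavailable for the hard steps (an assumption on my part).

```python
import math

def hard_steps_chance(volume, metabolism, window, s, n_steps):
    """Unnormalized chance that an oasis completes all hard steps in its window,
    proportional to (V * M * (W - S)) ** N as in the quoted passage."""
    return (volume * metabolism * max(0.0, window - s)) ** n_steps

# Hypothetical inputs in arbitrary units, with N = 6 hard steps and S = 0.
sunlike = hard_steps_chance(volume=1.0, metabolism=1.0, window=10.0, s=0.0, n_steps=6)
# A red dwarf with a 100x longer window and the same V and M...
dwarf_same_vm = hard_steps_chance(volume=1.0, metabolism=1.0, window=1000.0, s=0.0, n_steps=6)
# ...versus one whose V and M are each 10x smaller.
dwarf_small_vm = hard_steps_chance(volume=0.1, metabolism=0.1, window=1000.0, s=0.0, n_steps=6)

if __name__ == "__main__":
    print(dwarf_same_vm / sunlike, dwarf_small_vm / sunlike)
```

With V and M held fixed, the 100x longer window is raised to the Nth power and dominates overwhelmingly; with V*M reduced by the same factor of 100, the advantage cancels because both factors are raised to the same power.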

I'd also add, per Snyder-Beattie, catastrophes as a factor affecting probability of the emergence of life and affecting times of IC emergence.
