Founder of CEEALAR (née the EA Hotel)


A case against strong longtermism

"while it might be pretty hard to predict whether AI risk is going to be a big deal by whatever measure, I can still be fairly certain that the sun will exist in a 1000 years"

These two things are correlated.

A case against strong longtermism

This [The ergodicity problem in economics] seems like it could be important, and might fit in somewhere with the discussions of expected utility. I haven't really got my head around it though.

Starting with $100, your bankroll increases 50% every time you flip heads. But if the coin lands on tails, you lose 40% of your total. Since you’re just as likely to flip heads as tails, it would appear that you should, on average, come out ahead if you played enough times because your potential payoff each time is greater than your potential loss. In economics jargon, the expected utility is positive, so one might assume that taking the bet is a no-brainer.

Yet in real life, people routinely decline the bet. Paradoxes like these are often used to highlight irrationality or human bias in decision making. But to Peters, it’s simply because people understand it’s a bad deal.

Here’s why. Suppose in the same game, heads came up half the time. Instead of getting fatter, your $100 bankroll would actually be down to $59 after 10 coin flips. It doesn’t matter whether you land on heads the first five times, the last five times or any other combination in between.
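The arithmetic here can be checked directly. A minimal sketch of the game from the quote (heads multiplies the bankroll by 1.5, tails by 0.6), showing the gap between the ensemble average and a typical trajectory:

```python
import random

def play(bankroll=100.0, flips=10, seed=0):
    """Simulate one run of the 50%-gain / 40%-loss coin-flip game."""
    rng = random.Random(seed)
    for _ in range(flips):
        bankroll *= 1.5 if rng.random() < 0.5 else 0.6
    return bankroll

# Ensemble average grows: each flip multiplies the *expected*
# bankroll by 0.5 * 1.5 + 0.5 * 0.6 = 1.05, so after 10 flips
# E[wealth] = 100 * 1.05**10, about 163.
expected = 100.0 * 1.05**10

# A typical trajectory shrinks: with exactly 5 heads and 5 tails
# (order irrelevant, since multiplication commutes) you end with
# 100 * 1.5**5 * 0.6**5 = 100 * 0.9**5, about 59.
typical = 100.0 * 1.5**5 * 0.6**5
```

The per-flip geometric mean is sqrt(1.5 × 0.6) ≈ 0.949 < 1, which is why any individual bankroll typically decays even though the ensemble mean grows 5% per flip.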

Leopold Aschenbrenner returns to X-risk and growth

The idea is that by speeding through you increase risk initially, but the total risk is lower - i.e. a smaller area under the grey curve in Aschenbrenner's figure:

I think this probably breaks down if the peak is high enough though (here I'm thinking of AGI x-risk). Aschenbrenner gives the example of:

On the other extreme, humanity is extremely fragile. No matter how high a fraction of our resources we dedicate to safety, we cannot prevent an unrecoverable catastrophe. … there is nothing we can do regardless. An existential catastrophe is inevitable, and it is impossible for us to survive to reach a grand future.

And argues that

even if there is some probability we do live in this world, to maximize the moral value of the future, we should act as if we live in the other scenarios where a long and flourishing future is possible.

I'm not sure if this applies if there is some possibility of "pulling the curve sideways" to flatten it - i.e. increase the fraction of resources spent on safety whilst keeping consumption (or growth) constant. This seems to be what those concerned with x-risk are doing for the most part (rather than trying to slow down growth).
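For intuition only, here is a toy numeric sketch. This is not Aschenbrenner's actual model: the hazard function and every parameter below are made up. It just encodes "richer societies buy more safety", so faster growth front-loads risk but shrinks the total area under the hazard curve, while raising the safety share at constant growth ("pulling the curve sideways") also cuts the total:

```python
# Toy model (illustrative assumptions only): per-period existential
# hazard is h0 / (1 + safety_share * wealth), so hazard falls as
# wealth grows.
def total_risk(growth, periods=200, h0=0.01, safety_share=0.01):
    wealth, survive = 1.0, 1.0
    for _ in range(periods):
        hazard = h0 / (1 + safety_share * wealth)
        survive *= 1 - hazard   # probability of surviving this period
        wealth *= 1 + growth
    return 1 - survive          # cumulative existential risk

fast = total_risk(growth=0.05)                         # speed through
slow = total_risk(growth=0.02)                         # grow slowly
sideways = total_risk(growth=0.02, safety_share=0.05)  # same growth, more safety

# In this toy: fast < slow (speeding up lowers total risk), and
# sideways < slow (more safety at constant growth also lowers it).
```

In this sketch faster growth makes wealth higher at every period, so hazard is lower at every period and cumulative risk is strictly smaller; the same holds for a larger safety share at fixed growth.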

AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher

Here is an argument for how GPT-X might lead to proto-AGI in a more concrete, human-aided, way: 

… language modelling has one crucial difference from Chess or Go or image classification. Natural language essentially encodes information about the world—the entire world, not just the world of the Goban, in a much more expressive way than any other modality ever could.[1] By harnessing the world model embedded in the language model, it may be possible to build a proto-AGI.


This is more a thought experiment than something that’s actually going to happen tomorrow; GPT-3 today just isn’t good enough at world modelling. Also, this method depends heavily on at least one major assumption—that bigger future models will have much better world modelling capabilities—and a bunch of other smaller implicit assumptions. However, this might be the closest thing we ever get to a chance to sound the fire alarm for AGI: there’s now a concrete path to proto-AGI that has a non-negligible chance of working.

Super-exponential growth implies that accelerating growth is unimportant in the long run

Nice post! Meta: footnote links are broken, and references to [1] and [2] aren't in the main body.

Also could [8] be referring to this post? It only touches on your point though:

Defensive considerations also suggest that they'd need to maintain substantial activity to watch for and be ready to respond to attacks.

EA Hotel Fundraiser 5: Out of runway!

We now have general funding for the next few months and are hiring for both a Community & Projects Manager and an Operations Manager, with input from Nicole and others at CEA. Unfortunately, with the winding down of EA Grants, the possibility of funding for the Community & Projects Manager salary has gone. If anyone would like to top up the salaries for either the Community & Projects Manager or the Operations Manager (currently ~£21.5k/yr pro rata, including free accommodation and food), please get in touch!

Donor Lottery Debrief

Looking for more projects like these

CEEALAR (formerly the EA Hotel) is looking for funding to cover operations from Jan 2021 onward.

AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher

Sorry if this isn’t as polished as I’d hoped. Still a lot to read and think about, but posting as I won’t have time now to elaborate further before the weekend. Thanks for doing the AMA!

It seems that a crux you have identified is how "sudden emergence" happens. How would a recursive self-improvement feedback loop start? Increasing optimisation capacity is a convergent instrumental goal. But how exactly is that goal reached? To give the most pertinent example: what would the nuts and bolts of it be for it happening in an ML system? It's possible to imagine a sufficiently large pile of linear algebra enabling recursive chain reactions of improvement in both algorithmic efficiency and size (e.g. capturing all global compute -> nanotech -> converting Earth to Computronium). Even more so since GPT-3. But what would the trigger be for setting it off?

Does the above summary of my take on this chime with yours? Do you (or anyone else reading) know of any attempts at articulating such a "nuts-and-bolts" explanation of the "sudden emergence" of AGI in an ML system?

Or maybe there would be no trigger? Maybe a great many arbitrary goals would lead to sufficiently large ML systems brute-force stumbling upon recursive self-improvement as an instrumental goal (or mesa-optimisation)?

Responding to some quotes from the 80,000 Hours podcast:

"It's not really that surprising, I don't have this wild destructive preference about how they're arranged. Let's say the atoms in this room. The general principle here is that if you want to try and predict what some future technology will look like, maybe there is some predictive power you get from thinking about X percent of the ways of doing this involve property P. But it's important to think about where there's a process by which this technology or artifact will emerge. Is that the sort of process that will be differentially attracted to things which are let's say benign? If so, then maybe that outweighs the fact that most possible designs are not benign."

What mechanism makes AI be attracted to benign things? Surely only through human direction? But to my mind the whole Bostrom/Yudkowsky argument is that it FOOMs out of control of humans (and e.g. converts everything into Computronium as a convergent instrumental goal).

“There’s some intuition of just the gap between something that’s going around and let’s say murdering people and using their atoms for engineering projects and something that’s doing whatever it is you want it to be doing seems relatively large.”

This reads like a bit of a strawman. My intuition for the problem of instrumental convergence is that in many take-off scenarios the AI will perform (a lot) more compute, and the way it will do this is by converting all available matter to Computronium (with human-existential collateral damage). From what I’ve read, you don’t directly touch on such scenarios. Would be interested to hear your thoughts on them.

“my impression is that you typically won’t get behaviours which are radically different or that seem like the system’s going for something completely different.”

Whilst you might not typically get radically different behaviours, in the cases where ML systems do fail, they tend to fail catastrophically (in ways that a human never would)! This also fits in with the notion of hidden proxy goals from “mesa optimisers” being a major concern (as well as accurate and sufficient specification of human goals).

AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher

Have you had any responses from Bostrom or Yudkowsky to your critiques?

Why I'm Not Vegan

I'm thinking that for me it would be something like 1/100 of a year! Maybe 1/10 tops. And for those such as the OP who think that "there's just no one inside to suffer" - would you risk making such a swap (with a high multiple) if it was somehow magically offered to you?
