All of Greg_Colbourn's Comments + Replies

I see in your comment on that post, you say "human extinction would not necessarily be an existential catastrophe" and "So, if advanced AI, as the most powerful entity on Earth, were to cause human extinction, I guess existential risk would be negligible on priors?". To be clear: what I'm interested in here is human extinction (not any broader conception of "existential catastrophe"), and the bet is about that.

2
Vasco Grilo
9d
Agreed.

See my comment on that post for why I don't agree. I agree nuclear extinction risk is low (but probably not that low)[1]. ASI is really the only thing that is likely to kill every last human (and I think it is quite likely to do that given it will be way more powerful than anything else[2]).

  1. ^

    But to be clear, global catastrophic / civilisational collapse risk from nuclear is relatively high (these often get conflated with "extinction").

  2. ^

    Not only do I think it will kill every last human, I think it's quite likely it will wipe out all known carbon-based life.

... (read more)

Interesting. Obviously I don't want to discourage you from the bet, but I'm surprised you are so confident based on this! I don't think the prior of mammal species duration is really relevant at all, when for 99.99% of the last 1M years there hasn't been any significant technology. Perhaps more relevant is homo sapiens wiping out all the less intelligent hominids (and many other species).

2
Vasco Grilo
9d
On the question of priors, I liked AGI Catastrophe and Takeover: Some Reference Class-Based Priors. It is unclear to me whether extinction risk has increased in the last 100 years. I estimated an annual nuclear extinction risk of 5.93*10^-12, which is way lower than the prior for wild mammals of 10^-6.

I think the chance of humans going extinct until the end of 2027 is basically negligible. I would guess around 10^-7 per year.

Would be interested to see your reasoning for this, if you have it laid out somewhere. Is it mainly because you think it's ~impossible for AGI/ASI to happen in that time? Or because it's ~impossible for AGI/ASI to cause human extinction?

4
Vasco Grilo
9d
I have not engaged so much with AI risk, but my views about it are informed by considerations in the 2 comments in this thread. Mammal species usually last 1M years, and I am not convinced by arguments for extinction risk being much higher (I would like to see a detailed quantitative model), so I start from a prior of 10^-6 extinction risk per year. Then I guess the risk is around 10 % as high as that because humans currently have tight control of AI development. To be consistent with 10^-7 extinction risk, I would guess 0.1 % chance of gross world product growing at least 30 % in 1 year until 2027, due to bottlenecks whose effects are not well modelled in Tom Davidson's model, and 0.01 % chance of human extinction conditional on that.
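For clarity, here is a minimal sketch (in Python) of how the figures quoted above fit together; the inputs are the guesses stated in the comment, not independently established estimates:

```python
# Arithmetic behind the quoted figures (all inputs are the commenter's stated guesses).
prior_per_year = 1e-6                 # base rate: mammal species typically last ~1M years
control_discount = 0.1                # risk guessed to be ~10% of the prior, given tight human control of AI development
adjusted_per_year = control_discount * prior_per_year
print(adjusted_per_year)              # ~1e-07 extinction risk per year

# Consistency check for the decomposition up to 2027:
p_explosive_growth = 0.001            # 0.1% chance of GWP growing >=30% in one year until 2027
p_extinction_given_growth = 0.0001    # 0.01% chance of human extinction conditional on that
print(p_explosive_growth * p_extinction_given_growth)  # ~1e-07, matching the adjusted annual figure
```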

I don't have a stable income so I can't get bank loans (I have tried to get a mortgage for the property before and failed - they don't care if you have millions in assets, all they care about is your income[1], and I just have a relatively small, irregular rental income from Airbnb). But I can get crypto-backed smart contract loans, and I already have one out on Aave, which I could extend.

Also, the signalling value of the wager is pretty important imo. I want people to put their money where their mouth is if they are so sure that AI x-risk isn't a... (read more)

It's in Manchester, UK. I live elsewhere - renting currently, but shortly moving into another owned house that is currently being renovated (I've got a company managing the would-be-collateral house as an Airbnb, so no long term tenants either). Will send you more details via DM.

Cash is a tricky one, because I rarely hold much of it. I'm nearly always fully invested. But that includes plenty of liquid assets like crypto. Net worth wise, in 2027, assuming no AI-related craziness, I would expect it to be in the 7-8 figure range (5-95%: maybe $500k-$100M).

2
Vasco Grilo
10d
Thanks! Could you also clarify where your house is, whether you live there or elsewhere, and how much cash you expect to have by the end of 2027 (feel free to share the 5th percentile, median and 95th percentile)?

Re risk, as per my offer on X, I'm happy to put my house up as collateral if you can be bothered to get the paperwork done. Otherwise happy to just trade on reputation (you can trash mine publicly if I don't pay up).

As I say above, I've been offering a similar bet for a while already. The symbolism is a big part of it. 

I can currently only take out crypto-backed loans, which have been quite high interest lately (don't have a stable income so can't get bank loans or mortgages), and have considered this but not done it yet.

Hi Vasco, sorry for the delay getting back to you. I have actually had a similar bet offer up on X for nearly a year (offering to go up to $250k) with only one taker for ~$30 so far! My one is you give x now and I give 2x in 5 years, which is pretty similar. Anyway, happy to go ahead with what you've suggested. 

I would donate the $10k to PauseAI (I would say $10k to PauseAI in 2024 is much greater EV than $19k to PauseAI at end of 2027).

[BTW, I have tried to get Bryan Caplan interested too, to no avail - if anyone is in contact with him, please ask him about it.]

4
Jason
11d
As much as I may appreciate a good wager, I would feel remiss not to ask if you could get a better result for amount of home equity at risk by getting a HELOC and having a bank be the counterparty? Maybe not at lower dollar amounts due to fixed costs/fees, but likely so nearer the $250K point -- especially with the expectation that interest rates will go down later in the year.
4
Vasco Grilo
11d
Thanks for following up, Greg! Strongly upvoted. I will try to understand how I can set up a contract describing the bet with your house as collateral. Could you link to the post on X you mentioned? I will send you a private message with Bryan's email.

I'd say it's more than a vague intuition. It follows from alignment/control/misuse/coordination not being (close to) solved and ASI being much more powerful than humanity. I think it should be possible to formalise it, even. "AGIs will be helping us on a lot of tasks", "collusion is hard" and "people will get more scared over time" aren't anywhere close to overcoming it imo.

2
richard_ngo
11d
These are what I mean by the vague intuitions. Nobody has come anywhere near doing this satisfactorily. The most obvious explanation is that they can't.

More like, some people did share their concerns, but those they shared them with didn't do anything about it (because of worrying about bad PR, but also maybe just as a kind of "ends justify the means" thing re his money going to EA; the latter might actually have been the larger effect).

1
James Herbert
1mo
Ah ok - I guess I would phrase it as 'not doing anything about concerns because they were too focused on short-term PR'. I would phrase it this way because, in a world where EA had been more focused on PR, I think we would have been less likely to end up with a situation like SBF (because us having more of a focus on PR would have resulted in doing a better job of PR).

Maybe half the community sees it that way. But not the half with all the money and power it seems. There aren't (yet) large resources being put into playing the "outside game". And there hasn't been anything in the way of EA leadership (OpenPhil, 80k) admitting the error afaik.

What makes you think the consciousness is expressed in human language by LLMs? Could it not be that the human language output is more akin to our unconscious physiological processes, and the real consciousness is in inscrutable (to us) floating point numbers (if it is there at all)?

What does Claude 3 produce from a null prompt (inc no pre-prompt)? Is it just gibberish? Does it show signs of consciousness? Has anyone done this experiment?

See all my comments and replies on the anti-pause posts. I don't think any of the anti-pause arguments stand up if you put significant weight on timelines being short and p(doom) high (and viscerally grasp that yes, that means your own life is in danger, and those of your friends and family too, in the short term! It's no longer just an abstract concern!).

As part of an AMA I put on X, I was asked for my "top five EA hot takes". If you'll excuse the more X-suited tone and spiciness, here they are:

1. OpenAI, Anthropic (and to a lesser extent DeepMind) were the worst cases of the Unilateralist's Curse of all time. EAs love to discourage enthusiastic newcomers by warning them not to do "net negative" unilateralist actions (i.e. don't start new projects in case they crowd out better, more "well thought through" projects in future, with "more competent" people doing them), but nothing will ever top the monumental unilatera... (read more)

3
James Herbert
2mo
Could you explain why you think ‘too much focus being placed on PR’ resulted in bad press? Perhaps something like: because people were worried about harming SBF’s public reputation they didn’t share their concerns with others, and thus the community as a whole wasn’t able to accurately model his character and act appropriately?
4
Jason
2mo
Do you think there is tension between 2 and 4 insofar as mechanisms to get a pause done may rely strongly on public support?

These all seem good topics to flesh out further! Is 1 still a "hot take" though? I thought this was pretty much the consensus here at this point? 

Regarding 2 - Hammers love Nails. EAs, as Hammers, love research, so they bias towards seeing the need for more research (after all, it is what smart people do). Conversely, EAs are less likely (personality-wise) to be comfortable with advocacy and protests (smart people don't do this). It is the wrong type of nail.

[Separating out this paragraph into a new comment as I'm guessing it's what led to the downvotes, and I'd quite like the point of the parent paragraph to stand alone. Not sure if anyone will see this now though.]

I think it's imperative to get the leaders of AGI companies to realise that they are in a suicide race (and that AGI will likely kill them too). The default outcome of AGI is doom. For extinction risk at the 1% level, it seems reasonable (even though it's still 80M lives in expectation) to pull the trigger on AGI for a 99% chance of utopia. This i... (read more)
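As a minimal sketch of where the "80M lives in expectation" figure comes from, assuming a world population of roughly 8 billion (a number not stated explicitly above):

```python
# Expected deaths from a 1% extinction risk (world population assumed to be ~8 billion).
p_extinction = 0.01
world_population = 8e9
print(p_extinction * world_population)  # 8e7, i.e. ~80 million lives in expectation
```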

Also, in general I'm personally much more sceptical of such a moonshot paying off, given shorter timelines and the possibility that x-safety from ASI may well be impossible. I think OP was 2022's best idea for AI Safety. 2024's is PauseAI.

People from those orgs were aware, but none were keen enough about the idea to go as far as attempting a pilot run (e.g. the 2 week retreat idea). I think general downside risk aversion was probably a factor. This was in the pre-chatGPT days of a much narrower Overton Window though, so maybe it's time for the idea to be revived? On the other hand, maybe it's much less needed now there is government involvement, and national AI Safety Institutes attracting top talent.

At vastly superhuman capabilities (including intelligence and rationality), it should be easier to reduce existential-level mistakes to tiny levels. They would have vastly more capability for assessing and mitigating risks and for moral reflection

They are still human though, and humans are famous for making mistakes, even the most intelligent and rational of us. It's even regarded by many as part of what being human is - being fallible. That's not (too much of) a problem at current power differentials, but it is when we're talking of solar-system-rearrangi... (read more)

Perhaps. But remember they will be smarter than us, so controlling them might not be so easy (especially if they gain access to enough computing power to speed themselves up massively). And they need not be hostile, just curious, to accidentally doom us.

Because of the crazy high power differential, and propensity for accidents (can a human really not mess up on an existential scale if acting for millions of years subjectively at superhuman capability levels?). As I say in my comment above:

Even the nicest human could accidentally obliterate the rest of us if uplifted to superintelligence and left running for subjective millions of years (years of our time). "Whoops, I didn't expect that to happen from my little physics experiment"; "Uploading everyone into a hive mind is what my extrapolations suggested wa

... (read more)
4
MichaelStJules
4mo
This doesn’t seem like a strong enough argument to justify a high probability of existential catastrophe (if that's what you intended?). At vastly superhuman capabilities (including intelligence and rationality), it should be easier to reduce existential-level mistakes to tiny levels. They would have vastly more capability for assessing and mitigating risks and for moral reflection (not that this would converge to some moral truth; I don’t think there is any). If you think this has a low chance of success (if we could delay AGI long enough to actually do it), then alignment seems pretty hopeless to me on that view, and a temporary pause only delays the inevitable doom. I do think we could do better (for upside-focused views) by ensuring more value pluralism and preventing particular values from dominating, e.g. by uploading and augmenting multiple minds.

I agree that they would most likely be safer than ML-derived ASI. What I'm saying is that they still won't be safe enough to prevent an existential catastrophe. It might buy us a bit more time (if uploads happen before ASI), but that might only be measured in years. Moratorium >> mind uploads > ML-derived ASI.

4
MichaelStJules
4mo
Why do you expect an existential catastrophe from augmented mind uploads?

I think there is an unstated assumption here that uploading is safe. And by safe, I mean existentially safe for humanity[1]. If in addition to being uploaded, a human is uplifted to superintelligence, would they -- indeed any given human in such a state -- be aligned enough with humanity as a whole to not cause an existential disaster? Arguably humans right now are only relatively existentially safe because power imbalances between them are limited.

Even the nicest human could accidentally obliterate the rest of us if uplifted to superintelligence and left ... (read more)

3
MichaelStJules
4mo
We could upload many minds, trying to represent some (sub)distribution of human values (EDIT: and psychological traits), and augment them all slowly, limiting power imbalances between them along the way.
2
Will Aldred
4mo
Yes, this is a fair point; Holden has discussed these dangers a little in “Digital People Would Be An Even Bigger Deal”. My bottom-line belief, though, is that mind uploads are still significantly more likely to be safe than ML-derived ASI, since uploaded minds would presumably work, and act, much more similarly to (biological) human minds. My impression is that others also hold this view? I’d be interested if you disagree. To be clear, I rank moratorium > mind uploads > ML-derived ASI, but I think it’s plausible that our strategy portfolio should include mind uploading R&D alongside pushing for a moratorium.

If you're on X, please share my tweet re the book giveaway.

Good point re it being a quantitative matter. I think the current priority is to kick the can down the road a few years with a treaty. Once that's done we can see about kicking the can further. Without a full solution to x-safety|AGI (dealing with alignment, misuse and coordination), maybe all we can do is keep kicking the can down the road.

 "woah, AI is powerful, I better be the one to build it"

I think this ship has long since sailed. The (Microsoft) OpenAI, Google DeepMind and (Amazon) Anthropic race is already enough to end the world. They have enough money, and all the best talent. If anything, governments entering the race might actually slow things down, by further dividing talent and the hardware supply.

We need an international AGI non-proliferation treaty. I think any risks for governments joining the race is more than outweighed by the chances of them working toward a viable treaty.

3
Tamsin Leake
4mo
I don't think "has the ship sailed or not" is a binary (see also this LW comment). We're not actually at maximum attention-to-AI, and it is still worthy of consideration whether to keep pushing things in the direction of more attention-to-AI rather than less. And this is really a quantitative matter, since a treaty can only buy some time (probably at most a few years).

It's not even (necessarily) a default instrumental goal. It's collateral damage as the result of other instrumental goals. It may just go straight for dismantling the Sun, knowing that we won't be able to stop it. Or straight for ripping apart the planet with nanobots (no need for a poison everyone simultaneously step).

1
dsj
4mo
Fair enough, I edited it again. I still think the larger points stand unchanged.

I do not agree that it is absolutely clear that the default goal of an AGI is for it to kill literally everyone, as the OP asserts.

The OP says 

goals that entail killing literally everyone (which is the default)

[my emphasis in bold]. This is a key distinction. No one is saying that the default goal will be killing humans; the whole issue is one of collateral damage - it will end up with (to us) arbitrary goals that result in convergent instrumental goals that leave us all dead as collateral damage (e.g. turning the planet into "computronium", or dismantling the Sun for energy).

1
dsj
4mo
Sure, I understand that it’s a supposed default instrumental goal and not a terminal goal. Sorry that my wording didn’t make that distinction clear. I’ve now edited it to do so, but I think my overall points stand.

No one is saying p(doom) is 100%, but there is good reason to think that it is 50% or more - that the default outcome of AGI is doom. It doesn't default to everything somehow being ok - to alignment solving itself, or to the alignment work that has been done by today (or by 2030) being enough if we get a foom tomorrow (or by 2030). I've not seen any compelling argument to that effect.

Thanks for the links. I think a lot of the problem with the proposed solutions is that they don't scale to ASI, and aren't watertight. Having 99.999999% alignment in the limit of ASI perfor... (read more)

I've not come across any arguments that debunk the risk with anywhere near the same rigour (and I still have a $1000 bounty open here). Please link to the "careful thought on the matter" from the other side that you mention (or add here). I'm with Richard Ngo when he says:

I'm often cautious when publicly arguing that AGI poses an existential risk, because our arguments aren't as detailed as I'd like. But I should remember that the counterarguments are *much* worse - I've never seen a plausible rebuttal to the core claims. That's terrifying.

5
dsj
4mo
You seem to be lumping people like Richard Ngo, who is fairly epistemically humble, in with people who are absolutely sure that the default path leads to us all dying. It is only the latter that I'm criticizing. I agree that AI poses an existential risk, in the sense that it is hard to rule out that the default path poses a serious chance of the end of civilization. That's why I work on this problem full-time. I do not agree that it is absolutely clear that default instrumental goals of an AGI entail it killing literally everyone, as the OP asserts. (I provide some links to views dissenting from this extreme confidence here.)

There has already been much written on this, enough for there to be a decent level of consensus (which indeed there is here (EAF/LW)).

1
dsj
4mo
These essays are well known and I'm aware of basically all of them. I deny that there's a consensus on the topic, that the essays you link are representative of the range of careful thought on the matter, or that the arguments in these essays are anywhere near rigorous enough to meet my criterion: justifying the degree of confidence expressed in the OP (and some of the posts you link).

Here's my attempt at a concise explanation for why the default outcome of AGI is doom.

It's pretty much just a matter of throwing more money (compute and data) at it now. Current systems are only not killing everyone because they are weak.

Personally I think AI x-risk (and in particular, slowing down AI) is the current top cause area, but I'm also keen on most other EA cause areas, inc Global Health (hence the focus on general EA from the start); but the update is mainly a reflection of what's been happening on the ground in terms of our applicants, and our (potential) funding sources.

paid the estate $26,786,503, an amount equal to 100% of the funds the entities received from FTX and the FTX Foundation

Interested to know whether this was a result of EV being pro-active, or being pressured by the FTX bankruptcy estate, given the relationship(s) between EV (trustees) and SBF. And what the implications might be for other orgs who received FTX funding. Have/are any other EA orgs paying money back?

3
Jason
4mo
I haven't seen anything to update my initial reaction here. I'm cautious about applying to any other organization because I can't be giving anyone legal advice. I'd add that almost all other potential clawbacks are metaphorically classified as relatively small potatoes (using the threshold amount at which the proposed settlement must be publicly filed on the open docket as the dividing line for small vs. midsize potatoes). Many are at least an order of magnitude under the small-potatoes threshold. I expect the estate's willingness to litigate those cases with its $2000-per-hour lawyers, or even junior associates billing more like $750-$1000 per hour, will be significantly lower than it would have been with EV, and that might be reflected in the deals that were offered. I have no inside info, though.
8
Larks
4mo
I think a key detail here is they gave back all the 2022 money (90% of total) and kept all pre-2022 money (10% of total). https://restructuring.ra.kroll.com/FTX/Home-DownloadPDF?id1=MjU5MTQzNg==&id2=-1

[Meta] Forum bug: when there were no comments it was showing as -1 comments

IIRC Carl had a $5M discretionary funding pot from OpenPhil. What has he funded with it?

9
CarlShulman
5mo
Not much new on that front besides continuing to back the donor lottery in recent years, for the same sorts of reasons as in the link, and focusing on research and advising rather than sourcing grants.

So one of the main reasons for the donation matching is for social proof - I don’t want to be the only person who thinks that CEEALAR is worth funding! If the matching funds aren’t maxed out, I will probably (90%) still fund CEEALAR enough to get us to May to have another go at getting an SFF grant, but I would be more reluctant to (65%) without the evidence of enough other people thinking it’s worth significantly funding too. I get that this is somewhat subjective, so sorry if it's a bit of a cop out.

It seems unlikely that we'll ever get AI x-risk down to negligible levels, but it's currently striking how high a risk is being tolerated by those building (and regulating) the technology, when compared to, as you say, aviation, and also nuclear power (<1 catastrophic accident in 100,000 years being what's usually aimed for). I think at the very least we need to reach a global consensus on what level of risk we are willing to tolerate before continuing with building AGI.

I guess you're sort of joking, but it should be really surprising (from an outside perspective) that biological brains have figured out how to understand neural networks (and it's taken billions of years of evolution).

Thoughts on this? Supposedly shows the leaked letter to the board. But seems pretty far out, and if true, it's basically game over (AES-192 encryption broken by the AI with new unintelligible maths; the AI proposing a new more efficient and flexible architecture for itself). Really hope the letter is just a troll!

Altman starting a new company could still slow things down a few months. Which could be critically important if AGI is imminent. In those few months perhaps government regulation with teeth could actually come in, and then shut the new company down before it ends the world.

Looks like Matthew did post a model of doom that contains something like this (back in May, before the top-level comment):

My modal tale of AI doom looks something like the following: 

1. AI systems get progressively and incrementally more capable across almost every meaningful axis. 

2. Humans will start to employ AI to automate labor. The fraction of GDP produced by advanced robots & AI will go from 10% to ~100% after 1-10 years. Economic growth, technological change, and scientific progress accelerates by at least an order of magnitude, and pr

... (read more)

The link is dead. Is it available anywhere else?

1
Ulrik Horn
6mo
Still works for me. Not sure why it's not working for everyone.

Agree, but I also think that insufficient "security mindset" is still a big problem. From OP:

it still remains to be seen whether US and international regulatory policy will adequately address every essential sub-problem of AI risk. It is still plausible that the world will take aggressive actions to address AI safety, but that these actions will have little effect on the probability of human extinction, simply because they will be poorly designed. One possible reason for this type of pessimism is that the alignment problem might just be so difficult to sol

... (read more)

I imagine it going hand in hand with more formal backlashes (i.e. regulation, law, treaties).

Overall I don’t have settled views on whether it’d be good for me to prioritize advocating for any particular policy.[5] At the same time, if it turns out that there is (or will be) a lot more agreement with my current views than there currently seems to be, I wouldn’t want to be even a small obstacle to big things happening, and there’s a risk that my lack of active advocacy could be confused with opposition to outcomes I actually support.

You have a huge amount of clout in determining where $100Ms of OpenPhil money is directed toward AI x-safety. I think yo... (read more)

  • There’s a serious (>10%) risk that we’ll see transformative AI[2] within a few years.
  • In that case it’s not realistic to have sufficient protective measures for the risks in time.
  • Sufficient protective measures would require huge advances on a number of fronts, including information security that could take years to build up and alignment science breakthroughs that we can’t put a timeline on given the nascent state of the field, so even decades might or might not be enough time to prepare, even given a lot of effort.

If it were all up to me, the world would

... (read more)