Greg_Colbourn ⏸️

5959 karma
Interests:
Slowing down AI

Bio

Global moratorium on AGI, now (Twitter). Founder of CEEALAR (née the EA Hotel; ceealar.org)

Comments (1190)

Just thinking: surely, to be fair, we should be aggregating all the AI results into an "AI panel"? I wonder how much overlap there is between the AIs' wrong answers, and what the aggregate score would be?

Right now, as things stand with the scoring, "AGI" in ARC-AGI-2 means "equivalent to the combined performance of a team of 400 humans", not "(average) human level".
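As a rough illustration of the "AI panel" idea above (a sketch only, with made-up per-model results rather than real benchmark figures): count a task as solved by the panel if at least one model solves it, and measure the overlap of wrong answers as the tasks that defeat every model.

```python
# Sketch of an "AI panel" aggregate. Per-task results (True = solved) and
# model names are hypothetical, purely for illustration.
results = {
    "model_a": [True, False, True, False, True],
    "model_b": [False, True, True, False, True],
    "model_c": [True, True, False, False, True],
}

n_tasks = len(next(iter(results.values())))

# Individual scores: each model solves 60% of tasks in this toy example.
individual = {name: sum(r) / n_tasks for name, r in results.items()}

# Panel score: a task counts as solved if any model solves it.
panel = sum(any(r[i] for r in results.values()) for i in range(n_tasks)) / n_tasks

# Overlap of wrong answers: tasks that every model gets wrong.
all_wrong = sum(not any(r[i] for r in results.values()) for i in range(n_tasks)) / n_tasks

print(individual)                       # {'model_a': 0.6, 'model_b': 0.6, 'model_c': 0.6}
print(f"panel score: {panel:.0%}")      # 80% (one task defeats all three models)
print(f"all wrong:   {all_wrong:.0%}")  # 20%
```

The less the models' errors overlap, the more the panel score exceeds any individual score, which is exactly the overlap question raised above.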

Ok, I take your point. But no one seems to be actually doing this (it seems like it would already be possible for this example, yet it hasn't been done).

> What do you think a good resolution criteria for judging a system as being AGI should be?

Most relevant to X-risk concerns would be the ability to do A(G)I R&D as well as top AGI company workers. But then of course we run into the problem of crossing the point of no return in order to resolve the prediction market. And we obviously shouldn't do that (unless superalignment/control is somehow solved).

> The human testers were random people off the street who got paid $115-150 to show up and then an additional $5 per task they solved. I believe the ARC Prize Foundation’s explanation for the 40-point discrepancy is that many of the testers just didn’t feel that motivated to solve the tasks and gave up [my emphasis]. (I vaguely remember this being mentioned in a talk or interview somewhere.)

I'm sceptical of this, given that they were able to earn $5 for every couple of minutes' work (the time to solve a task), which works out to roughly $150/hour. This is far above the average hourly wage.

> 100% is the score for a "human panel", i.e. a set of at least two humans.

Also seems very remarkable (suspect, in fact): it would mean almost no overlap between the questions the humans were getting wrong. I.e. if each human averages 60% right, then for 2 humans to jointly reach 100% there can only be 20% of questions that both get right! I think in practice the panels that score 100% have to contain many more than 2 humans on average.
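Spelling out that arithmetic (with $A$ and $B$ the sets of tasks the two humans solve, and the 60% figure used purely for illustration), inclusion-exclusion gives

$$|A \cap B| = |A| + |B| - |A \cup B| = 60\% + 60\% - 100\% = 20\%,$$

i.e. in the 60/60/100 case the two humans can share only 20% of their correct answers and none of their wrong ones.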

EDIT: it looks like "at least 2 humans" means that every problem in the set was solved by at least 2 of the 400 humans who attempted it!

See the quote in the footnote: "a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems."

> the forecasts do not concern a kind of system that would be able to do recursive self-improvements (none of the indicators have anything to do with it)

The indicators are all about being human level at ~every kind of work a human can do. That includes AI R&D. And AIs are already known to think (and act) much faster than humans, and that will only become more pronounced as the AGI improves itself; hence the "rapid recursive self-improvement".

Even if it takes a couple of years, we would probably cross a point of no return not long after AGI.

> None of these indicators actually imply that the "AGI" meeting them would be dangerous or catastrophic to humanity

Thanks for pointing this out. There was indeed a reasoning step missing from the text. Namely: such AGI would be able to automate further AI development, leading to rapid recursive self-improvement to ASI (Artificial Superintelligence). And it is ASI that will be lethally intelligent to humanity (/all biological life). I've amended the text.

> there is nothing to indicate that such a system would be good at any other task

The whole point of having the 4 disparate indicators is that they have to be done by a single unified system (not specifically trained for only those tasks)[1]. Such a system would implicitly be general enough to do many other tasks. Ditto with the Strong AGI question.

> While an ideal adversarial Turing test would be a very difficult task for an AI system, ensuring these ideal conditions is often not feasible. Therefore, I'm certainly going to expect news that AI systems will pass some form of the adversarial test

That is what both the Turing Test questions are all about! (Look at the success conditions in the fine print.)

  1. ^

    Metaculus: "By "unified" we mean that the system is integrated enough that it can, for example, explain its reasoning on an SAT problem or Winograd schema question, or verbally report its progress and identify objects during videogame play. (This is not really meant to be an additional capability of "introspection" so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)"

One option, if you want to do a lot more about it than you currently are, is Pause House. Another is donating to PauseAI (US, Global). In my experience, being pro-active about the threat does help.

> I have to think holding such a belief is incredibly distressing.

Have you considered that you might be engaging in motivated reasoning because you don't want to be distressed about this? Also, you get used to it. Humans are very adaptable.

The 10% comes from the linked aggregate of forecasts, from thousands of people's estimates/bets on Metaculus, Manifold and Kalshi; not the EA community.

I think this is pretty telling. I've also had a family member say a similar thing. If your reasoning is (at least partly) motivated by wanting to stay sane, you probably aren't engaging with the arguments impartially. 

I would bet a decent amount of money that you would not, in fact, go crazy. Look to history to see how few people went crazy over the threat of nuclear annihilation in the Cold War (and all the other things C.S. Lewis refers to in the linked quote).
