
David Krueger

87 karma · Joined May 2022

Comments (8)

"With the use of fine-tuning, and a bunch of careful engineering work, capabilities evaluations can be done reliably and robustly."  

I strongly disagree with this (and the title of the piece).  I've been having these arguments a lot recently, and I think these sorts of claims are emblematic of a dangerously narrow view of the problem of AI x-safety which, I am disappointed to see, seems quite popular.
 
A few reasons why this statement is misleading: 
* New capabilities elicitation techniques arrive frequently and unpredictably (think chain-of-thought prompting, for example).
* The capabilities of a system could be much greater than any particular LLM involved in that system (think tool use and coding).  On the current trajectory, LLMs will increasingly be heavily integrated into complex socio-technical systems.  The outcomes are unpredictable, but it's likely such systems will exhibit capabilities significantly beyond what can be predicted from evaluations.

You can try to account for the fact that you're competing against the entire world's ingenuity by using your privileged access (e.g. to fine-tuning or white-box capabilities elicitation methods), but this is unlikely to provide sufficient coverage.

ETA: Understanding whether and to what extent the original claim is true would likely require years of research at a minimum.

I recently learned that in law, there is a breakdown along the following lines:

  • Intent (~=misuse)
  • Oblique Intent (i.e. a known side effect)
  • Recklessness (known chance of side effect)
  • Negligence (should've known chance of side effect)
  • Accident (couldn't have been expected to know)

This seems like a good categorization.

A cutting-edge algorithmic or architectural discovery coming out of China would be particularly interesting in this respect.

Kaiming He was at Microsoft Research in China when he and his co-authors invented ResNets in 2015.  Residual connections are part of transformers, and are probably the second most important architectural breakthrough in modern deep learning.

This very short book makes similar points and suggestions.  I found it to be a good read, and would recommend it: https://www.amazon.co.uk/THAT-CLEAR-Effective-communication-multilingual/dp/1916280005

"IS THAT CLEAR?: Effective communication in a multilingual world"

Thanks for writing this.  I continue to be deeply frustrated by the "accident vs. misuse" framing. 

In fact, I am writing this comment because I think this post itself endorses that framing to too great an extent.  For instance, I do not think it is appropriate to describe this simply as an accident:

"engineers disabled an emergency brake that they worried would cause the car to behave overly cautiously and look worse than competitor vehicles."

I have a hard time imagining that they didn't realize this would likely make the cars less safe; I would say they made a decision to prioritize 'looking good' over safety, perhaps rationalizing it by saying it wouldn't make much difference and/or that they didn't have a choice because their livelihoods were at risk (which perhaps they were).

Now that I've got the whinging out of the way: thank you again for writing this.  I found the distinction between "AI risks with structural causes" and "‘Non-AI’ risks partly caused by AI" quite valuable, and I hope it will be widely adopted.
 

I think this idea is worth an orders-of-magnitude deeper investigation than what you've described.  Such investigations seem worth funding.

It's also worth noting that OP's quotation is somewhat selective; here I include the sub-bullets:

Within 5 years: EA funding decisions are made collectively 

  •  First set up experiments for a safe cause area with small funding pots that are distributed according to different collective decision-making mechanisms 
    • Subject matter experts are always used and weighed appropriately in this decision mechanism
  • Experiment in parallel with: randomly selected samples of EAs are to evaluate the decisions of one existing funding committee - existing decision-mechanisms are thus ‘passed through’ an accountability layer
  • All decision mechanisms have a deliberation phase (arguments are collected and weighed publicly) and a voting phase (majority voting, quadratic voting..) 
  • Depending on the cause area and the type of choice, either fewer (experts + randomised sample of EAs) or more people (any EA or beyond) will take part in the funding decision.

I strongly disagree with this response, and find it bizarre.  

I think assessing this post according to a limited number of possible theories of change is incorrect, as influence is often diffuse and hard to predict or measure.  

I agree with freedomandutility's description of this as an "isolated demand for [something like] rigor".
 

I'm curious to dig into this a bit more, and hear why you think these seem like fairy tales to you (I'm not saying that I disagree...).
  
I wonder if this comes down to different ideas of what "solve alignment" means (I see you put it in quotes...).

1) Are you perhaps thinking that realistic "solutions to alignment" will carry a significant alignment tax?  Else why wouldn't ~everyone adopt alignment techniques (that align AI systems with their preferences/values)?

2) Another source of ambiguity: there are a lot of different things people mean by "alignment", including:
* AI is aligned with objectively correct values
* AI is aligned with a stakeholder and consistently pursues their interests
* AI does a particular task as intended/expected
Is it one of these in particular (or something else) that you have in mind here?