I'm a PhD student at the Center for Human-Compatible AI (CHAI) at UC Berkeley. I edit and publish the Alignment Newsletter, a weekly publication with recent content relevant to AI alignment. In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.


The academic contribution to AI safety seems large

Was going to write a longer comment but I basically agree with Buck's take here.

It's a little hard to evaluate the counterfactuals here, but I'd much rather have the contributions from EA safety than from non EA safety over the last ten years.

I wanted to endorse this in particular.

On the actual argument:

1. EA safety is small, even relative to a single academic subfield.
2. There is overlap between capabilities and short-term safety work.
3. There is overlap between short-term safety work and long-term safety work.
4. So AI safety is less neglected than the opening quotes imply.
5. Also, on present trends, there’s a good chance that academia will do more safety over time, eventually dwarfing the contribution of EA.

I agree with 1, 2, and 3 (though perhaps disagree with the magnitude of 2 and 3, e.g. you list a bunch of related areas and for most of them I'd be surprised if they mattered much for AGI alignment).

I agree 4 is literally true, but I'm not sure it necessarily matters, as this sort of thing can be said for ~any field (as Ben Todd notes). It would be weird to say that animal welfare is not neglected because of the huge academic fields studying animals, even though those fields are relevant to questions of e.g. sentience or farmed animal welfare.

I strongly agree with 5 (if we replace "academia" with "academia + industry", it's plausible to me academia never gets involved while industry does), and when I argue that "work will be done by non-EAs", I'm talking about future work, not current work.

Objections to Value-Alignment between Effective Altruists
It seems like an overstatement that the topics of EA are completely disjoint with topics of interest to various established academic disciplines.

I didn't mean to say this; there's certainly overlap. My claim is that (at least in AI safety, and I would guess in other EA areas as well) the reasons we do the research we do are different from those of most academics. It's certainly possible to repackage the research in a format more suited to academia -- but it must be repackaged, which leads to

rewrite your paper so that regular academics understand it whereas other EAs who actually care about it don't

I agree that the things you list have a lot of benefits, but they seem quite hard to me to do. I do still think publishing with peer review is worth it despite the difficulty.

Objections to Value-Alignment between Effective Altruists
Most of this was about very large documents on AI safety and strategy issues allegedly existing within OpenAI and MIRI.

I agree people trust MIRI's conclusions a bunch based on supposed good internal reasoning / the fact that they are smart, and I think this is bad. However, I think this is pretty limited to MIRI.

I haven't seen anything similar with OpenAI though of course it is possible.

I agree with all the other things you write.

Objections to Value-Alignment between Effective Altruists

It's a good post, I'm glad you wrote it :)

On the abstract level, I think I see EA as less grand / ambitious than you do (in practice, if not in theory) -- the biggest focus of the longtermist community is reducing x-risk, which is good by basically any ethical theory that people subscribe to (exceptions being negative utilitarianism and nihilism, but nihilism cares about nothing and very few people are negative utilitarian and most of those people seem to be EAs). So I see the longtermist section of EA more as the "interest group" in humanity that advocates for the future, as opposed to one that's going to determine what will and won't happen in the future. I agree that if we were going to determine the entire future of humanity, we would want to be way more diverse than we are now. But if we're more like an interest group, efficiency seems good.

On the concrete level -- you mention not being happy about these things:

EAs give high credence to non-expert investigations written by their peers

Agreed this happens and is bad

they rarely publish in peer-review journals and become increasingly dismissive of academia

Idk, academia doesn't care about the things we care about, and as a result it is hard to publish there. It seems like long-term we want to make a branch of academia that cares about what we care about, but before that it seems pretty bad to subject yourself to peer reviews that argue that your work is useless because they don't care about the future, and/or to rewrite your paper so that regular academics understand it whereas other EAs who actually care about it don't. (I think this is the situation of AI safety.)

show an increasingly certain and judgmental stance towards projects they deem ineffective

Agreed this happens and is bad (though you should get more certain as you get more evidence, so maybe I think it's less bad than you do)

defer to EA leaders as epistemic superiors without verifying the leaders' epistemic superiority

Agreed this happens and is bad

trust that secret google documents which are circulated between leaders contain the information that justifies EA’s priorities and talent allocation

Agreed this would be bad if it happened, but I'm not actually sure that people trust this. I do hear comments like "maybe it was in one of those secret google docs" but I wouldn't really say that those people trust that process.

let central institutions recommend where to donate and follow advice to donate to central EA organisations

Kinda bad, but I think this is more a fact about "regular" EAs not wanting to think about where to donate? (Or maybe they have more trust in central institutions than they "should".)

let individuals move from a donating institution to a recipient institution and visa versa

Seems really hard to prevent this -- my understanding is it happens in all fields, because expertise is rare and in high demand. I agree that it's a bad thing, but it seems worse to ban it.

strategically channel EAs into the US government

I don't see why this is bad. I think it might be bad if other interest groups didn't do this, but they do. (Though I might just be totally wrong about that.)

adjust probability assessments of extreme events to include extreme predictions because they were predictions by other members

That seems somewhat bad but not obviously so? Like, it seems like you want to predict an average of people's opinions weighted by expertise; since EA cares a lot more about x-risk it often is the case that EAs are the experts on extreme events.

AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher

My experience matches Ben's more than yours.

My impression is that there hasn't so much been a shift in views within individual people than the influx of a younger generation who tends to have an ML background and roughly speaking tends to agree more with Paul Christiano than MIRI. Some of them are now somewhat prominent themselves (e.g. Rohin Shah, Adam Gleave, you), and so the distribution of views among the set of perceived "AI risk thought leaders" has changed.

None of the people you named had an ML background. Adam and I have CS backgrounds (before we joined CHAI, I was a PhD student in programming languages, while Adam worked in distributed systems iirc). Ben is in international relations. If you were counting Paul, he did a CS theory PhD. I suspect all of us chose the "ML track" because we disagreed with MIRI's approach and thought that the "ML track" would be more impactful.

(I make a point out of this because I sometimes hear "well if you started out liking math then you join MIRI and if you started out liking ML you join CHAI / OpenAI / DeepMind and that explains the disagreement" and I think that's not true.)

I don't recall anyone seriously suggesting there might not be enough time to finish a PhD before AGI appears.

I've heard this (might be a Bay Area vs. Europe thing).

Antitrust-Compliant AI Industry Self-Regulation

Planned summary for the Alignment Newsletter:

One way to reduce the risk of unsafe AI systems is to have agreements between corporations that promote risk reduction measures. However, such agreements may run afoul of antitrust laws. This paper suggests that this sort of self-regulation could be done under the “Rule of Reason”, in which a learned profession (such as “AI engineering”) may self-regulate in order to correct a market failure, as long as the effects of such a regulation promote rather than harm competition.

In the case of AI, if AI engineers self-regulate, this could be argued as correcting the information asymmetry between the AI engineers (who know about risks) and the users of the AI system (who don’t). In addition, since AI engineers arguably do not have a monetary incentive, the self-regulation need not be anticompetitive. Thus, this seems like a plausible method by which AI self-regulation could occur without running afoul of antitrust law, and so is worthy of more investigation.
Some promising career ideas beyond 80,000 Hours' priority paths
I would have thought that it would sometimes be important for making safe and beneficial AI to be able to prove that systems actually exhibit certain properties when implemented.

We can decompose this into two parts:

1. Proving that the system that we design has certain properties

2. Proving that the system that we implement matches the design (and so has the same properties)

1 is usually done by math-style proofs, which are several orders of magnitude easier to do than direct formal verification of the system in a proof assistant without having first done the math-style proof.

2 is done by formal verification, where for complex enough systems the specification for the formal verification often comes from the output of a math proof.
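As a toy illustration of this decomposition, here is a minimal Lean 4 sketch. The names `specDouble` and `implDouble` are hypothetical and chosen only for this example, and the sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is available without extra imports. Real systems are vastly more complex; this only shows the shape of the two proof obligations.

```lean
-- Hypothetical "design": the specification we reason about mathematically.
def specDouble (n : Nat) : Nat := n + n

-- Hypothetical "implementation": what actually gets built.
def implDouble (n : Nat) : Nat := 2 * n

-- Part 1: the design has the property we care about (here, evenness).
theorem specDouble_even (n : Nat) : specDouble n % 2 = 0 := by
  unfold specDouble
  omega

-- Part 2: the implementation matches the design.
theorem impl_matches_spec (n : Nat) : implDouble n = specDouble n := by
  unfold implDouble specDouble
  omega

-- Together, the property transfers from the design to the implementation.
theorem implDouble_even (n : Nat) : implDouble n % 2 = 0 := by
  rw [impl_matches_spec]
  exact specDouble_even n
```

The last theorem needs no new insight: once part 2 is established, the property proved in part 1 carries over mechanically.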

I guess I think this first because bugs seem capable of being big deals in this context

I'm arguing that after you've done 1, even if there's a failure from not having done 2, it's very unlikely to cause x-risk via the usual mechanism of an AI system adversarially optimizing against humans. (Maybe it causes x-risk in that due to a bug the computer system says "call Russia" and that gets translated to "launch all the nukes", or something like that, but that's not specific to AI alignment, and I think it's pretty unlikely.)

Like, idk. I struggle to actually think of a bug in implementation that would lead to a powerful AI system optimizing against us, when without that bug it would have been fine. Even if you accidentally put a negative sign on a reward function, I expect that this would be caught long before the AI system was a threat.

I realize this isn't a super compelling response, but it's hard to argue against this because it's hard to prove a negative.

there could be some instances where it's more feasible to use proof assistants than math to prove that a system has a property.

Proof assistants are based on math. Any time a proof assistant proves something, it can produce a "transcript" that is a formal math proof of that thing.
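For instance, in Lean 4 (used here only as one example of a proof assistant; the theorem name `two_add_two` is made up for illustration), a proof written with tactics is elaborated into an explicit proof term, which can then be inspected:

```lean
-- A small fact proved with a tactic.
theorem two_add_two : 2 + 2 = 4 := by
  rfl

-- Ask Lean to display the statement and the underlying proof term,
-- i.e. the formal "transcript" that the tactic produced.
#print two_add_two
```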

Now you might hope that proof assistants can do things faster than humans, because they're automated. This isn't true -- usually the automation is things like "please just prove for me that 2*x is larger than x, I don't want to have to write the details myself", or "please fill out and prove the base case of this induction argument", where a standard math proof wouldn't even note the detail.
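A minimal Lean 4 sketch of what that kind of automation looks like, assuming a recent toolchain where the `omega` tactic is available without extra imports. Note that the first statement needs the side condition `0 < x` to be true over the natural numbers, which is exactly the sort of detail a math proof would leave implicit.

```lean
-- "Please just prove for me that 2*x is larger than x":
-- the arithmetic decision procedure `omega` fills in the details.
example (x : Nat) (h : 0 < x) : x < 2 * x := by
  omega

-- "Please fill out and prove the base case of this induction argument":
-- the base case is closed by `rfl`, and only the inductive step
-- needs any (small) amount of human guidance.
example (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```

Neither proof involves anything a human would find difficult; the automation only saves the tedium of writing out routine steps.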

Sometimes a proof assistant can do better than humans, when the proof of a fact is small but deeply unintuitive, such that brute-force search is actually better than finely tuned human intuition. I know of one such case, that I'm failing to find a link for. But this is by far the exception, not the rule.

(There are some proofs, most famously that of the four color theorem for map coloring, where part of the proof was done by a special-purpose computer program searching over a space of possibilities. I'm not counting these, as this feels like mathematicians doing a math proof and finding a subpart that they delegated to a machine.)

EDIT: I should note that one use case that seems plausible to me is to use formal verification techniques to verify learned specifications, or specifications that change based on the weights of some neural net, but I'd be pretty surprised if this was done using proof assistants (as opposed to other techniques in formal verification).

Some promising career ideas beyond 80,000 Hours' priority paths
For example, it might be possible to use proof assistants to help solve the AI ‘alignment problem’ by creating AI systems that we can prove have certain properties we think are required for the AI system to reliably do what we want it to do.

I don't think this is particularly impactful, primarily because I don't see a path by which it has an impact, and I haven't seen anyone make a good case for this particular path to impact.

(It's hard to argue a negative, but if I had to try, I'd point out that if we want proofs, we would probably do those via math, which works at a much higher level of abstraction and so takes much less work / effort; formal verification seems good for catching bugs in your implementations of ideas, which is not the core of the AI risk problem.)

However, it is plausibly still worthwhile becoming an expert on formal verification because of the potential applications to cybersecurity. (Though it seems like in that case you should just become an expert on cybersecurity.)

Critical Review of 'The Precipice': A Reassessment of the Risks of AI and Pandemics
This suggests that we should schedule a time to talk in person, and/or an adversarial collaboration trying to write a version of the argument that you're thinking of.

Sounds good, I'll just clarify my position in this response, rather than arguing against your claims.

So then I guess your response is something like "But everyone forgetting to eat food is a crazy scenario, whereas the naive extrapolation of the thing we're currently doing is the default scenario".

It's more like "there isn't any intellectual work to be done / field building to do / actors to coordinate to get everyone to eat".

Whereas in the AI case, I don't know how we're going to fix the problem I outlined, and as far as I can tell neither does anyone else in the AI community; therefore there is intellectual work to be done.

We are already at significantly-better-than-human optimisation

Sorry, by optimization there I meant something more like "intelligence". I don't really care whether it comes from better SGD, some hardcoded planning algorithm, or a mesa optimizer; the question is whether it is significantly more capable than humans at pursuing goals.

I thought our opinions were much more similar.

I think our predictions of how the world will go concretely are similar; but I'd guess that I'm happier with abstract arguments that depend on fuzzy intuitive concepts than you are, and find them more compelling than more concrete ones that depend on a lot of specific details.
