I'm a PhD student at the Center for Human-Compatible AI (CHAI) at UC Berkeley. I edit and publish the Alignment Newsletter, a weekly publication with recent content relevant to AI alignment. In the past, I ran the EA UC Berkeley and EA at the University of Washington groups.

Case studies of self-governance to reduce technology risk

Planned summary for the Alignment Newsletter:

Should we expect AI companies to reduce risk through self-governance? This post investigates six historical cases, of which the two most successful were the Asilomar conference on recombinant DNA, and the actions of Leo Szilard and other physicists in 1939 (around the development of the atomic bomb). It is hard to make any confident conclusions, but the author identifies the following five factors that make self-governance more likely:

1. The risks are salient.
2. If self-governance doesn’t happen, then the government will step in with regulation (which is expected to be poorly designed).
3. The field is small, so that coordination is easier.
4. There is support from gatekeepers (e.g. academic journals).
5. There is support from credentialed scientists.

Can money buy happiness? A review of new data

Nice find, thanks!

(For others: note that the linked blog post also considers things like "maybe they just uploaded the wrong data" to be a plausible explanation.)

Getting started independently in AI Safety

> you can attempt a deep RL project, realise you are hopelessly out of your depth, then you know you'd better go through Spinning Up in Deep RL before you can continue.

To be clear, I do generally like the idea of just-in-time learning. But:

  • You may not realize when you are hopelessly out of your depth ("doesn't everyone say that ML is an art where you just turn knobs until things work?" or "how was I supposed to know that the algorithm was going to silently clip my rewards, making all of my reward shaping useless?")
  • You may not know what you don't know. In the example I gave, you probably know very well that you don't know RL, but you may not realize that you don't know the right tooling to use ("what, there's a Tensorboard dashboard I can use to visualize my training curves?")

Both of these are often avoided by taking courses that (try to) present the details to you in the right order.

Getting started independently in AI Safety

> I think too many people feel held back from doing a project-like thing on their own.

Absolutely. Also, too many people don't feel held back enough (e.g. maybe it really would have been beneficial to, say, go through Spinning Up in Deep RL before attempting a deep RL project). How do you tell which group you're in? 

(This comment inspired by Reversing Advice)

Can money buy happiness? A review of new data

> If we change the y-axis to display a linear relationship, this tells a different story. In fact, we see a plateauing of the relationship between income and experienced wellbeing, just as found in Kahneman and Deaton (2010), but just at a later point — about $200,000 per year.

Uhh... that shouldn't happen from just re-plotting the same data. In fact, how is it that in the original graph, there is an increase from $400,000 to $620,000, but in the new linear axis graph, there is a decrease?

> A doubling of income is associated with about a 1-point increase on a 0–100 scale, which seems (to our eyes) surprisingly small.

In the context of the previous paragraph ("this doesn't mean go work at Goldman Sachs"), this seems to imply that rich people shouldn't get more money because it barely makes a difference, but the same reasoning applies to poor people as well, casting doubt on whether we should bother giving money away. I don't know if it was meant to imply the first point, but it gave me vibes of selectively interpreting the data to support a desired conclusion.

Ben Garfinkel's Shortform

I agree with this general point. I'm not sure if you think this is an interesting point to notice that's useful for building a world-model, and/or a reason to be skeptical of technical alignment work. I'd agree with the former but disagree with the latter.

High Impact Careers in Formal Verification: Artificial Intelligence

Planned summary for the Alignment Newsletter:

This post considers the applicability of formal verification techniques to AI alignment. Now, in order to “verify” a property, you need a specification of that property to verify against. The author considers three possibilities:

1. **Formally specifiable safety:** we can write down a specification for safe AI, _and_ we’ll be able to find a computational description or implementation.

2. **Informally specifiable safety:** we can write down a specification for safe AI mathematically or philosophically, but we will not be able to produce a computational version.

3. **Nonspecifiable safety:** we will never write down a specification for safe AI.

Formal verification techniques are applicable only to the first case. Unfortunately, it seems that no one expects the first case to hold in practice: even CHAI, with its mission of building provably beneficial AI systems, is talking about proofs in the informal specification case (which still includes math), on the basis of comments like [these]() in Human Compatible. In addition, it currently seems particularly hard for experts in formal verification to impact actual practice, and there doesn’t seem to be much reason to expect that to change. As a result, the author is relatively pessimistic about formal verification as a route to reducing existential risk from failures of AI alignment.

Progress studies vs. longtermist EA: some differences

> I just think there's a much greater chance that we look back on it and realize, too late, that we were focused on entirely the wrong things.

If you mean like 10x greater chance, I think that's plausible (though larger than I would say). If you mean 1000x greater chance, that doesn't seem defensible.

In both fields you basically ~can't experiment with the actual thing you care about (you can't just build a superintelligent AI and check whether it is aligned; you mostly can't run an intervention on the entire world and check whether world GDP went up). You instead have to rely on proxies.

In some ways it is a lot easier to run proxy experiments for AI alignment -- you can train AI systems right now, and run actual proposals in code on those systems, and see what they do; this usually takes somewhere between hours and weeks. It seems a lot harder to do this for "improving GDP growth" (though perhaps there are techniques I don't know about).

I agree that PS has an advantage with historical data (though I don't see why economic theory is particularly better than AI theory), and this is a pretty major difference. Still, I don't think it goes from "good chance of making a difference" to "basically zero chance of making a difference".

> The latter is something almost completely in the future, where we don't get any chances to get it wrong and course-correct.

Fwiw, I think AI alignment is relevant to current AI systems with which we have experience even if the catastrophic versions are in the future, and we do get chances to get it wrong and course-correct, but we can set that aside for now, since I'd probably still disagree even if I changed my mind on that. (Like, it is hard to do armchair theory without experimental data, but it's not so hard that you should conclude that you're completely doomed and there's no point in trying.)

Help me find the crux between EA/XR and Progress Studies

> I've been perceiving a lot of EA/XR folks to be in (3) but maybe you're saying they're more in (2)?

> Maybe it turns out that most folks in each community are between (1) and (2) toward the other. That is, we're just disagreeing on relative priority and neglectedness.

That's what I would say.

> I can't see it as literally the only thing worth spending any marginal resources on (which is where some XR folks have landed).

If you have opportunity A where you get a benefit of 200 per $ invested, and opportunity B where you get a benefit of 50 per $ invested, you want to invest in A as much as possible, until the opportunity dries up. At a civilizational scale, opportunities dry up quickly (i.e. with millions, maybe billions of dollars), so you see lots of diversity. At EA scales, this is less true.
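The marginal-allocation logic above can be sketched as a toy greedy model. All the numbers and capacity figures here are illustrative assumptions, not claims about actual opportunities:

```python
def allocate(budget, opportunities):
    """Greedily fund the highest benefit-per-dollar opportunity until
    its capacity dries up, then move to the next-best one.

    opportunities: list of (benefit_per_dollar, capacity_in_dollars).
    Returns (total_benefit, dict mapping benefit rate -> dollars spent).
    """
    total_benefit = 0.0
    spent = {}
    # Fund opportunities in descending order of benefit per dollar.
    for rate, capacity in sorted(opportunities, reverse=True):
        amount = min(budget, capacity)
        if amount > 0:
            total_benefit += rate * amount
            spent[rate] = amount
            budget -= amount
        if budget <= 0:
            break
    return total_benefit, spent

# Hypothetical: A yields 200 per $ but saturates at $1M; B yields 50 per $.
opps = [(200, 1_000_000), (50, 10_000_000)]

# At a small (EA-ish) scale, everything goes to A.
print(allocate(1_000_000, opps)[1])

# At a larger scale, A dries up and the remainder flows to B,
# which is why diversity appears at civilizational budgets.
print(allocate(5_000_000, opps)[1])
```

The point of the sketch: with a small budget you never reach opportunity B at all, while a large enough budget saturates A and naturally produces a diversified portfolio.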

So I do agree that some XR folks (myself included) would, if given a pot of millions of dollars to distribute, allocate it all to XR; I don't think the same people would do it for e.g. trillions of dollars. (I don't know where in the middle it changes.)

I think Open Phil, at the billions of dollars range, does in fact invest in lots of opportunities, some of which are arguably about improving progress. (Though note that they are not "fully" XR-focused, see e.g. Worldview Diversification.)
