

I began my PhD with a focus on Bayesian deep learning with exactly the same reasoning as you. I also share your doubts about the relevance of BDL to long-term safety. I have two clusters of thoughts: some reasons why BDL might be worth pursuing regardless, and alternative approaches.

Considerations about BDL and important safety research:

  • Don't overfit to recent trends. LLMs are very remarkable. Before them, DRL was very remarkable. I don't know what will be remarkable next. My hunch is that we won't get AGI by just doing more of what we are doing now. (People I respect disagree with that, and I am uncertain. Also, note I don't say we couldn't get AGI that way.)
  • Bayesian inference is powerful and general. The original motivation is still real. It is tempered by your (in my view, correct) observation that existing methods for approximate inference have big flaws. My view is that probability still describes the correct way to update given evidence and so it contains deep truths about reliable information processing. That means that understanding approximate Bayesian inference is still a useful guide for anyone trying to automatically process information correctly (and being aware of the necessary assumptions). And an awful lot of failure modes for AGI involve dangerous mistaken generalization. Also note that statements like "simple non-Bayesian techniques such as ensembles" are controversial, and there's considerable debate about whether ensembles are working because they perform approximate integration. Andrew Gordon Wilson has written a lot about this, and I tentatively agree with much of it.
  • Your PhD is not your career. As Mark points out, a PhD is just the first step. You'll learn how to do research. You really won't start getting that good at it until a few years in, by which point you'll write up the thesis and start working on something different. You're not even supposed to just keep doing your thesis as you continue your research. The main thing is to have a great research role model, and I think Phillip is quite good (by reputation, I don't know him personally).
  • BDL teaches valuable skills. Honestly, I just think statistics is super important for understanding modern deep learning, and it gives you a valuable lens to reason about why things are working. There are other specialisms that can develop valuable skills. But I'd be nervous about trading the opportunity to develop deep familiarity with the stats for practical experience on current SoTA systems (because stats will stay true and important, but SoTA won't stay SoTA). (People I respect disagree with that, and I am uncertain.)

Big picture, I think intellectual diversity among AGI safety researchers is good, Bayesian inference is important and fundamental, and lots of people glom on to whatever the latest hot thing is (currently LLMs), leading to rapid saturation.

So what is interesting to work on? I'm currently thinking about two main things:

  • I don't think that exact alignment is possible, in ways that are similar to how exact Bayesian inference is generally not possible. So I'm working on trying to learn from the ways in which approximate inference is well/poorly defined to get insights for how alignment can be well/poorly defined and approximated. (Here I agree 100% with Mark that most of what is hard in AGI safety remains framing the problem correctly.)
  • I think a huge problem for AGI-esque systems is about to be hunting for dangerous failures. There's a lot of BDL work on 'actively' finding informative data, but mostly for small-data in low-dimensions. I'm much more interested in huge data, high-dimensions, which creates whole new problems (e.g., you can't just compute a score function for each possible datapoint). (Note that this is almost exactly the opposite to Mark's point below! But I don't exactly disagree with him, it's just that lots of things are worth trying.)
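To make the scaling problem in the last bullet concrete, here is a minimal sketch, not a method from any particular paper. It assumes a pool too large to score exhaustively, so it scores only a random subsample and keeps the highest-scoring points. The names `select_batch`, `toy_predict`, and `POOL_SIZE`, the three-member toy "ensemble", and the choice of predictive entropy as the score are all illustrative assumptions.

```python
import math
import random

POOL_SIZE = 10**9  # hypothetical pool, far too large to score point by point


def predictive_entropy(probs):
    """Entropy (in nats) of the ensemble-averaged Bernoulli prediction."""
    p = sum(probs) / len(probs)
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def toy_predict(i):
    """Hypothetical ensemble: three logistic models with different slopes."""
    x = i / POOL_SIZE
    return [1.0 / (1.0 + math.exp(-k * (x - 0.5))) for k in (5, 10, 20)]


def select_batch(pool_size, predict, batch_size, budget, rng):
    """Score only `budget` randomly drawn indices; return the top `batch_size`."""
    # random.sample on a range does not materialise the whole pool in memory.
    candidates = rng.sample(range(pool_size), budget)
    scored = [(predictive_entropy(predict(i)), i) for i in candidates]
    scored.sort(reverse=True)
    return [i for _, i in scored[:batch_size]]


batch = select_batch(POOL_SIZE, toy_predict, batch_size=5, budget=1000,
                     rng=random.Random(0))
```

Uniform subsampling is the crudest possible workaround: if the dangerous failures are rare, a random subsample will mostly miss them, which is part of why the huge-data, high-dimensional setting raises new problems rather than just bigger versions of the old ones.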

There are other things that are important, and I agree that OOD detection is also important (and I'm working on a conceptual paper on this, rather than a detection method specifically). If you'd like to speak about any of this stuff I'm happy to talk. You can reach me at sebastian.farquhar@cs.ox.ac.uk

This is a really good point, and I'm not sure that something exists which was written with that in mind. Daniel Dewey wrote something which was maybe a first step on a short form of this in 2015. A 'concrete-problems' in strategy might be a really useful output from SAIRC.


Often (in EA in particular) the largest cost of starting a project that fails isn't borne by you, but is a hard-to-see counterfactual impact.

Imagine I believe that building a synth bio safety field is incredibly important. Without a real background in synth bio, I go about building the field, but because I lack context and subtle field knowledge, I screw it up, having already reached out to almost all the key players. They would now be conditioned to think that synth bio safety is something that is pursued by naive outsiders who don't understand synth bio. This makes it harder for future efforts to proceed. It makes it harder for them to raise funds. It makes it harder for them to build a team.

The worst case is that you start a project, fail, but don't quit. This can block the space, and stop better projects from entering it.

These can be worked around, but it seems that many of your assumptions are conditional on not having these sorts of large negative counterfactual impacts. While that may work out, it seems overconfident to assume a 0% chance of this, especially if the career capital building steps are actually relevant domain knowledge building.

Prediction markets benefit a lot from liquidity. Making it EA specific doesn't seem to gain all that much. But EAs should definitely practice forecasting formally and getting rewarded for reliable predictions.

I'm not very confident in this estimate, but I'd hazard that between 5 and 50 causally connected groups will have made a recommendation related to the balance of research vs direct work in global health as part of the DfID budget (in either direction).

That's maybe a 75% confidence interval.

Yes, this is absolutely not a thing that just GPP did - which is why I tried to call out in this post that several other groups were important to recommending it! (And also something I emphasised in the Facebook post you link to.)

I don't know how many groups fed into the overall process and I'm sure there were big parts of the process I have no knowledge about. I know of two other quite significant entities that have publicly made very similar recommendations (Angus Deaton and the Centre for Global Development) as well as about half a dozen other entities that made similar but slightly narrower suggestions (many of which we cited). The general development aid sector is clearly enormous, but the field of people proposing this sort of thing is smaller.

Assigning causal credit for policy outcomes is very complicated. It obviously matters to us to assess it, so that we can tell if it's worth doing more work in an area. What we do is just talk to the people we made recommendations to and ask them how significant a role our recommendation played. Usually people prefer we don't share their reflections further, which is unfortunate but inevitable.

At the moment most of the orgs within CEA target 12 months reserves (though some have less and, in particular, they sometimes fall quite low at some point in the course of the year because we avoid on-going fundraising).

If we had something like 3 months of reserves for all costs unrestricted it would give us either greater financial security or the ability to cut the size of restricted overall reserves to, say, 7 months while keeping similar stability. This would free up EA capital for other projects.

It's a little unclear what the right level of reserves ought to be. In the US it's common for charities to have very large endowments (say 20 years). I think the 12 months at all times target we have right now is about appropriate, given the value of capital to EA projects, but would expect that number to drift upwards as the EA community matures.

You're quite right, they are different. At the moment, we are planning to use marginal unrestricted funds to invest in shared services. Partly this aims to increase the autonomy of the shared services function and reduce the extent to which they feel they need to ask all the orgs for permission to do useful things.

Past that level though, unrestricted funding would help us build a small reserve of unrestricted money that would provide us with financial stability. Right now, each organisation needs to keep a pretty significant independent runway because virtually all our reserves are restricted. If we had a bigger pool of funds that could go to any org, we could get the same level of financial security with smaller total reserves.
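A rough sketch of why pooling can lower total reserves, under strong simplifying assumptions that are mine rather than CEA's: each org faces an independent, zero-mean, normally distributed funding shortfall with the same standard deviation `sigma`, and reserves are sized to cover a shortfall up to `z` standard deviations. The function names and the example numbers are purely illustrative.

```python
import math


def separate_reserves(n_orgs, sigma, z):
    """Total reserves when each org independently covers its own z-sigma shortfall."""
    return n_orgs * z * sigma


def pooled_reserve(n_orgs, sigma, z):
    """Reserve covering a z-sigma shortfall of the combined total.

    For independent shortfalls the standard deviation of the sum grows
    with sqrt(n_orgs), not n_orgs, which is where the saving comes from.
    """
    return z * sigma * math.sqrt(n_orgs)


# Illustrative numbers only: 4 orgs, sigma = 1 unit of cost, z = 2.
separate = separate_reserves(4, 1.0, 2.0)
pooled = pooled_reserve(4, 1.0, 2.0)
```

The saving shrinks if shortfalls are correlated, for example if the orgs share a donor base, and it disappears entirely if they are perfectly correlated. The pooled figure also only protects the group as a whole, so it relies on funds really being movable between orgs.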

GPP's total budget for 2016 will be roughly £220,000, which is about our minimum target. The reason there's a discrepancy between this figure and the £95k figure is that the £95k figure presented in the overall CEA budget includes only sums that flow through CEA and doesn't include any shared services. However, GPP is a joint project with FHI, so in 2016 a significant portion of the total costs will be funded via FHI rather than CEA. In addition, we are expecting to hire a seconded civil servant whose salary will be partly funded by the state. This is not counted as part of the CEA budget but is counted as part of the GPP budget.

You can find lots more detail on GPP here
