I’ve been thinking about my annual donation, and I’ve decided to donate to MIRI this year. I haven’t previously donated to MIRI, and my reasons for doing so now are somewhat nuanced, so I thought they were worth explaining.

I have previously thought that MIRI was taking a somewhat less-than-ideal approach on AI safety, and they were not my preferred donation target. Three things have changed:

  1. My opinion of the approach has changed a little (actually not much);
  2. I think they are moving towards a better version of their approach (more emphasis on good explanations of their work);
  3. The background distribution of work and opportunities in AI safety has shifted significantly.

Overall I do not fully endorse MIRI’s work. I somewhat agree with the perspective of the Open Philanthropy Project’s review, although I am generally more positive towards MIRI’s work:

  • I agree with those of their technical advisors who thought that solving the problems on the research agenda could be beneficial for reducing potential risks from advanced AI, rather than those who did not.
  • I thought that the assessment of the level of progress was quite unfair.
    • A key summary sentence, “One way of summarizing our impression of this conversation is that the total reviewed output is comparable to the output that might be expected of an intelligent but unsupervised graduate student over the course of 1-3 years.”, in particular, seems bemusingly unfair:
      • I think I see significant importance in the work of giving technical framings of the problems in the first place. In some cases once this is done the solutions are not technically hard; I wonder if the OpenPhil review was concerned relatively more with the work on solutions.
        • e.g. I am impressed by giving a theoretical foundation for classical game theory, and I can see various ways this could be useful. Note that this paper wasn’t included in the OpenPhil review (perhaps because one of the authors was one of the OpenPhil technical advisors).
      • The point about supervision felt potentially a bit confused. I think it’s significantly easier to make quick progress when fleshing out the details of an established field than when trying to give a good grounding for a new field.
        • On the other hand, I do think the clarity of writing in some of MIRI’s outputs has not been great, and that this is potentially something supervision could have helped with. I think they’ve been improving on this.
      • My reference class is PhD students in mathematics at Oxford, a group I’m familiar with. I find it plausible that this would line up with the output of some of the most talented such students, but I thought the wording implied comparison to a significantly lower bar than this.
      • (Edit: see also this useful discussion with Jacob Steinhardt in the comment thread)

These views are based on several conversations with MIRI researchers over approximately the last three years, and reading a fraction of their published output.

Two or three years ago, I thought that it was important that AI safety engage significantly more with mainstream AI research, and build towards having an academic field which attracted the interest of many researchers. It seemed that MIRI’s work was quite far from optimised for doing that. I thought that the abstract work MIRI was doing might be important eventually, but that it was less time-critical than field-building.

Now, the work to build a field which ties into existing AI research is happening, and is scaling up quite quickly. Examples:

I expect this trend to continue for at least a year or two. Moreover I think this work is significantly talent-constrained (and capacity-constrained) rather than funding-constrained. In contrast, MIRI has been developing a talent pipeline and recently failed to reach its funding target, so marginal funds are likely to have a significant effect on actual work done over the coming year. I think that this funding consideration represents a significant-but-not-overwhelming point in favour of MIRI over other technical AI safety work (perhaps a factor of between 5 and 20 if considering allocating money compared to allocating labour, but I’m pretty uncertain about this number).

A few years ago, I was not convinced that MIRI’s research agenda was what would be needed to solve AI safety. Today, I remain not convinced. However, I’m not convinced by any agenda. I think we should be pursuing a portfolio of different research agendas, focusing in each case on not optimising for technical results in the short term, but optimising for a solid foundation that we can build a field on and attract future talent to. As MIRI’s work looks to be occupying a much smaller slice of the total work going forwards than it has historically, adding resources to this part of the portfolio looks relatively more valuable than before. Moreover MIRI has become significantly better at clear communication of its agenda and work -- which I think is crucial for this objective of building a solid foundation -- and I know they are interested in continuing to improve on this dimension.

The combination of these factors, along with the traditional case for the importance of AI safety as a field, makes me believe that MIRI may well be the best marginal use of money today.

Ways I think this might be a mistake:

  • Opportunity cost of money
    • I’m fairly happy preferring funding MIRI to any other direct technical work in AI safety I know of.
      • There might be other opportunities I am unaware of. For example I would like more people to work on Paul Christiano’s agenda. I don’t know a way to fund that directly (though I know some MIRI staff were looking at working on it a few months ago).
    • It seems plausible that money could be better spent by 80,000 Hours or CFAR in helping to develop a broader pipeline of talent for the field. However, I think that a significant bottleneck is the development of really solid agendas, and I think MIRI may be well-placed to do this.
    • Given the recent influx of money, another field than AI safety might be the best marginal use of resources. I personally think that prioritisation research is extremely important, and would consider donating to the Centre for Effective Altruism to support this instead of AI safety.
  • Opportunity cost of researchers’ time
    • Perhaps MIRI will employ researchers to work on a suboptimal agenda, and they would otherwise get jobs working on a more important part of AI safety (if those other parts are indeed talent constrained).
    • However, I think that the background of MIRI researchers is often not the same as would be needed for work on (say) more machine-learning oriented research agendas.
  • Failing to shift MIRI’s focus
    • If MIRI were doing work that was useful but suboptimal, one might think that failure to reach funding targets could get them to re-evaluate. However:
      • I think they are already shifting their focus in a direction I endorse.
      • Withholding funding is a fairly non-cooperative way to try to achieve this. I’d prefer to give funding, and simply tell them my concerns.

Extra miscellaneous factors in favour of MIRI:

  • I should have some epistemic humility
    • I’ve had a number of conversations with MIRI researchers about the direction of their research, in moderate depth. I follow and agree with some of the things they are saying. In other cases, I don’t follow the full force of the intuitions driving their choices.
      • The fact that they failed to explain it to me so that I could fully follow decreases my credence that what they have in mind is both natural and correct (relative to before they tried this), since I think it tends to be easier to find good explanations for natural and correct things.
        • This would be a stronger update for me, except that I’ve also had the experience of people at MIRI repeatedly failing to convey something to me, and then succeeding over a year later. A clean case of this is that I previously believed decision theory was pretty irrelevant for AI safety, and I now see mechanisms for it to matter. This is good evidence that at least in some cases they have access to intuitions which are correct about something important, even when they’re unable to clearly communicate them.
      • In these conversations I’ve also been able to assess their epistemics and general approach.
        • I don’t fully endorse these, but they seem somewhat reasonable. I also think some of my differences arise from differences in communication style.
        • Some general trust in their epistemics leads me to have some belief that there are genuinely useful insights that they are pursuing, even when they aren’t yet able to clearly communicate them.
    • (Edit: see also this discussion with Anna Salamon in the comment thread.)
  • Training and community building
    • I think MIRI has a culture which encourages some useful perspectives on AI safety (I’m roughly pointing towards what they describe as “security mindset”).
      • I’m less convinced than they that this mindset is particularly crucial, relative to, e.g. an engineering mindset, but I do think there is a risk of it being under-represented in a much larger AI safety community.
    • I think that one of the more effective ways to encourage deep sharing of culture and perspective between research groups is exchange of staff.
    • If MIRI has more staff in the short term, this will allow greater dispersal of this perspective in the next few years.
  • Money for explicitly long-term work will tend to be neglected
    • As AI systems become more powerful over the coming decades, there will be increasing short-term demand for AI safety work. I think that in many cases high-quality work producing robust solutions to short-term problems could be helpful for some of the longer-term problems. However there will be lots of short-term incentives to focus on short-term problems, or even long-term problems with short-term analogues. This means that altruistic money may have more leverage over the long-term scenarios.

Overall, I don’t think we understand the challenges to come well enough that we should commit to certain approaches yet. I think MIRI has some perspectives that I’d like to see explored and explained further, I think they’re moving in a good direction, and I’m excited to see what they’ll manage in the next couple of years.

Disclaimers: These represent my personal views, not those of my employers. Several MIRI staff are known personally to me.

[*] There are actually some tax advantages to my donating to CEA by requesting a lower salary. This previously swayed me to donate to CEA, but I think I actually care more about the possible bias. However, if someone who was planning to donate to CEA wants to do a donation switch with me, we could recover and split these benefits, probably worth a few hundred dollars. Please message or email me if interested.


I donated to MIRI this year, too, and it is striking, given that you and I are coming at the question from different backgrounds (i.e. with me as a past MIRI executive), how similar my reasons (this year) are to yours, including my reaction to Open Phil's write-up, my reservations, my perception of how field dynamics have changed, etc.

(Note: I work at Open Phil but wasn't involved in thinking through or deciding Open Phil's grant to MIRI. My opinions in this comment are, obviously, my own.)

Seeing this comment from you makes me feel good about Open Phil's internal culture; it seems like evidence that folks who work there feel free to think independently and to voice their thoughts even when they disagree. I hope we manage a culture that makes this sort of thing accessible at CFAR and in general.

Interestingly I don't think there is a big gap between my position (hence also Luke's?) and Open Phil's position.

I suspect it’s worth forming an explicit model of how much work “should” be understandable by what kinds of parties at what stage in scientific research.

To summarize my own take:

It seems to me that research moves down a pathway from (1) "totally inarticulate glimmer in the mind of a single researcher" to (2) "half-verbal intuition one can share with a few officemates, or others with very similar prejudices" to (3) "thingy that many in a field bother to read, and most find somewhat interesting, but that there's still no agreement about the value of" to (4) "clear, explicitly statable work whose value is universally recognized within its field". (At each stage, a good chunk of work falls away as a mirage.)

In "The Structure of Scientific Revolutions", Thomas Kuhn argues that fields begin in a "preparadigm" state in which nobody's work gets past (3). (He gives a bunch of historical examples that seem to meet this pattern.)

Kuhn’s claim seems right to me, and AI Safety work seems to me to be in a "preparadigm" state in that there is no work past stage (3) now. (Paul's work is perhaps closest, but there are still important unknowns / disagreements about foundations, whether it'll work out, etc.)

It seems to me one needs epistemic humility more in a preparadigm state, because, in such states, the correct perspective is in an important sense just not discovered yet. One has guesses, but the guesses cannot be established in common as established knowledge.

It also seems to me that the work of getting from (3) to (4) (or from 1 or 2 to 3, for that matter) is hard, that moving along this spectrum requires technical research (it basically is a core research activity), and one shouldn't be surprised if it sometimes takes years -- even in cases where the research is good. (This seems to me to also be true in e.g. math departments, but to be extra hard in preparadigm fields.)

(Disclaimer: I'm on the MIRI board, and I worked at MIRI from 2008-2012, but I'm speaking only for myself here.)

Relatedly, it seems to me that in general, preparadigm fields probably develop faster if:

  1. Different research approaches can compete freely for researchers (e.g., if researchers have secure, institution-independent funding, and can work on whatever approach pleases them). (The reason: there is a strong relationship between what problems can grab a researcher’s interest, and what problems may go somewhere. Also, researchers are exactly the people who have leisure to form a detailed view of the field and what may work. cf also the role of play in research progress.)

  2. The researchers themselves feel secure, and do not need to optimize their work for “what others will evaluate as useful enough to keep paying me”. (Since such evaluations are unreliable in preparadigm fields, and since one wants to maximize the odds that the right approach is tried. This security may well increase the amount of non-productivity in the median case, but it should also increase the usefulness of the tails. And the tails are where most of the value is.)

  3. Different research approaches somehow do not need to compete for funding, PR, etc., except via researchers’ choices as to where to engage. There are no organized attempts to use social pressure or similar to override researchers’ intuitions as to where it may be fruitful to engage (nor to override research institutions’ choice of what programs to enable, except via the researchers’ interests). (Funders’ intuitions seem less likely to be detailed than are the intuitions of the researcher-on-that-specific-problem; attempts to be clear/explainable/respectable are less likely to pull in good directions.)

  4. The pool of researchers includes varied good folks with intuitions formed in multiple fields (e.g., folks trained in physics; other folks trained in math; other folks trained in AI; some unusually bright folks just out of undergrad with less-developed disciplinary prejudices), to reduce the odds of monoculture.

(Disclaimer: I'm on the MIRI board, and I worked at MIRI from 2008-2012, but I'm speaking only for myself here.)

I generally agree with both of these comments. I think they're valuable points which express more clearly than I did some of what I was getting at with wanting a variety of approaches and thinking I should have some epistemic humility.

One point where I think I disagree:

"attempts to be clear/explainable/respectable are less likely to pull in good directions."

I don't want to defend pulls towards being respectable, and I'm not sure about pulls towards being explainable, but I think that attempts to be clear are extremely valuable and likely to improve work. I think that clarity is a useful thing to achieve, as it helps others to recognise the value in what you're doing and build on the ideas where appropriate (I imagine that you agree with this part).

I also think that putting a decent fraction of total effort into aiming for clarity is likely to improve research directions. This is based on research experience -- I think that putting work into trying to explain things very clearly is hard and often a bit aversive (because it can take you from an internal sense of "I understand all of this" to a realisation that actually you don't). But I also think it's useful for making progress purely internally, and that getting a crisper idea of the foundations can allow for better work building on this (or a realisation that this set of foundations isn't quite going to work).

Not sure how much this is a response to you, but:

In considering whether incentives toward clarity (e.g., via being able to explain one’s work to potential funders) are likely to pull in good or bad directions, I think it’s important to distinguish between two different motions that might be used as a researcher (or research institution) responds to those incentives.

  • Motion A: Taking the research they were already doing, and putting a decent fraction of effort into figuring out how to explain it, figuring out how to get it onto firm foundations, etc.

  • Motion B: Choosing which research to do by thinking about which things will be easy to explain clearly afterward.

It seems to me that “attempts to be clear” in the sense of Motion A are indeed likely to be helpful, and are worth putting a significant fraction of one’s effort into. I agree also that they can be aversive and that this aversiveness (all else equal) may tend to cause underinvestment in them.

Motion B, however, strikes me as more of a mixed bag. There is merit in choosing which research to do by thinking about what will be explainable to other researchers, such that other researchers can build on it. But there is also merit to sometimes attempting research on the things that feel most valuable/tractable/central to a given researcher, without too much shame if it then takes years to get their research direction to be “clear”.

As a loose analogy, one might ask whether “incentives to not fail” have a good or bad effect on achievement. And it seems like a mixed bag. The good part (analogous to Motion A) is that, once one has chosen to devote hours/etc. to a project, it is good to try to get that project to succeed. The more mixed part (analogous to Motion B) is that “incentives to not fail” sometimes cause people to refrain from attempting ambitious projects at all. (Of course, it sometimes is worth not trying a particular project because its success-odds are too low — Motion B is not always wrong.)

I agree with all this. I read your original "attempts to be clear" as Motion A (which I was taking a stance in favour of), and your original "attempts to be explainable" as Motion B (which I wasn't sure about).

Gotcha. Your phrasing distinction makes sense; I'll adopt it. I agree now that I shouldn't have included "clarity" in my sentence about "attempts to be clear/explainable/respectable".

The thing that confused me is that it is hard to incentivize clarity but not explainability; the easiest observable is just "does the person's research make sense to me?", which one can then choose how to interpret, and how to incentivize.

It's easy enough to invest in clarity / Motion A without investing in explainability / Motion B, though. My random personal guess is that MIRI invests about half of their total research effort into clarity (from what I see people doing around the office), but I'm not sure (and I could ask the researchers easily enough). Do you have a suspicion about whether MIRI over- or under-invests in Motion A?

My suspicion is that MIRI significantly underinvests/misinvests in Motion A, although of course this is a bit hard to assess from outside.

I think that they're not that good at clearly explaining their thoughts, but that this is a learnable (and to some extent teachable) skill, and I'm not sure their researchers have put significant effort into trying to learn it.

I suspect that they don't put enough time into trying to clearly explain the foundations for what they're doing, relative to trying to clearly explain their new results (though I'm less confident about this, because so much is unobserved).

I think they also sometimes indulge in a motion where they write to try to persuade the reader that what they're doing is the correct approach and helpful on the problem at hand, rather than trying to give the reader the best picture of the ways in which their work might or might not actually be applicable. I think at a first pass this is trying to substitute for Motion B, but it actively pushes against Motion A.

I'd like to see explanations which trend more towards:

  • Clearly separating out the motivation for the formalisation from the parts using the formalisation. Then these can be assessed separately. (I think they've got better at this recently.)
  • Putting their cards on the table and giving their true justification for different assumptions. In some cases this might be "slightly incoherent intuition". If that's what they have, that's what they should write. This would make it easier for other people to evaluate, and to work out which bits to dive in on and try to shore up.

I went to a MIRI workshop on decision theory last year. I came away with an understanding of a lot of points of how MIRI approaches these things that I'd have a very hard time writing up. In particular, at the end of the workshop I promised to write up the "Pi-maximising agent" idea and how it plays into MIRI's thinking. I can describe this at a party fairly easily, but I get completely lost trying to turn it into a writeup. I don't remember other things quite as well (eg "playing chicken with the Universe") but they have the same feel. An awful lot of what MIRI knows seems to me folklore like this.

This is interesting and interacts with my comment in reply to Anna on clarity of communication. I think I'd like to see them write up more such folklore as carefully as possible; I'm not optimistic about attempts to outsource such write-ups.

I agree that this makes sense in the "ideal" world, where potential donors have better mental models of this sort of research pathway, and I have found this sort of thinking useful as a potential donor.

From an organizational perspective, I think MIRI should put more effort into producing visible explanations of their work (well, depending on their strategy to get funding). As worries about AI risk become more widely known, there will be a larger pool of potential donations to research in the area. MIRI risks being out-competed by others who are better at explaining how their work decreases risk from advanced AI (I think this concern applies both to talent and money, but here I'm specifically talking about money).

High-touch, extremely large donors will probably get better explanations, reports on progress, etc from organizations, but the pool of potential $ from donors who just read what's available online may be very large, and very influenced by clear explanations about the work. This pool of donors is also more subject to network effects, cultural norms, and memes. Given that MIRI is running public fundraisers to close funding gaps, it seems that they do rely on these sorts of donors for essential funding. Ideally, they'd just have a bunch of unrestricted funding to keep them secure forever (including allaying the risk of potential geopolitical crises and macroeconomic downturns).

Thanks Owen! Agreed with your writeup. I would donate to MIRI myself this year, but I unfortunately don't really have spare cash right now :P

Could we re-open this discussion in view of MIRI's achievements over the course of a year?

A recent trend of providing relatively high research grants (high relative to some of the most prestigious research grants across the EU, for instance ERC starting grants of ~1.5 mil EUR) to projects on AI risks and safety made me curious, and so I looked a bit more into this topic. What struck me as especially curious is the lack of transparency when it comes to the criteria used to evaluate the projects and to decide how to allocate the funds. Now, for the sake of this question, I am assuming that the research topic of AI risks and safety is important and should be funded (to what extent it actually is important is beside the point I'm making here and deserves a discussion of its own; so let's just say it is among the most pursuit-worthy problems in view of both epistemic and non-epistemic criteria).

Particularly surprising was a sudden grant of 3.75 mil USD by the Open Philanthropy Project (OPP) to MIRI (https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/machine-intelligence-research-institute-general-support-2017). Note that the funding is more than double the amount given to ERC starting grantees. Previously, OPP awarded MIRI 500,000 USD and provided an extensive explanation of this decision (https://www.openphilanthropy.org/focus/global-catastrophic-risks/potential-risks-advanced-artificial-intelligence/machine-intelligence-research-institute-general-support). So, one would expect that for a grant more than 7 times higher, we'd find at least as much explanation. But what we do find is an extremely brief explanation saying that an anonymous expert reviewer has evaluated MIRI's work as highly promising in view of their paper "Logical Induction".

Note that in the last 2 years since I first saw this paper online, the very same paper has not been published in any peer-reviewed journal. Moreover, if you check MIRI's publications (https://intelligence.org/all-publications/) you find not a single journal article since 2015 (or an article published in prestigious AI conference proceedings, for that matter). Suffice it to say that I was surprised. So I decided to contact both MIRI, asking whether perhaps the publication list on their website hasn't been updated, and OPP, asking for the evaluative criteria used when awarding this grant.

MIRI has never replied (email sent on February 8). OPP took a while to reply, and today I've received the following email:

"Hi Dunja,

Thanks for your patience. Our assessment of this grant was based largely on the expert reviewer's reasoning in reviewing MIRI's work. Unfortunately, we don't have permission to share the reviewer's identity or reasoning. I'm sorry not to be more helpful with this, and do wish you the best of luck with your research.


[name blinded in this public post]"

All this is very surprising given that OPP prides itself on transparency. As stated on their website (https://www.openphilanthropy.org/what-open-means-us):

"Creating detailed reports on the causes we’re investigating. Sharing notes from our information-gathering conversations. Publishing writeups and updates on a number of our grants, including our reasoning and reservations before making a grant, and any setbacks and challenges we encounter."

However, the main problem here is not a mere lack of transparency, but the lack of an effective and efficient funding policy. The question of how to decide which projects to fund in order to achieve effective and efficient knowledge acquisition has been researched within philosophy of science and science policy for decades now. Yet, these very basic criteria seem absent from cases such as the above-mentioned one. Not only are the criteria used non-transparent, but an open call for various research groups to submit their projects, where the funding agency then decides (in view of an expert panel's assessment, rather than a single reviewer's) which project is the most promising one, has never happened. The markers of reliability, over the course of research, are extremely important if we want to advance effective research. The panel of experts (rather than a single expert) is extremely important in assuring procedural objectivity of the given assessment.

Altogether, this is not just surprising, but disturbing. Perhaps the biggest danger is that this falls into the hands of the press and ends up being an argument for the point that organizations close to effective altruism are not effective at all.

Thank you very much for the write-up of your reasoning!

After having just now (prior to reading this post) a two-hour conversation on this topic, I'm glad to see we agree on the main points.

The main thing I think you didn't say is that not only is it important that a diversity of approaches are worked on (and funded), but also that as the field is rapidly growing, it could easily be the case that the current distribution of resources into research approaches will be set in place [That's not quite right, see Owen CB's comment below]. While the ML-based approaches at OpenAI and DeepMind and so on are promising, I would be unhappy if they grew to be the only approach - I don't feel we're confident enough to know whether they're the right approach. As such, we have a comparative advantage relative to all future versions of ourselves in funding other approaches and shaping the overall research space.

I'm likely to donate to MIRI myself this year, but first I will check out other approaches being worked on. I believe that Paul Christiano is not funding constrained (but will ask), and will look through e.g. the list of FLI grantees, and check up on whose work appears promising and is funding constrained. In the case where I have sufficient time to review their work in detail, I may offer to purchase an impact certificate from the researcher(s) who I think did the best research with their grant money.

(including the bizarre-ness of OpenPhil's analysis of number-of-papers-written, which is not how one measures progress of fundamentals research.)

What in the grant write-up makes you think the focus was on number-of-papers-written? I was one of the reviewers and that was definitely not our process.

(Disclaimer: I'm a scientific advisor for OpenPhil, all opinions here are my own.)

Think this is at least partially my fault. I included a phrase "(in the metric of papers written, say)" when discussing progress in the above post, but I didn't really think this was the main metric you were judging things on. I'll edit that out.

The sense in which it felt "bemusingly unfair" was that the natural situation it brought to mind was taking a bright grad student, telling them to work on AI safety and giving them no more supervision, then waiting 1-3 years. In that scenario I'd be ecstatic to see something like what MIRI have done.

I don't actually think that's the claim that was intended either, though. I think the write-up was trying to measure something like the technical impressiveness of the theorems proved (of course I'm simplifying a bit). There is at least something reasonable in assessing this, in that it is common in academia, and I think is often a decent proxy for "How good are the people doing this work?", particularly if they're optimising for that metric. In doing so it also provided some useful information to me, because I hadn't seriously tried to assess this.

However, it isn't the metric I actually care about. I'm interested in their theory-building rather than their theorem-proving. I wouldn't say I'm extremely impressed by them on that metric, but at least enough that when I interpreted the claim as being about theory-building, I felt it was quite unfair.

Very interested to know whether you think this is a fair perspective on what was actually being assessed.

I feel like I care a lot about theory-building, and at least some of the other internal and external reviewers care a lot about it as well. As an example, consider External Review #1 of Paper #3 (particularly the section starting "How significant do you feel these results are for that?"). Here are some snippets (link to document here):

The first paragraph suggests that this problem is motivated by the concern of assigning probabilities to computations. This can be viewed as an instance of the more general problems of (a) modeling a resource-bounded decision maker computing probabilities and (b) finding techniques to help a resource-bounded decision maker compute probabilities. I find both of these problems very interesting. But I think that the model here is not that useful for either of these problems. Here are some reasons why:

It’s not clear why the properties of uniform coherence are the “right” ones to focus on. Uniform coherence does imply that, for any fixed formula, the probability converges to some number, which is certainly a requirement that we would want. This is implied by the second property of uniform coherence. But that property considers not just constant sequences of formulas, but sequences where the nth formula implies the (n+1)st. Why do we care about such sequences? [...]

The issue of computational complexity is not discussed in the paper, but it is clearly highly relevant. [...]

Several more points are raised, followed by (emphasis mine):

I see no obvious modification of uniformly coherent schemes that would address these concerns. Even worse, despite the initial motivation, the authors do not seem to be thinking about these motivational issues.

For another example, see External Review #1 of Paper #4 (I'm avoiding commenting on internal reviews because I want to be sensitive to breaking anonymity).

On the website, it is promised that this paper makes a step towards figuring out how to come up with “logically non-omniscient reasoners”. [...]

This surely sounds impressive, but there is the question whether this is a correct interpretation of Theorem 5. In particular, one could imagine two cases: a) we are predicting a single type of computation, and b) we are predicting several types of computations. In case (a), why would the delays matter in asymptotic convergence in the first place? [...] In case (b), the setting that is studied is not a good abstraction: in this case there should be some “contextual information” available to the learner, otherwise the only way to distinguish between two types of computations will be based on temporal relation, which is a very limiting assumption here.

To end with some thoughts of my own: in general, when theory-building I think it is very important to consider both the relevance of the theoretical definitions to the original problem of interest, and the richness of what can actually be said. I don't think that definitions can be assessed independently of the theory that can be built from them. At the risk of self-promotion, I think that my own work here, which makes both definitional and theoretical contributions relevant to ML + security, does a good job of putting forth definitions and justifying them (by showing that we can get unexpectedly strong results in the setting considered, via a nice and fairly general algorithm, and that these results have unexpected and important implications for initially unrelated-seeming problems). I also claim that this work is relevant to AI safety but perhaps others will disagree.

Thanks for taking the time to highlight these. This is helpful, and shows that I hadn't quite done my homework in the above characterisation of the difference.

I agree then that the review was at least significantly concerned with theory-building. I had originally read this basket of concerns as more about clarity of communication (which I think is a big issue with MIRI's work), but I grant that there's actually quite a lot of overlap between the issues. See also my recent reply to Anna elsewhere in the comment thread.

I like the thoughts of your own at the end. I do think that the value of definitions depends on what you can build on them (although I'm not sure whether "richness" is the right characterisation -- it seems that sometimes the right definition makes the correct answer to a question you care about extremely clear, without necessarily any real sophistication in the middle).

I think that work of the type you link to is important, and roughly the type I want the majority of work in the next decade to be (disclaimer: I haven't yet read it carefully). I am still interested in work which tries to build ahead and get us a better theory for systems which are in important ways more powerful than current systems. I think it's harder to ground this well (basically you're paying a big nearsightedness penalty), but there's time-criticality of doing it early if it's needed to inform swathes of later work.

Here's my current high-level take on the difference in our perspectives:

  • There is an ambiguity in whether MIRI's work is actually useful theory-building that they are just doing a poor job of communicating clearly, or whether it's not building something useful.
  • I tend towards giving them the benefit of the doubt / hedging that they are doing something valuable.
  • The Open Phil review takes a more sceptical position, that if they can't clearly express the value of the work, maybe there is not so much to it.

Also, I realized it might not be clear why I thought the quotes above are relevant to whether the reviews addressed the "theory-building" aspect. The point is it seems to me that the quoted parts of the reviews are directly engaging with whether the definitions make sense / the results are meaningful, which is a question about the adequacy of the theory for addressing the claimed questions, and not of its technical impressiveness. (I could imagine you don't feel this addresses what you meant by theory-building, but in that case you'll have to be more specific for me to understand what you have in mind.)

Thanks for pointing that out! I've been conflating your comments with other conversations I've had with people about MIRI, and have removed my sentence. I just read through the OpenPhil report carefully again.

I think that I disagree with OpenPhil's stated conclusions, but due to having looked at different papers (I had forgotten that the 'unsupervised grad student' comment referred just to the three papers submitted, and I'd mis-remembered exactly which papers they were). After conversations with a few early-stage researchers in other fields, I think that some of the other papers might be notably more impressive (e.g. the Grain-of-Truth paper accepted to UAI).

I understand the key example of MIRI's theory-building approach to be the extensive Logical Inductors paper, but haven't heard much feedback on the usefulness/impressiveness from non-MIRI researchers yet. I'd be quite interested to know if you have read it and updated up/down about MIRI as a result (as I'm considering donating to MIRI this year partly based on this).

After conversations with researchers whose opinions I respect, I've been led to believe certain other papers are very impressive (e.g. the Grain-of-Truth paper accepted to UAI).

Could you say more about the credibility of these researchers' opinions? E.g. what fields are they in, how successful in their fields, how independent of MIRI?

Pretty independent of MIRI, early-stage researchers, other areas of theoretical CS (formal logic, game theory). I didn't mean for this to be strong evidence, and have changed the wording.

I think it's quite unlikely that the current distribution will be set in place. And actually on purely current distributions I'm not sure the MIRI approach is underrepresented. On the other hand I think it's likely that the current distribution will influence future distribution, which is what's relevant; I'm trying to push back a little against an expected trend towards ML-based approaches representing a very large share of the work.

Yes, you're right about it not being 'set in place'. I more meant to say that, while funding and interest have grown significantly (OpenAI and DeepMind have in principle billions of dollars of spending power each and are now significantly interested in this topic), MIRI failed to reach its $800k minimal fundraising target this year, and so I expect that the main approaches to AI that are being followed elsewhere will get the most attention in the future.

OpenAI and DeepMind have in principle billions of dollars of spending power each and are now significantly interested in this topic

While I think there is a true point in this vicinity (it will be a lot easier to fund ML-based approaches, including at these organizations, but also others), this seems to be overstating the relevant resources and the effort going into safety topics. OpenAI has been funded with a billion dollars (although it might receive more funding later), and its annual spending must of course be lower. And both of these organizations have primary aims of advancing AI, with limited efforts on safety issues thus far.

MIRI seems like the most value-aligned and unconstrained of the orgs.

OpenAI also seems pretty unconstrained, but I have no idea what their perspective on Xrisk is, and all reports are that there is no master plan there.

Note that this is particularly an argument about money. I think that there are important reasons to skew work towards scenarios where AI comes particularly soon, but I think it’s easier to get leverage over that as a researcher choosing what to work on (for instance doing short-term safety work with longer-term implications firmly in view) than as a funder.

I didn't understand this part. Are you saying that funders can't choose whether to fund short-term or long-term work (either because they can't tell which is which, or there aren't enough options to choose from)?

I'm saying that the ratio of how much you can advance the different agendas as a funder versus as a researcher skews towards advancing short-term stuff as a researcher, because short-term work is less funding constrained (more talent constrained).