Hi, all! The Machine Intelligence Research Institute (MIRI) is answering questions here tomorrow, October 12 at 10am PDT. You can post questions below in the interim.
MIRI is a Berkeley-based research nonprofit that does basic research on key technical questions related to smarter-than-human artificial intelligence systems. Our research is largely aimed at developing a deeper and more formal understanding of such systems and their safety requirements, so that the research community is better-positioned to design systems that can be aligned with our interests. See here for more background.
Through the end of October, we're running our 2016 fundraiser — our most ambitious funding drive to date. Part of the goal of this AMA is to address questions about our future plans and funding gap, but we're also hoping to get very general questions about AI risk, very specialized questions about our technical work, and everything in between. Some of the biggest news at MIRI since Nate's AMA here last year:
- We developed a new framework for thinking about deductively limited reasoning, logical induction.
- Half of our research team started work on a new machine learning research agenda, distinct from our agent foundations agenda.
- We received a review and a $500k grant from the Open Philanthropy Project.
Likely participants in the AMA include:
- Nate Soares, Executive Director and primary author of the AF research agenda
- Malo Bourgon, Chief Operating Officer
- Rob Bensinger, Research Communications Manager
- Jessica Taylor, Research Fellow and primary author of the ML research agenda
- Tsvi Benson-Tilsen, Research Associate
Nate, Jessica, and Tsvi are also three of the co-authors of the "Logical Induction" paper.
EDIT (10:04am PDT): We're here! Answers on the way!
EDIT (10:55pm PDT): Thanks for all the great questions! That's all for now, though we'll post a few more answers tomorrow to things we didn't get to. If you'd like to support our AI safety work, our fundraiser will be continuing through the end of October.
Scott Garrabrant’s logical induction framework feels to me like a large step forward. It provides a model of “good reasoning” about logical facts using bounded computational resources, and that model is already producing preliminary insights into decision theory. In particular, we can now write down models of agents that use logical inductors to model the world, and in some cases these agents learn to have sane beliefs about their own actions, other agents’ actions, and how those actions affect the world. This, despite the usual obstacles to self-modeling.
Further, the self-trust result from the paper can be interpreted as saying that a logical inductor believes something like “If my future self is confident in the proposition A, then A is probably true”. This seems like one of the insights the PPRHOL work was aiming at, namely, writing down a computable reasoning system that asserts a formal reflection principle about itself. Such a reflection principle must be weaker than full logical soundness; by Löb’s theorem, a system that proved “If my future self proves A, then A is true” would be inconsistent. But as it turns out, the reflection principle is feasible if you replace “proves” with “assigns high probability to” and replace “true” with “probably true”.
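To make that contrast a bit more concrete, here is a schematic rendering. The first line is Löb’s theorem, which is what blocks a proof-based system from affirming its own soundness schema. The second line is only my loose paraphrase of the self-trust property, using the paper’s notation of P_n for the market’s beliefs at stage n; the actual theorem is stated more carefully (in terms of conditional expectations over sequences of sentences), so treat this as a sketch rather than the official statement.

```latex
% Löb's theorem: if a sufficiently strong, consistent theory T proved its own
% soundness schema for a sentence A, it would already prove A outright,
% so it cannot assert the schema for all A without collapsing into inconsistency.
\[
  T \vdash \mathrm{Prov}_T(\ulcorner A \urcorner) \rightarrow A
  \quad\Longrightarrow\quad
  T \vdash A
\]

% Probabilistic self-trust, schematically: conditional on the future market
% price of A being above p, the current market's belief in A is at least about p.
\[
  \mathbb{P}_n\!\big(A \mid \mathbb{P}_{n+k}(A) > p\big) \gtrsim p
\]
```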
It is an active area of research to understand logical induction more deeply, and to apply it to decision-theoretic problems that require reflective properties. For example, the current framework uses “traders” that express their “beliefs” in terms of strategies for making trades against the market prices (probabilities) output by a logical inductor. Traders are then rewarded for buying shares in sentences that turn out to be highly valued by later market prices. It would be nice to understand this process in Bayesian terms, e.g., where traders are hypotheses that output predictions about the market and have their probabilities updated by Bayesian conditioning.
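For intuition about that market mechanism, here is a toy sketch in Python of the share-buying bookkeeping. To be clear, this is not the construction from the paper: the sentences, price sequence, and trader strategies below are made up for illustration, and the real algorithm chooses each stage’s prices so that (roughly speaking) no efficiently computable trader can exploit the market without bound. The sketch only shows the sense in which a trader profits by buying shares that later prices value more highly.

```python
# Toy sketch of trader/market bookkeeping (NOT the logical induction algorithm).
# Sentences get prices in [0, 1]; traders buy or sell shares at those prices;
# holdings are later valued at updated prices, so a trader profits by buying
# sentences the market later values more highly.

from dataclasses import dataclass, field
from typing import Callable, Dict

Prices = Dict[str, float]
Orders = Dict[str, float]  # positive = buy shares, negative = sell

@dataclass
class Trader:
    name: str
    strategy: Callable[[Prices], Orders]  # maps current prices to trades
    cash: float = 0.0
    holdings: Dict[str, float] = field(default_factory=dict)

    def trade(self, prices: Prices) -> None:
        """Execute the strategy's orders at the current market prices."""
        for sentence, amount in self.strategy(prices).items():
            self.cash -= amount * prices[sentence]  # pay (or receive) at the current price
            self.holdings[sentence] = self.holdings.get(sentence, 0.0) + amount

    def value(self, prices: Prices) -> float:
        """Mark cash plus holdings to market at the given prices."""
        return self.cash + sum(amt * prices[s] for s, amt in self.holdings.items())

def optimist(prices: Prices) -> Orders:
    # Buys one share of any sentence currently priced below 0.5.
    return {s: 1.0 for s, p in prices.items() if p < 0.5}

def contrarian(prices: Prices) -> Orders:
    # Sells one share of any sentence currently priced above 0.9.
    return {s: -1.0 for s, p in prices.items() if p > 0.9}

if __name__ == "__main__":
    # Made-up price sequence for two sentences; in the real framework these
    # would be the logical inductor's probabilities at successive stages.
    price_history = [
        {"phi": 0.30, "psi": 0.95},
        {"phi": 0.60, "psi": 0.80},
        {"phi": 0.90, "psi": 0.40},
    ]
    traders = [Trader("optimist", optimist), Trader("contrarian", contrarian)]
    for prices in price_history:
        for t in traders:
            t.trade(prices)
    for t in traders:
        # A trader ends up ahead when shares it bought are valued more highly later.
        print(t.name, round(t.value(price_history[-1]), 3))
```

In a Bayesian reframing of the kind mentioned above, the profit a trader accumulates here is the sort of reward signal one would want to reinterpret as an update on the trader-as-hypothesis.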