Do you think EA's self-reflection about this is at all productive, considering most people had even less information than you?
I don't have terribly organized thoughts about this. (And I am still not paying all that much attention—I have much more patience for picking apart my own reasoning processes looking for ways to improve them, than I have for reading other people's raw takes :-p)
But here are some unorganized, half-baked notes:
I appreciated various expressions of emotion. Especially when they came labeled as such.
I think there was also a bunch of oth...
FWIW, I would totally want to openly do a postmortem. Once the bankruptcy case is over, I'll be pretty happy to publicly say what I knew at various points in time. But I'm currently holding back for legal reasons, and instead discuss it (as you said) "behind closed doors". (Which is frustrating for everyone who would like to have transparent public discussion, sorry about that. It's also really frustrating for me!)
I think the truth is closest to "we had a bunch of hints that we failed to assemble"
FWIW, for common knowledge (though I don't know everything happening at CEA), so that other people can pick up the slack rather than assume things are covered, or so that people can push me to change my prioritization, here's what I see happening at CEA in regard to:
"finding some other way to successfully heed the 10, which requires distinguishing them from the background noise--and distinguishing them as something actionable--before it's too late, and then routing the requisite action to the people who can do something about it"
Good point! Currently, I think the "pry more" lesson is supposed to account for a bunch of this.
Since making this update, I have in fact pried more into friends' lives. In at least one instance I found some stuff that worried me, at which point I was naturally like "hey, this worries me; it pattern-matches to some bad situations I've seen; I feel wary and protective; I request an opportunity to share and/or put you in touch with people who've been through putatively-analogous situations (though I can also stfu if you're sick of hearing people's triggered t...
Can you give an example or two of failure modes or "categories of failure modes that are easy to foresee" that you think are addressed by some HRAD topic? I'd thought previously that thinking in terms of failure modes wasn't a good way to understand HRAD research.
I want to steer clear of language that might make it sound like we’re saying:
X 'We can't make broad-strokes predictions about likely ways that AGI could go wrong.'
X 'To the extent we can make such predictions, they aren't important for informing research directions.'
X 'The best
Thanks for this solid summary of your views, Daniel. For others’ benefit: MIRI and Open Philanthropy Project staff are in ongoing discussion about various points in this document, among other topics. Hopefully some portion of those conversations will be made public at a later date. In the meantime, a few quick public responses to some of the points above:
...2) If we fundamentally "don't know what we're doing" because we don't have a satisfying description of how an AI system should reason and make decisions, then we will probably make lots of mist
Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.
We originally sent Nick Beckstead what we considered our four most important 2015 results, at his request; these were (1) the incompatibility of the "Inductive Coherence" framework and the "Asymptotic Convergence in Online Learning with Unbounded Delays" framework; (2) the demonstration in "Proof-Producing Reflection for HOL" that a non-pathological form of self-referential reasoning is possible in a certain class of theo...
Thanks for the response, Gregory. I was hoping to see more questions along these lines in the AMA, so I'm glad you followed up.
Open Phil's grant write-up is definitely quite critical, and not an endorsement. One of Open Phil's main criticisms of MIRI is that they don't think our agent foundations agenda is likely to be useful for AI alignment; but their reasoning behind this is complicated, and neither Open Phil nor MIRI has had time yet to write up our thoughts in any detail. I suggest pinging me to say more about this once MIRI and Open Phil have put up ...
Posts or comments on personal Twitter accounts, Facebook walls, etc. should not be assumed to represent any official or consensus MIRI position, unless noted otherwise. I'll echo Rob's comment here that "a good safety approach should be robust to the fact that the designers don’t have all the answers". If an AI project hinges on the research team being completely free from epistemic shortcomings and moral failings, then the project is doomed (and should change how it's doing alignment research).
I suspect we're on the same page about it being impo...
In short: there’s a big difference between building a system that follows the letter of the law (but not the spirit), and a system that follows the intent behind a large body of law. I agree that the legal system is a large corpus of data containing information about human values and how humans currently want their civilization organized. In order to use that corpus, we need to be able to design systems that reliably act as intended, and I’m not sure how the legal corpus helps with that technical problem (aside from providing lots of training data, which I...
Thanks, Benito. With regards to the second half of this question, I suspect that either you’ve misunderstood some of the arguments I’ve made about why our work doesn’t tend to fit into standard academic journals and conferences, or (alternatively) someone has given arguments for why our work doesn’t tend to fit into standard academic venues that I personally disagree with. My view is that our work doesn’t tend to fit into standard journals etc. because (a) we deliberately focus on research that we think academia and industry are unlikely to work on for one...
I think this has been changing in recent years, yes. A number of AI researchers (some of them quite prominent) have told me that they have largely agreed with AI safety concerns for some time, but have felt uncomfortable expressing those concerns until very recently. I do think that the tides are changing here, with the Concrete Problems in AI Safety paper (by Amodei, Olah, et al) perhaps marking the inflection point. I think that the 2015 FLI conference also helped quite a bit.
A question from Topher Halquist, on facebook:
Has MIRI considered hiring a more senior math-Ph.D., to serve in a "postdoc supervisor"-type role?
We considered it, but decided against it because supervision doesn’t seem like a key bottleneck on our research progress. Our priority is just to find people who have the right kinds of math/CS intuitions to formalize the mostly-informal problems we’re working on, and I haven’t found that this correlates with seniority. That said, I'm happy to hire senior mathematicians if we find ones who want to work...
Re: 1, "what are the main points of disagreement?" is itself currently one of the points of disagreement :) A lot of our disagreements (I think) come down to diverging inchoate mathematical intuitions, which makes it hard to precisely state why we think different problems are worth prioritizing (or to resolve the disagreements).
Also, I think that different Open Phil technical advisors have different disagreements with us. As an example, Paul Christiano and I seem to have an important disagreement about how difficult it will be to align AI systems...
As Tsvi mentioned, and as Luke has talked about before, we’re not really researching “provable AI”. (I’m not even quite sure what that term would mean.) We are trying to push towards AI systems where the way they reason is principled and understandable. We suspect that that will involve having a good understanding ourselves of how the system performs its reasoning, and when we study different types of reasoning systems we sometimes build models of systems that are trying to prove things as part of how they reason; but that’s very different from trying to m...
There’s nothing very public on this yet. Some of my writing over the coming months will bear on this topic, and some of the questions in Jessica’s agenda are more obviously applicable in “less optimistic” scenarios, but this is definitely a place where public output lags behind our private research.
As an aside, one of our main bottlenecks is technical writing capability: if you have technical writing skill and you’re interested in MIRI research, let us know.
I don’t think of our strategy as having changed much in the last year. For example, in the last AMA I said that the plan was to work on some big open problems (I named 5 here: asymptotically good reasoning under logical uncertainty, identifying the best available decision with respect to a predictive world-model and utility function, performing induction from inside an environment, identifying the referents of goals in realistic world-models, and reasoning about the behavior of smarter reasoners), and that I’d be thrilled if we could make serious progress...
Yep, we often have a number of non-MIRI folks checking the proofs, math, and citations. I’m still personally fairly involved in the writing process (because I write fast, and because I do what I can to free up the researchers’ time to do other work); this is something I’m working to reduce. Technical writing talent is one of our key bottlenecks; if you like technical writing and are interested in MIRI’s research, get in touch.
I largely endorse Jessica’s comment. I’ll add that I think the ideal MIRI researcher has their own set of big-picture views about what’s required to design aligned AI systems, and that their vision holds up well under scrutiny. (I have a number of heuristics for what makes me more or less excited about a given roadmap.)
That is, the ideal researcher isn’t just working on whatever problems catch their eye or look interesting; they’re working toward a solution of the whole alignment problem, and that vision regularly affects their research priorities.
I’ll interpret this question as “what are the most plausible ways for you to lose confidence in MIRI’s effectiveness and/or leave MIRI?” Here are a few ways that could happen for me:
I endorse Tsvi's comment above. I'll add that it’s hard to say how close we are to closing basic gaps in understanding of things like “good reasoning”, because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and we're taking a variety of different approaches to the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we’re running now is an attempt to implement a cellular automaton in HOL that implements a reflective reasoner with access to...
Good question. The main effect is that I’ve increased my confidence in the vague MIRI mathematical intuitions being good, and the MIRI methodology for approaching big vague problems actually working. This doesn’t constitute a very large strategic shift, for a few reasons. One reason is that my strategy was already predicated on the idea that our mathematical intuitions and methodology are up to the task. As I said in last year’s AMA, visible progress on problems like logical uncertainty (and four other problems) was one of the key indicators of success th...
Thanks for the write-up, Rob. Open Phil actually decided to evaluate our technical agenda last summer, and Holden put Daniel Dewey on the job. The report isn't done yet, in part because it has proven very time-intensive to fully communicate the reasoning behind our research priorities, even to someone with as much understanding of the AI landscape as Daniel Dewey. Separately, we have plans to get an independent evaluation of our organizational efficacy started later in 2016, which I expect to be useful for our admin team as well as prospective donors.
FYI, w...
I want to push back a bit against point #1 ("Let's divide problems into 'funding constrained' and 'talent constrained'."). In my experience recruiting for MIRI, these constraints are tightly intertwined. To hire talent, you need money (and to get money, you often need results, which requires talent).
I think the "are they funding constrained or talent constrained?" model is incorrect, and potentially harmful. In the case of MIRI, imagine we're trying to hire a world-class researcher for $50k/year, and can't find one. Are we talent constrained, ...
All right, I'll come back for one more question. Thanks, Wei. Tough question. Briefly,
(1) I can't see that many paths to victory. The only ones I can see go through either (a) aligned de-novo AGI (which needs to be at least powerful enough to safely prevent maligned systems from undergoing intelligence explosions) or (b) very large amounts of global coordination (which would be necessary to either take our time & go cautiously, or to leap all the way to WBE without someone creating a neuromorph first). Both paths look pretty hard to walk, but in short,...
Kinda. The current approach is more like "Pretend you're trying to solve a much easier version of the problem, e.g. where you have a ton of computing power and you're trying to maximize diamond instead of hard-to-describe values. What parts of the problem would you still not know how to solve? Try to figure out how to solve those first."
(1) If we manage to (a) generate a theory of advanced agents under many simplifying assumptions, and then (b) generate a theory of bounded rational agents under far fewer simplifying assumptions, and then (c) figu...
You could call it a kind of moral relativism if you want, though it's not a term I would use. I tend to disagree with many self-proclaimed moral relativists: for example, I think it's quite possible for one to be wrong about what they value, and I am not generally willing to concede that Alice thinks murder is OK just because Alice says Alice thinks murder is OK.
Another place I depart from most moral relativists I've met is by mixing in a healthy dose of "you don't get to just make things up." Analogy: we do get to make up the rules of arithmetic...
First, I think that civilization had better be really dang mature before it considers handing over the reins to something like CEV. (Luke has written a bit about civilizational maturity in the past.)
Second, I think that the CEV paper (which is currently 11 years old) is fairly out of date, and I don't necessarily endorse the particulars of it. I do hope, though, that if humanity (or posthumanity) ever builds a singleton, that they build it with a goal of something like taking into account the extrapolated preferences of all sentients and fulfilling some su...
(1) I suspect it's possible to create an artificial system that exhibits what many people would call "intelligent behavior," and which poses an existential threat, but which is not sentient or conscious. (In the same way that Deep Blue wasn't sentient: it seems to me like optimization power may well be separable from sentience/consciousness.) That's no guarantee, of course, and if we do create a sentient artificial mind, then it will have moral weight in its own right, and that will make our job quite a bit more difficult.
(2) The goal is not to b...
Luke talks about the pros and cons of various terms here. Then, long story short, we asked Stuart Russell for some thoughts and settled on "AI alignment" (his suggestion, IIRC).
Couldn't it be that the returns on intelligence tend to not be very high for a self-improving agent around the human area?
Seems unlikely to me, given my experience as an agent at roughly the human level of intelligence. If you gave me a human-readable version of my source code, the ability to use money to speed up my cognition, and the ability to spawn many copies of myself (both to parallelize effort and to perform experiments with) then I think I'd be "superintelligent" pretty quickly. (In order for the self-improvement landscape to be shall...
Great question! I suggest checking out either our research guide or our technical agenda. The former is geared toward students who are wondering what to study in order to eventually gain the skills to be an AI alignment researcher; the latter is geared more toward professionals who already have the skills and are wondering what the current open problems are.
In your case, I'd guess maybe (1) get some solid foundations via either set theory or type theory, (2) get solid foundations on AI, perhaps via AI: A Modern Approach, (3) brush up on probability theory...
1) The things we have no idea how to do aren't the implicit assumptions in the technical agenda, they're the explicit subject headings: decision theory, logical uncertainty, Vingean reflection, corrigibility, etc :-)
We've tried to make it very clear in various papers that we're dealing with very limited toy models that capture only a small part of the problem (see, e.g., basically all of section 6 in the corrigibility paper).
Right now, we basically have a bunch of big gaps in our knowledge, and we're trying to make mathematical models that capture at least...
(1) Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don't know.
(2) I fairly strongly expect a fast takeoff. (Interesting aside: I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff -- I'm not sure yet how to square this with the fact that Bostrom's survey showed fast takeoff was a minority position).
It seems hard (but not impossible)...
We don't have a working definition of "what has intrinsic value." My basic view on these hairy problems ("but what should I value?") is that we really don't want to be coding in the answer by hand. I'm more optimistic about building something that has a few layers of indirection, e.g., something that figures out how to act as intended, rather than trying to transmit your object-level intentions by hand.
In the paper you linked, I think Max is raising a slightly different issue. He's talking about what we would call the ontology ide...
Great question! The short version is, writing more & publishing more (and generally engaging with the academic mainstream more) are very high on my priority list.
Mainstream publications have historically been fairly difficult for us, as until last year, AI alignment research was seen as fairly kooky. (We've had a number of papers rejected from various journals due to the "weird AI motivation.") Going forward, it looks like that will be less of an issue.
That said, writing capability is a huge bottleneck right now. Our researchers are currently...
Hard to get there. Highly likely that we get to neuromorphic AI along the way. (Low-fidelity images or low-speed partial simulations are likely very useful for learning more about intelligence, and I currently expect that the caches of knowledge unlocked on the way to WBE probably get you to AI before the imaging/hardware supports WBE.)
Short version: FAI. (You said "hope", not "expect" :-p)
Longer version: Hard question, both because (a) I don't know how you want me to trade off between how nice the advance would be and how likely we are to get it, and (b) my expectations for the next five years are very volatile. In the year since Nick Bostrom released Superintelligence, there has been a huge wave of interest in the future of AI (due in no small part to the efforts of FLI and their wonderful Puerto Rico conference!), and my expectations of where I'll be in five years ...
We're actually going to be hiring a full-time office manager soon: someone who can just Make Stuff Happen and free up a lot of our day-to-day workload. Keep your eyes peeled; we'll be advertising the opening soon.
Additionally, we're hurting for researchers who can write fast & well, and before too long we'll be looking for a person who can stay up to speed on the technical research but spend most of their time doing outreach and stewarding other researchers who are interested in doing AI alignment research. Both of these jobs would require a bit less technical ability than is required to make new breakthroughs in the field.
That post mixes a bunch of different assertions together, let me try to distill a few of them out and answer them in turn:
One of Peter's first (implicit) points is that AI alignment is a speculative cause. I tend to disagree.
Imagine it's 1942. The Manhattan project is well under way, Leo Szilard has shown that it's possible to get a neutron chain reaction, and physicists are hard at work figuring out how to make an atom bomb. You suggest that this might be a fine time to start working on nuclear containment, so that, once humans are done bombing the ever...
(1) That's not quite how I'd characterize the current technical agenda. Rather, I'd say that in order to build an AI aligned with human interests, you need to do three things: (a) understand how to build an AI that's aligned with anything (could you build an AI that reliably builds as much diamond as possible?), (b) understand how to build an AI that assists you in correcting things-you-perceive-as-flaws (this doesn't come for free, but it's pretty important, because humans are bad at getting software right on the first try), and (c) figure out how to buil...
There's a big spectrum, there. Some people think that no matter what the AI does that's fine because it's our progeny (even if it turns as much matter as it can into a giant computer so it can find better YouTube recommendations). Other people think that you can't actually build a superintelligent paperclip maximizer (because maximizing paperclips would be stupid, and we're assuming that it's intelligent). Other people think that yeah, you don't get good behavior by default, but AI is hundreds and hundreds of years off, so we don't need to start worrying n...
(1) I don't want to put words in their mouths. I'm guessing that most of us have fairly broad priors over what may happen, though. The future's hard to predict.
(2) Depends what you mean by "Friendly AI research." Does AI boxing count? Does improving the transparency of ML algorithms count? Once the FLI grants start going through, there will be lots of people doing long-term AI safety research that may well be useful, so if you count that as FAI research, then the answer is "there will be soon." But if by "FAI research" you mean "working towards a theoretical understanding of highly reliable advanced agents," then the answer is "not to my knowledge, no."
(1) Things Executive!Nate will do differently from Researcher!Nate? Or things Nate!MIRI will do differently from Luke!MIRI? For the former, I'll be thinking lots more about global coordination & engaging with interested academics etc, and lots less about specific math problems. For the latter, the biggest shift is probably going to be something like "more engagement with the academic mainstream," although it's a bit hard to say: Luke probably would have pushed in that direction too, after growing the research team a bit. (I have a lot of oppo...
1) Huh, that hasn't been my experience. We have a number of potential donors who ring us up and ask who in AI alignment needs money the most at the moment. (In fact, last year, we directed a number of donors to FHI, who had much more of a funding gap than MIRI did at that time.)
2) If MIRI disappeared and everything else was held constant, then I'd be pretty concerned about the lack of people focused on the object level problems. (I'll talk more about why I think this is so important in a little bit; I'm pretty sure at least one other person asks that questi...
(I agree; thanks for the nuance)