All of So8res's Comments + Replies

(I agree; thanks for the nuance)

So8res

Do you think EA's self-reflection about this is at all productive, considering most people had even less information than you?

I don't have terribly organized thoughts about this. (And I am still not paying all that much attention—I have much more patience for picking apart my own reasoning processes looking for ways to improve them, than I have for reading other people's raw takes :-p)

But here are some unorganized and half-baked notes:


I appreciated various expressions of emotion. Especially when they came labeled as such.

I think there was also a bunch of oth... (read more)

FWIW, I would totally want to openly do a postmortem. Once the bankruptcy case is over, I'll be pretty happy to publicly say what I knew at various points in time. But I'm currently holding back for legal reasons, and instead discussing it (as you said) "behind closed doors". (Which is frustrating for everyone who would like to have transparent public discussion; sorry about that. It's also really frustrating for me!)

I think the truth is closest to "we had a bunch of hints that we failed to assemble"


 

Fwiw, for common knowledge (though I don't know everything happening at CEA), so that other people can pick up the slack and not assume things are covered, or so that people can push me to change my prioritization, here's what I see happening at CEA in regard to:

"finding some other way to successfully heed the 10, which requires distinguishing them from the background noise--and distinguishing them as something actionable--before it's too late, and then routing the requisite action to the people who can do something about it"

  • I've been thinking some about i... (read more)

Good point! Currently, I think the "pry more" lesson is supposed to account for a bunch of this.

Since making this update, I have in fact pried more into friends' lives. In at least one instance I found some stuff that worried me, at which point I was naturally like "hey, this worries me; it pattern-matches to some bad situations I've seen; I feel wary and protective; I request an opportunity to share and/or put you in touch with people who've been through putatively-analogous situations (though I can also stfu if you're sick of hearing people's triggered t... (read more)

Can you give an example or two of failure modes or "categories of failure modes that are easy to foresee" that you think are addressed by some HRAD topic? I'd thought previously that thinking in terms of failure modes wasn't a good way to understand HRAD research.

I want to steer clear of language that might make it sound like we’re saying:

  • X 'We can't make broad-strokes predictions about likely ways that AGI could go wrong.'

  • X 'To the extent we can make such predictions, they aren't important for informing research directions.'

  • X 'The best... (read more)

Thanks for this solid summary of your views, Daniel. For others’ benefit: MIRI and Open Philanthropy Project staff are in ongoing discussion about various points in this document, among other topics. Hopefully some portion of those conversations will be made public at a later date. In the meantime, a few quick public responses to some of the points above:

2) If we fundamentally "don't know what we're doing" because we don't have a satisfying description of how an AI system should reason and make decisions, then we will probably make lots of mist... (read more)
3
Daniel_Dewey
Just planting a flag to say that I'm thinking more about this so that I can respond well.
5
Daniel_Dewey
Thanks Nate! This is particularly helpful to know.

Can you give an example or two of failure modes or "categories of failure modes that are easy to foresee" that you think are addressed by some HRAD topic? I'd thought previously that thinking in terms of failure modes wasn't a good way to understand HRAD research.

I'm confused by this as a follow-up to the previous paragraph. This doesn't look like an example of "focusing on categories of failure modes that are easy to foresee," it looks like a case where you're explicitly not using concrete failure modes to decide what to work on. I feel like this fits with the "not about concrete failure modes" narrative that I believed before reading your comment, FWIW.

Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

We originally sent Nick Beckstead what we considered our four most important 2015 results, at his request; these were (1) the incompatibility of the "Inductive Coherence" framework and the "Asymptotic Convergence in Online Learning with Unbounded Delays" framework; (2) the demonstration in "Proof-Producing Reflection for HOL" that a non-pathological form of self-referential reasoning is possible in a certain class of theo... (read more)

Thanks for the response, Gregory. I was hoping to see more questions along these lines in the AMA, so I'm glad you followed up.

Open Phil's grant write-up is definitely quite critical, and not an endorsement. One of Open Phil's main criticisms of MIRI is that they don't think our agent foundations agenda is likely to be useful for AI alignment; but their reasoning behind this is complicated, and neither Open Phil nor MIRI has had time yet to write up our thoughts in any detail. I suggest pinging me to say more about this once MIRI and Open Phil have put up ... (read more)

2
Gregory Lewis🔸
Nate, my thanks for your reply. I regret I may not have expressed myself well enough for your reply to precisely target the worries I expressed; I also regret that, insofar as your reply overcomes my poor expression, it makes my worries grow deeper.

If I read your approach to the Open Phil review correctly, you submitted some of the more technically unimpressive papers for review because they demonstrated the lead author developing some interesting ideas for research directions, and because they in some sense led up to the 'big result' (Logical Induction). If so, this looks like a pretty surprising error: one of the standard worries facing MIRI, given its fairly slender publication record, is the technical quality of the work, and it seemed pretty clear that was the objective behind sending them out for evaluation. Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

In candour, I think 'MIRI barking up the wrong tree' and/or (worse) 'MIRI not doing that much good research' is a much better explanation for what is going on than 'inferential distance'. I struggle to imagine a fairer (or more propitious-to-MIRI) hearing than the Open Phil review: it involved two people (Dewey and Christiano) who previously worked with you guys, Dewey spent over 100 hours trying to understand the value of your work, and they commissioned external experts in the field to review your work. Suggesting that the fairly adverse review that resulted may be a product of lack of understanding makes MIRI seem more like a mystical tradition than a research group. If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.

I had Aaronson down as within MIRI's sphere of influence, but if I overstated I apologize (I am correct that Yuan previously worked for you, right?) I look forward to seeing MIRI producing or germinating some concrete results in decision t
3
Sean_o_h
"Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months." A quick note to say that comments that have made their way back to me from relevant circles agree with this. Also, my own impression - from within academia, but outside decision theory and AI - is that the level of recognition of, and respect for, MIRI's work is steadily rising in academia, although inferential gaps like what nate describes certainly exist, plus more generic cultural gaps. I've heard positive comments about MIRI's work from academics I wouldn't have expected even to have heard of MIRI. And my impression, from popping by things like Cambridge's MIRIx discussion group, is that they're populated for the most part by capable people with standard academic backgrounds who have become involved based on the merits of the work rather than any existing connection to MIRI (although I imagine some are or were lesswrong readers).

Posts or comments on personal Twitter accounts, Facebook walls, etc. should not be assumed to represent any official or consensus MIRI position, unless noted otherwise. I'll echo Rob's comment here that "a good safety approach should be robust to the fact that the designers don’t have all the answers". If an AI project hinges on the research team being completely free from epistemic shortcomings and moral failings, then the project is doomed (and should change how it's doing alignment research).

I suspect we're on the same page about it being impo... (read more)

In short: there’s a big difference between building a system that follows the letter of the law (but not the spirit), and a system that follows the intent behind a large body of law. I agree that the legal system is a large corpus of data containing information about human values and how humans currently want their civilization organized. In order to use that corpus, we need to be able to design systems that reliably act as intended, and I’m not sure how the legal corpus helps with that technical problem (aside from providing lots of training data, which I... (read more)

Thanks, Benito. With regards to the second half of this question, I suspect that either you’ve misunderstood some of the arguments I’ve made about why our work doesn’t tend to fit into standard academic journals and conferences, or (alternatively) someone has given arguments for why our work doesn’t tend to fit into standard academic venues that I personally disagree with. My view is that our work doesn’t tend to fit into standard journals etc. because (a) we deliberately focus on research that we think academia and industry are unlikely to work on for one... (read more)

I'm not sure I understand the hypothetical -- most of the actions that I deem necessary are aimed at affecting the trajectory of the AI field in one way or another.

0
Dr_Manhattan
Ok, that's informative. So the dominant factor is not the ability to get to the finish line faster (which kind of makes sense).

I think this has been changing in recent years, yes. A number of AI researchers (some of them quite prominent) have told me that they have largely agreed with AI safety concerns for some time, but have felt uncomfortable expressing those concerns until very recently. I do think that the tides are changing here, with the Concrete Problems in AI Safety paper (by Amodei, Olah, et al) perhaps marking the inflection point. I think that the 2015 FLI conference also helped quite a bit.

I'm not exactly sure what venue it will show up in, but it will very likely be mentioned on the MIRI blog (or perhaps just posted there directly). intelligence.org/blog.

A question from Topher Halquist, on facebook:

Has MIRI considered hiring a more senior math-Ph.D., to serve in a "postdoc supervisor"-type role?

We considered it, but decided against it because supervision doesn’t seem like a key bottleneck on our research progress. Our priority is just to find people who have the right kinds of math/CS intuitions to formalize the mostly-informal problems we’re working on, and I haven’t found that this correlates with seniority. That said, I'm happy to hire senior mathematicians if we find ones who want to work... (read more)

Re: 1, "what are the main points of disagreement?" is itself currently one of the points of disagreement :) A lot of our disagreements (I think) come down to diverging inchoate mathematical intuitions, which makes it hard to precisely state why we think different problems are worth prioritizing (or to resolve the disagreements).

Also, I think that different Open Phil technical advisors have different disagreements with us. As an example, Paul Christiano and I seem to have an important disagreement about how difficult it will be to align AI systems... (read more)

As Tsvi mentioned, and as Luke has talked about before, we’re not really researching “provable AI”. (I’m not even quite sure what that term would mean.) We are trying to push towards AI systems where the way they reason is principled and understandable. We suspect that that will involve having a good understanding ourselves of how the system performs its reasoning, and when we study different types of reasoning systems we sometimes build models of systems that are trying to prove things as part of how they reason; but that’s very different from trying to m... (read more)

0
turchin
Thanks! Could you link to where you will write about this subject later?

There’s nothing very public on this yet. Some of my writing over the coming months will bear on this topic, and some of the questions in Jessica’s agenda are more obviously applicable in “less optimistic” scenarios, but this is definitely a place where public output lags behind our private research.

As an aside, one of our main bottlenecks is technical writing capability: if you have technical writing skill and you’re interested in MIRI research, let us know.

I don’t think of our strategy as having changed much in the last year. For example, in the last AMA I said that the plan was to work on some big open problems (I named 5 here: asymptotically good reasoning under logical uncertainty, identifying the best available decision with respect to a predictive world-model and utility function, performing induction from inside an environment, identifying the referents of goals in realistic world-models, and reasoning about the behavior of smarter reasoners), and that I’d be thrilled if we could make serious progress... (read more)

0
Dr_Manhattan
Just out of curiosity, how would your estimate update if you had enough resources to do anything you deemed necessary, but not enough to affect the current trajectory of the field?

Yep, we often have a number of non-MIRI folks checking the proofs, math, and citations. I’m still personally fairly involved in the writing process (because I write fast, and because I do what I can to free up the researchers’ time to do other work); this is something I’m working to reduce. Technical writing talent is one of our key bottlenecks; if you like technical writing and are interested in MIRI’s research, get in touch.

I largely endorse Jessica’s comment. I’ll add that I think the ideal MIRI researcher has their own set of big-picture views about what’s required to design aligned AI systems, and that their vision holds up well under scrutiny. (I have a number of heuristics for what makes me more or less excited about a given roadmap.)

That is, the ideal researcher isn’t just working on whatever problems catch their eye or look interesting; they’re working toward a solution of the whole alignment problem, and that vision regularly affects their research priorities.

I’ll interpret this question as “what are the most plausible ways for you to lose confidence in MIRI’s effectiveness and/or leave MIRI?” Here are a few ways that could happen for me:

  1. I could be convinced that I was wrong about the type and quality of AI alignment research that the external community is able to do. There's some inferential distance here, so I'm not expecting to explain my model in full, but in brief, I currently expect that there are a few types of important research that academia and industry won't do by default. If I was convinced that e... (read more)

I endorse Tsvi's comment above. I'll add that it’s hard to say how close we are to closing basic gaps in understanding of things like “good reasoning”, because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and we're taking various different approaches on the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we’re running now is an attempt to implement a cellular automaton in HOL that implements a reflective reasoner with access to... (read more)

Good question. The main effect is that I’ve increased my confidence in the vague MIRI mathematical intuitions being good, and the MIRI methodology for approaching big vague problems actually working. This doesn’t constitute a very large strategic shift, for a few reasons. One reason is that my strategy was already predicated on the idea that our mathematical intuitions and methodology are up to the task. As I said in last year’s AMA, visible progress on problems like logical uncertainty (and four other problems) was one of the key indicators of success th... (read more)

Thanks for the write-up, Rob. Open Phil actually decided to evaluate our technical agenda last summer, and Holden put Daniel Dewey on the job. The report isn't done yet, in part because it has proven very time-intensive to fully communicate the reasoning behind our research priorities, even to someone with as much understanding of the AI landscape as Daniel Dewey. Separately, we have plans to get an independent evaluation of our organizational efficacy started later in 2016, which I expect to be useful for our admin team as well as prospective donors.

FYI, w... (read more)

I want to push back a bit against point #1 ("Let's divide problems into 'funding constrained' and 'talent constrained'."). In my experience recruiting for MIRI, these constraints are tightly intertwined. To hire talent, you need money (and to get money, you often need results, which requires talent).

I think the "are they funding constrained or talent constrained?" model is incorrect, and potentially harmful. In the case of MIRI, imagine we're trying to hire a world-class researcher for $50k/year, and can't find one. Are we talent constrained, ... (read more)

1
Benjamin_Todd
I agree many things are both talent constrained and funding constrained. I think you can have the whole spectrum, from mainly constrained by a certain type of talent, to constrained by both, to mainly constrained by funding.

All right, I'll come back for one more question. Thanks, Wei. Tough question. Briefly,

(1) I can't see that many paths to victory. The only ones I can see go through either (a) aligned de-novo AGI (which needs to be at least powerful enough to safely prevent maligned systems from undergoing intelligence explosions) or (b) very large amounts of global coordination (which would be necessary to either take our time & go cautiously, or to leap all the way to WBE without someone creating a neuromorph first). Both paths look pretty hard to walk, but in short,... (read more)

Kinda. The current approach is more like "Pretend you're trying to solve a much easier version of the problem, e.g. where you have a ton of computing power and you're trying to maximize diamond instead of hard-to-describe values. What parts of the problem would you still not know how to solve? Try to figure out how to solve those first."

(1) If we manage to (a) generate a theory of advanced agents under many simplifying assumptions, and then (b) generate a theory of bounded rational agents under far fewer simplifying assumptions, and then (c) figu... (read more)

You could call it a kind of moral relativism if you want, though it's not a term I would use. I tend to disagree with many self-proclaimed moral relativists: for example, I think it's quite possible for one to be wrong about what they value, and I am not generally willing to concede that Alice thinks murder is OK just because Alice says Alice thinks murder is OK.

Another place I depart from most moral relativists I've met is by mixing in a healthy dose of "you don't get to just make things up." Analogy: we do get to make up the rules of arithmetic... (read more)

3
Alex_Altair
igotthatreference.jpg

First, I think that civilization had better be really dang mature before it considers handing over the reins to something like CEV. (Luke has written a bit about civilizational maturity in the past.)

Second, I think that the CEV paper (which is currently 11 years old) is fairly out of date, and I don't necessarily endorse the particulars of it. I do hope, though, that if humanity (or posthumanity) ever builds a singleton, they build it with a goal of something like taking into account the extrapolated preferences of all sentients and fulfilling some su... (read more)

(1) I suspect it's possible to create an artificial system that exhibits what many people would call "intelligent behavior," and which poses an existential threat, but which is not sentient or conscious. (In the same way that Deep Blue wasn't sentient: it seems to me like optimization power may well be separable from sentience/consciousness.) That's no guarantee, of course, and if we do create a sentient artificial mind, then it will have moral weight in its own right, and that will make our job quite a bit more difficult.

(2) The goal is not to b... (read more)

The most reliable strategy to date is "ask me" :-)

Luke talks about the pros and cons of various terms here. Then, long story short, we asked Stuart Russell for some thoughts and settled on "AI alignment" (his suggestion, IIRC).

Couldn't it be that the returns on intelligence tend to not be very high for a self-improving agent around the human area?

Seems unlikely to me, given my experience as an agent at roughly the human level of intelligence. If you gave me a human-readable version of my source code, the ability to use money to speed up my cognition, and the ability to spawn many copies of myself (both to parallelize effort and to perform experiments with) then I think I'd be "superintelligent" pretty quickly. (In order for the self-improvement landscape to be shall... (read more)

Great question! I suggest checking out either our research guide or our technical agenda. The former is geared towards students who are wondering what to study in order to eventually gain the skills to be an AI alignment researcher; the latter is geared more towards professionals who already have the skills and are wondering what the current open problems are.

In your case, I'd guess maybe (1) get some solid foundations via either set theory or type theory, (2) get solid foundations on AI, perhaps via AI: A Modern Approach, (3) brush up on probability theory... (read more)

1) The things we have no idea how to do aren't the implicit assumptions in the technical agenda, they're the explicit subject headings: decision theory, logical uncertainty, Vingean reflection, corrigibility, etc :-)

We've tried to make it very clear in various papers that we're dealing with very limited toy models that capture only a small part of the problem (see, e.g., basically all of section 6 in the corrigibility paper).

Right now, we basically have a bunch of big gaps in our knowledge, and we're trying to make mathematical models that capture at least... (read more)

Than a slow takeoff? Yes :-)

(1) Eventually. Predicting the future is hard. My 90% confidence interval conditioned on no global catastrophes is maybe 5 to 80 years. That is to say, I don't know.

(2) I fairly strongly expect a fast takeoff. (Interesting aside: I was recently at a dinner full of AI scientists, some of them very skeptical about the whole long-term safety problem, who unanimously professed that they expect a fast takeoff -- I'm not sure yet how to square this with the fact that Bostrom's survey showed fast takeoff was a minority position).

It seems hard (but not impossible)... (read more)

3
AlexMennen
Perhaps the first of them to voice a position on the matter expected a fast takeoff and was held in high regard by the others, so they followed along, having not previously thought about it?
0
RyanCarey
Couldn't it be that the returns on intelligence tend to not be very high for a self-improving agent around the human area? Like, it could be that modifying yourself when you're human-level intelligent isn't very useful, but that things really take off at 20x the human level. That would seem to suggest a possible d) the first superhuman AI system self-improves for some time and then peters out. More broadly, the suggestion is that since the machine is presumably not yet superintelligent, there might be relevant constraints other than incentives and hardware. Plausible or not?

We don't have a working definition of "what has intrinsic value." My basic view on these hairy problems ("but what should I value?") is that we really don't want to be coding in the answer by hand. I'm more optimistic about building something that has a few layers of indirection, e.g., something that figures out how to act as intended, rather than trying to transmit your object-level intentions by hand.

In the paper you linked, I think Max is raising a slightly different issue. He's talking about what we would call the ontology ide... (read more)

1
Alex_Altair
That diamond/carbon scenario is an excellent concrete example of the ontology problem.

I mostly agree with Daniel's paper :-)

0
AlexLundborg
That was my guess :) To be more specific: do you (or does MIRI) have any preferences for which strategy to pursue, or is it too early to say? I get the sense from MIRI and FHI that aligned sovereign AI is the end goal. Thanks again for doing the AMA!

Great question! The short version is, writing more & publishing more (and generally engaging with the academic mainstream more) are very high on my priority list.

Mainstream publications have historically been fairly difficult for us, as until last year, AI alignment research was seen as fairly kooky. (We've had a number of papers rejected from various journals due to the "weird AI motivation.") Going forward, it looks like that will be less of an issue.

That said, writing capability is a huge bottleneck right now. Our researchers are currently... (read more)

Hard to get there. Highly likely that we get to neuromorphic AI along the way. (Low-fidelity images or low-speed partial simulations are likely very useful for learning more about intelligence, and I currently expect that the caches of knowledge unlocked on the way to WBE probably get you to AI before the imaging/hardware supports WBE.)

Short version: FAI. (You said "hope", not "expect" :-p)

Longer version: Hard question, both because (a) I don't know how you want me to trade off between how nice the advance would be and how likely we are to get it, and (b) my expectations for the next five years are very volatile. In the year since Nick Bostrom released Superintelligence, there has been a huge wave of interest in the future of AI (due in no small part to the efforts of FLI and their wonderful Puerto Rico conference!), and my expectations of where I'll be in five years ... (read more)

We're actually going to be hiring a full-time office manager soon: someone who can just Make Stuff Happen and free up a lot of our day-to-day workload. Keep your eyes peeled, we'll be advertising the opening soon.

Additionally, we're hurting for researchers who can write fast & well, and before too long we'll be looking for a person who can stay up to speed on the technical research but spend most of their time doing outreach and stewarding other researchers who are interested in doing AI alignment research. Both of these jobs would require a bit less technical ability than is required to make new breakthroughs in the field.

That post mixes a bunch of different assertions together, let me try to distill a few of them out and answer them in turn:


One of Peter's first (implicit) points is that AI alignment is a speculative cause. I tend to disagree.

Imagine it's 1942. The Manhattan project is well under way, Leo Szilard has shown that it's possible to get a neutron chain reaction, and physicists are hard at work figuring out how to make an atom bomb. You suggest that this might be a fine time to start working on nuclear containment, so that, once humans are done bombing the ever... (read more)

(1) That's not quite how I'd characterize the current technical agenda. Rather, I'd say that in order to build an AI aligned with human interests, you need to do three things: (a) understand how to build an AI that's aligned with anything (could you build an AI that reliably builds as much diamond as possible?), (b) understand how to build an AI that assists you in correcting things-you-perceive-as-flaws (this doesn't come for free, but it's pretty important, because humans are bad at getting software right on the first try), and (c) figure out how to buil... (read more)

2
Ben Pace
Thank you for the response; it was helpful :^)

There's a big spectrum, there. Some people think that no matter what the AI does that's fine because it's our progeny (even if it turns as much matter as it can into a giant computer so it can find better YouTube recommendations). Other people think that you can't actually build a superintelligent paperclip maximizer (because maximizing paperclips would be stupid, and we're assuming that it's intelligent). Other people think that yeah, you don't get good behavior by default, but AI is hundreds and hundreds of years off, so we don't need to start worrying n... (read more)

  1. (a) grow the research team, (b) engage more with mainstream academia. I'd also like to spend some time experimenting to figure out how to structure the research team so as to make it more effective (we have a lot of flexibility here that mainstream academic institutes don't have). Once we have the first team growing steadily and running smoothly, it's not entirely clear whether the next step will be (c.1) grow it faster or (c.2) spin up a second team inside MIRI taking a different approach to AI alignment. I'll punt that question to future-Nate.
  2. So first... (read more)

(1) I don't want to put words in their mouths. I'm guessing that most of us have fairly broad priors over what may happen, though. The future's hard to predict.

(2) Depends what you mean by "Friendly AI research." Does AI boxing count? Does improving the transparency of ML algorithms count? Once the FLI grants start going through, there will be lots of people doing long-term AI safety research that may well be useful, so if you count that as FAI research, then the answer is "there will be soon." But if by "FAI research" you mean "working towards a theoretical understanding of highly reliable advanced agents," then the answer is "not to my knowledge, no."

(1) Things Executive!Nate will do differently from Researcher!Nate? Or things Nate!MIRI will do differently from Luke!MIRI? For the former, I'll be thinking lots more about global coordination & engaging with interested academics etc, and lots less about specific math problems. For the latter, the biggest shift is probably going to be something like "more engagement with the academic mainstream," although it's a bit hard to say: Luke probably would have pushed in that direction too, after growing the research team a bit. (I have a lot of oppo... (read more)

1) Huh, that hasn't been my experience. We have a number of potential donors who ring us up and ask who in AI alignment needs money the most at the moment. (In fact, last year, we directed a number of donors to FHI, who had much more of a funding gap than MIRI did at that time.)

2) If MIRI disappeared and everything else was held constant, then I'd be pretty concerned about the lack of people focused on the object-level problems. (I'll talk more about why I think this is so important in a little bit; I'm pretty sure at least one other person asks that questi... (read more)
