EDIT: I'm only going to answer a few more questions, due to time constraints. I might eventually come back and answer more. I still appreciate getting replies with people's thoughts on things I've written.
I'm going to do an AMA on Tuesday next week (November 19th). Below I've written a brief description of what I'm doing at the moment. Ask any questions you like; I'll respond to as many as I can on Tuesday.
Although I'm eager to discuss MIRI-related things in this AMA, my replies will represent my own views rather than MIRI's, and as a rule I won't be running my answers by anyone else at MIRI. Think of it as a relatively candid and informal Q&A session, rather than anything polished or definitive.
----
I'm a researcher at MIRI. At MIRI I divide my time roughly equally between technical work and recruitment/outreach work.
On the recruitment/outreach side, I do things like the following:
- For the AI Risk for Computer Scientists workshops (which are slightly badly named; we accept some technical people who aren't computer scientists), I handle the intake of participants, and also teach classes and lead discussions on AI risk at the workshops.
- I do most of the technical interviewing for engineering roles at MIRI.
- I manage the AI Safety Retraining Program, in which MIRI gives grants to people to study ML for three months with the goal of making it easier for them to transition into working on AI safety.
- I sometimes do weird things like going on a Slate Star Codex roadtrip, where I led a group of EAs as we travelled along the East Coast going to Slate Star Codex meetups and visiting EA groups for five days.
On the technical side, I mostly work on some of our nondisclosed-by-default technical research; this involves thinking about various kinds of math and implementing things related to the math. Because the work isn't public, there are many questions about it that I can't answer. But this is my problem, not yours; feel free to ask whatever questions you like and I'll take responsibility for choosing to answer or not.
----
Here are some things I've been thinking about recently:
- I think that the field of AI safety is growing in an awkward way. Lots of people are trying to work on it, and many of these people have pretty different pictures of what the problem is and how we should try to work on it. How should we handle this? How should you try to work in a field when at least half the "experts" are going to think that your research direction is misguided?
- The AIRCS workshops that I'm involved with contain a variety of material which attempts to help participants think about the world more effectively. I have thoughts about what's useful and not useful about rationality training.
- I have various crazy ideas about EA outreach. I think the SSC roadtrip was good; I think some EAs who work at EA orgs should consider doing "residencies" in cities without much fulltime EA presence, where they mostly do their normal job but also talk to people.
Hm, I think I may have misinterpretted your previous comment as emphasizing the point that P_CDT "gets you less utility" rather than the point that P_SONOFCDT "gets you less utility." So my comment was aiming to explain why I don't think the fact that P_CDT gets less utility provides a strong challenge to the claim that R_CDT is true (unless we accept the "No Self-Effacement Principle"). But it sounds like you might agree that this fact doesn't on its own provide a strong challenge.
In response to the first argument alluded to here: "Gets the most [expected] utility" is ambiguous, as I think we've both agreed.
My understanding is that P_SONOFCDT is definitionally the policy that, if an agent decided to adopt it, would cause the largest increase in expected utility. So -- if we evaluate the expected utility of a decision to adopt a policy from a casual perspective -- it seems to me that P_SONOFCDT "gets the most expected utility."
If we evaluate the expected utility of a policy from an evidential or subjunctive perspective, however, then another policy may "get the most utility" (because policy adoption decisions may be non-causally correlated.)
Apologies if I'm off-base, but it reads to me like you might be suggesting an argument along these lines:
R_CDT says that it is rational to decide to follow a policy that would not maximize "expected utility" (defined in evidential/subjunctive terms).
(Assumption) But it is not rational to decide to follow a policy that would not maximize "expected utility" (defined in evidential/subjunctive terms).
Therefore R_CDT is not true.
The natural response to this argument is that it's not clear why we should accept the assumption in Step 2. R_CDT says that the rationality of a decision depends on its "expected utility" defined in causal terms. So someone starting from the position that R_CDT is true obviously won't accept the assumption in Step 2. R_EDT and R_FDT say that the rationality of a decision depends on its "expected utility" defined in evidential or subjunctive terms. So we might allude to R_EDT or R_FDT to justify the assumption, but of course this would also mean arguing backwards from the conclusion that the argument is meant to reach.
Overall at least this particular simple argument -- that R_CDT is false because P_SONOFCDT gets less "expected utility" as defined in evidential/quasi-evidential terms -- would seemingly fail to due circularity. But you may have in mind a different argument.
I felt confused by this comment. Doesn't even R_FDT judge the rationality of a decision by its expected value (rather than its actual value)? And presumably you don't want to say that someone who accepts unpromising gambles and gets lucky (ending up with high actual average utility) has made more "rational" decisions than someone who accepts promising gambles and gets unlucky (ending up with low actual average utility)?
You also correctly point out that the decision procedure that R_CDT implies agents should rationally commit to -- P_SONOFCDT -- sometimes outputs decisions that definitely make things worse. So "Don't Make Things Worse" implies that some of the decisions outputted by P_SONOFCDT are irrational.
But I still don't see what the argument is here unless we're assuming "No Self-Effacement." It still seems to me like we have a few initial steps and then a missing piece.
(Observation) R_CDT implies that it is rational to commit to following the decision procedure P_SONOFCDT.
(Observation) P_SONOFCDT sometimes outputs decisions that definitely make things worse.
(Assumption) It is irrational to take decisions that definitely make things worse. In other words, the "Don't Make Things Worse" Principle is true.
Therefore, as an implication of Step 2 and Step 3, P_SONOFCDT sometimes outputs irrational decisions.
???
Therefore, R_CDT is false.
The "No Self-Effacement" Principle is equivalent to the principle that: If a criterion of rightness implies that it is rational to commit to a decision procedure, then that decision procedure only produces rational actions. So if we were to assume "No Self-Effacement" in Step 5 then this would allow us to arrive at the conclusion that R_CDT is false. But if we're not assuming "No Self-Effacement," then it's not clear to me how we get there.
Actually, in the context of this particular argument, I suppose we don't really have the option of assuming that "No Self-Effacement" is true -- because this assumption would be inconsistent with the earlier assumption that "Don't Make Things Worse" is true. So I'm not sure it's actually possible to make this argument schema work in any case.
There may be a pretty different argument here, which you have in mind. I at least don't see it yet though.