

I want to be in a movement or community where people hold their heads up, say what they think is true, speak and listen freely, and bother to act on principles worth defending / to attend to aspects of reputation they actually care about, but not to worry about PR as such.

It's somehow hard for me to read the OP and the comments below it without feeling like I should cower in fear and try to avoid social attack.  I hope we don't anyhow.  (TBF, lots of the comments actively make this better, and I appreciate that!)

(Alternately put: a culture of truthseeking seems really important if we want to do actual good, and not just think we're doing good or gain careers by being associated with the idea of do-gooding or something.  I find it actively difficult to remember I wish to live by truth-seeking principles/culture while reading these threads somehow.  I want a counterweight to make it easier.)

From accounts I heard later (I was not at the camp, but did hear a lot about it from folks who were), I'm basically certain CFAR would have interfered with the minor going even if the minor had agreed.  Multiple CFAR staff members stepped in to attempt to prevent the minor from going (as mentioned in e.g. https://www.rationality.org/resources/updates/2019/cfars-mistakes-regarding-brent, and as I also remember from closer to the time); much fuss was correctly made at the time, etc.  I agree that many bad mistakes were made, then and previously and afterwards, however.

Also, after we eventually understood what the deal had been with Brent, we gave up running programs for minors.  We continue to run programs for adults.  My feeling is that adults should indeed not expect that we are vetting a particularly careful or safe environment particularly reliably, but that this is often not the crux for whether an adult wishes to attend a CFAR workshop.

I disagree. It seems to me that the EA community's strength, goodness, and power lie almost entirely in our ability to reason well (so as to actually be "effective", rather than merely tribal/random). It lies in our ability to trust in the integrity of one another's speech and reasoning, and to talk together to figure out what's true.

Finding the real leverage points in the world is probably worth orders of magnitude in our impact. Our ability to think honestly and speak accurately and openly with each other seems to me to be a key part of how we access those "orders of magnitude of impact."

In contrast, our ability to have more money/followers/etc. (via not ending up on the wrong side of a cultural revolution, etc.) seems to me to be worth... something, in expectation, but not as much as our ability to think and speak together is worth.

(There's a lot to work out here, in terms of trying to either do the estimates in EV terms, or trying to work out the decision theory / virtue ethics of the matter. I would love to try to discuss in detail, back and forth, and see if we can work this out. I do not think this should be super obvious in either direction from the get go, although at this point my opinion is pretty strongly in the direction I am naming. Please do discuss if you're up for it.)

I feel that 1-2 such posts per organization per year is appropriate and useful, especially since organizations often have year-end reviews or other orienting documents timed near their annual fundraiser, and reading these allows me to get oriented about what the organizations are up to.

Seeing this comment from you makes me feel good about Open Phil's internal culture; it seems like evidence that folks who work there feel free to think independently and to voice their thoughts even when they disagree. I hope we manage a culture that makes this sort of thing accessible at CFAR and in general.

Gotcha. Your phrasing distinction makes sense; I'll adopt it. I agree now that I shouldn't have included "clarity" in my sentence about "attempts to be clear/explainable/respectable".

The thing that confused me is that it is hard to incentivize clarity but not explainability; the easiest observable is just "does the person's research make sense to me?", which one can then choose how to interpret, and how to incentivize.

It's easy enough to invest in clarity / Motion A without investing in explainability / Motion B, though. My random personal guess is that MIRI invests about half of their total research effort into clarity (from what I see people doing around the office), but I'm not sure (and I could ask the researchers easily enough). Do you have a suspicion about whether MIRI over- or under-invests in Motion A?

I feel as though building a good culture is really quite important, and like this sort of specific proposal & discussion is how, bit by bit, one does that. It seems to me that the default for large groups of would-be collaborators is to waste almost all the available resource due basically to "insufficiently ethical/principled social fabric".

(My thoughts here are perhaps redundant with Owen's reply to your comment, but it seems important enough that I wanted to add a separate voice and take.)

Re: how much this matters (or how much is wasted without this), I like the examples in Eliezer's article on lost purposes or in Scott Alexander's review of The House of God.

The larger EA gets, the easier it is for standard failure modes by which effort becomes untethered from real progress, or some homegrown analog, to eat almost all our impact as well. And so the more necessary it is that we really seriously try to figure out what principles can keep our collective epistemology truth-tracking.

Not sure how much this is a response to you, but:

In considering whether incentives toward clarity (e.g., via being able to explain one’s work to potential funders) are likely to pull in good or bad directions, I think it’s important to distinguish between two different motions that might be used as a researcher (or research institution) responds to those incentives.

  • Motion A: Taking the research they were already doing, and putting a decent fraction of effort into figuring out how to explain it, figuring out how to get it onto firm foundations, etc.

  • Motion B: Choosing which research to do by thinking about which things will be easy to explain clearly afterward.

It seems to me that “attempts to be clear” in the sense of Motion A are indeed likely to be helpful, and are worth putting a significant fraction of one’s effort into. I agree also that they can be aversive and that this aversiveness (all else equal) may tend to cause underinvestment in them.

Motion B, however, strikes me as more of a mixed bag. There is merit in choosing which research to do by thinking about what will be explainable to other researchers, such that other researchers can build on it. But there is also merit to sometimes attempting research on the things that feel most valuable/tractable/central to a given researcher, without too much shame if it then takes years to get their research direction to be “clear”.

As a loose analogy, one might ask whether “incentives to not fail” have a good or bad effect on achievement. And it seems like a mixed bag. The good part (analogous to Motion A) is that, once one has chosen to devote hours/etc. to a project, it is good to try to get that project to succeed. The more mixed part (analogous to Motion B) is that “incentives to not fail” sometimes cause people to refrain from attempting ambitious projects at all. (Of course, it sometimes is worth not trying a particular project because its success-odds are too low — Motion B is not always wrong.)

Relatedly, it seems to me that in general, preparadigm fields probably develop faster if:

  1. Different research approaches can compete freely for researchers (e.g., if researchers have secure, institution-independent funding, and can work on whatever approach pleases them). (The reason: there is a strong relationship between what problems can grab a researcher’s interest, and what problems may go somewhere. Also, researchers are exactly the people who have leisure to form a detailed view of the field and what may work. cf also the role of play in research progress.)

  2. The researchers themselves feel secure, and do not need to attempt to optimize their work for “what others will evaluate as useful enough to keep paying me”. (Since such evaluations are unreliable in preparadigm fields, and since one wants to maximize the odds that the right approach is tried. This security may well increase the amount of non-productivity in the median case, but it should also increase the usefulness of the tails. And the tails are where most of the value is.)

  3. Different research approaches somehow do not need to compete for funding, PR, etc., except via researchers’ choices as to where to engage. There are no organized attempts to use social pressure or similar to override researchers’ intuitions as to where may be fruitful to engage (nor to override research institutions’ choice of what programs to enable, except via the researchers’ interests). (Funders’ intuitions seem less likely to be detailed than are the intuitions of the researcher-on-that-specific-problem; attempts to be clear/explainable/respectable are less likely to pull in good directions.)

  4. The pool of researchers includes varied good folks with intuitions formed in multiple fields (e.g., folks trained in physics; other folks trained in math; other folks trained in AI; some unusually bright folks just out of undergrad with less-developed disciplinary prejudices), to reduce the odds of monoculture.

(Disclaimer: I'm on the MIRI board, and I worked at MIRI from 2008-2012, but I'm speaking only for myself here.)

I suspect it’s worth forming an explicit model of how much work “should” be understandable by what kinds of parties at what stage in scientific research.

To summarize my own take:

It seems to me that research moves down a pathway from (1) "totally inarticulate glimmer in the mind of a single researcher" to (2) "half-verbal intuition one can share with a few officemates, or others with very similar prejudices" to (3) "thingy that many in a field bother to read, and most find somewhat interesting, but that there's still no agreement about the value of" to (4) "clear, explicitly statable work whose value is universally recognized within its field". (At each stage, a good chunk of work falls away as a mirage.)

In "The Structure of Scientific Revolutions", Thomas Kuhn argues that fields begin in a "preparadigm" state in which nobody's work gets past (3). (He gives a bunch of historical examples that seem to meet this pattern.)

Kuhn’s claim seems right to me, and AI Safety work seems to me to be in a "preparadigm" state in that there is no work past stage (3) now. (Paul's work is perhaps closest, but there are still important unknowns / disagreement about foundations, whether it'll work out, etc.)

It seems to me one needs epistemic humility more in a preparadigm state, because, in such states, the correct perspective is in an important sense just not discovered yet. One has guesses, but the guesses cannot be established in common as established knowledge.

It also seems to me that the work of getting from (3) to (4) (or from 1 or 2 to 3, for that matter) is hard, that moving along this spectrum requires technical research (it basically is a core research activity), and one shouldn't be surprised if it sometimes takes years -- even in cases where the research is good. (This seems to me to also be true in e.g. math departments, but to be extra hard in preparadigm fields.)

(Disclaimer: I'm on the MIRI board, and I worked at MIRI from 2008-2012, but I'm speaking only for myself here.)
