Ben Pace

Comments

EA Infrastructure Fund: May 2021 grant recommendations

Yeah, I think you understand me better now.

And btw, I think if there are particular grants that seem out of scope for a fund, it seems totally reasonable to ask them for their reasoning and update positively or negatively depending on whether the reasoning checks out. It's also generally good to question the reasoning behind a grant that doesn't make sense to you.

EA Infrastructure Fund: May 2021 grant recommendations

Though it still does seem to me like those two grants are probably better fits for LTFF.

But this line is what I am disagreeing with. I'm saying there's a binary question of whether a grant is within scope, and beyond that it's up to the fund to fund what they think is best according to their judgment about EA Infrastructure or the Long-Term Future or whatever. Do you think the EAIF should be able to tell the LTFF to fund a project because the EAIF thinks it's worthwhile for EA Infrastructure, instead of using the EAIF's money? Alternatively, if the EAIF thinks something is worth funding for EA Infrastructure reasons, but the grant falls more naturally under the scope of "Long-Term Future", do you think they shouldn't fund the grantee even if the LTFF isn't going to either?

EA Infrastructure Fund: May 2021 grant recommendations

Yeah, that's a good point: donors who don't look at the grants (or know the individuals on the team much) will be confused if the fund does things outside its stated purpose (e.g. donations to GiveDirectly, or a random science grant that just sounds cool). But I guess all of these grants seem to me fairly within the purview of EA Infrastructure?

The one-line description of the fund says:

The Effective Altruism Infrastructure Fund aims to increase the impact of projects that use the principles of effective altruism, by increasing their access to talent, capital, and knowledge.

I expect that for all of these grants the grantmakers think the grantees are orgs that either "use the principles of effective altruism" or help others do so.

I think I'd suggest instead that weeatquince name some specific grants and, where the reasoning is unclear to them, ask the fund managers why those grants seem to them to help build EA Infrastructure (e.g. ask Michelle why CLTR seems to help things according to her).

EA Infrastructure Fund: May 2021 grant recommendations

The inclusion of things on this list that might be better suited to other funds (e.g. the LTFF) without an explanation of why they are being funded from the Infrastructure Fund makes me slightly less likely in future to give directly to the Infrastructure Fund and slightly more likely to just give to one of the bigger meta orgs you give to (like Rethink Priorities).

I think that different funders have different tastes, and if you endorse their tastes you should consider giving to them. I don't really see a case for splitting responsibilities like this. If Funder A thinks a grant is good, Funder B thinks it's bad, but it's nominally in Funder B's purview, that just doesn't seem like a strong argument against Funder A making the grant if it seems like a good idea to them. What's the argument here? Why should Funder A not give a grant that seems good to them?

Draft report on existential risk from power-seeking AI

Thanks for the thoughtful reply.

I do think I was overestimating how robustly you're treating your numbers and premises; it seems like you're holding them all much more lightly than I'd been envisioning.

FWIW I am more interested in engaging with some of what you wrote in your other comment than engaging on the specific probability you assign, for some of the reasons I wrote about here.

I think I have more I could say on the methodology, but alas, I'm pretty blocked up with other work atm. It'd be neat to spend more time reading the report and leave more comments here sometime.

Draft report on existential risk from power-seeking AI

I tried to look for writing like this. I think that people do multiple hypothesis testing, like Harry in chapter 86 of HPMOR. There Harry is trying to weigh some different hypotheses against each other to explain his observations. There isn't really a single train of conditional steps that constitutes the whole hypothesis.

My shoulder-Scott-Alexander is telling me (somewhat similar to my shoulder-Richard-Feynman) that there are a lot of ways to trick myself with numbers, and that I should only do very simple things with them. I looked through some of his posts just now (1, 2, 3, 4, 5).

Here's an example of a conclusion / belief from Scott's post Teachers: Much More Than You Wanted to Know:

In summary: teacher quality probably explains 10% of the variation in same-year test scores. A +1 SD better teacher might cause a +0.1 SD year-on-year improvement in test scores. This decays quickly with time and probably disappears entirely after four or five years, though there may also be small lingering effects. It’s hard to rule out the possibility that other factors, like endogenous sorting of students, or students’ genetic potential, contribute to this as an artifact, and most people agree that these sorts of scores combine some signal with a lot of noise. For some reason, even though teachers’ effects on test scores decay very quickly, studies have shown that they have significant impact on earnings as much as 20 or 25 years later, so much so that kindergarten teacher quality can predict thousands of dollars of difference in adult income. This seemingly unbelievable finding has been replicated in quasi-experiments and even in real experiments and is difficult to banish. Since it does not happen through standardized test scores, the most likely explanation is that it involves non-cognitive factors like behavior. I really don’t know whether to believe this and right now I say 50-50 odds that this is a real effect or not – mostly based on low priors rather than on any weakness of the studies themselves. I don’t understand this field very well and place low confidence in anything I have to say about it.

I don't know any post where Scott says "there's a particular 6-step argument, I assign a probability to each step, and I trust that the resulting number is basically right". His conclusions read more like one key number with some uncertainty, which never came from a single complex model, but from aggregating loads of little studies and pieces of evidence into a judgment.

I can't think of a post like this by Scott or Robin or Eliezer or Nick or anyone. But I would be interested in an example that is like this (from other fields or wherever), or that feels similar.

Draft report on existential risk from power-seeking AI

One thing that I think would really help me read this document would be (from Joe) a sense of "here's the parts where my mind changed the most in the course of this investigation".

Something like (note that this is totally made up) "there's a particular exploration of alignment where I had conceptualized it as being kinda about making the AI think right, but now I conceptualize it as being about not thinking wrong, which I explore in section a.b.c".

Also maybe something like a sense of which of the premises Joe changed his mind on the most – where the probabilities shifted a lot.

Draft report on existential risk from power-seeking AI

I think I share Robby's sense that the methodology seems like it will obscure truth.

That said, I have neither your (Joe's) extensive philosophical background nor your experience of spending substantial time on a report like this, and I am interested in evidence to the contrary.

To me, it seems like you've tried to lay out an argument in a series of 6 steps, each of which you think carves the relevant parts of reality quite accurately, and you've pondered each step for quite a while.

When I ask myself whether I've seen something like this produce great insight, it's hard to say. It's not something I've done much myself explicitly. However, I can think of a nearby example where I think this has produced great insight, which is Nick Bostrom's work. I think (?) Nick spends a lot of his time considering a simple, single key argument, looking at it from lots of perspectives, scrutinizing wording, asking what people from different scientific fields would think of it, poking and prodding and rotating and just exploring it. Through that work, I think he's been able to find considerations that were very surprising and that invalidated the arguments, and to propose very different arguments instead.

When I think of examples here, I'm imagining that this sort of intellectual work produced the initial arguments about astronomical waste, and arguments since then about unilateralism and the vulnerable world hypothesis. Oh, and also the simulation argument (which became a tripartite structure).

I think of Bostrom as trying to consider a single worldview and find out whether it's a consistent object. One feeling I have about turning it into a multi-step probabilistic argument is that it does the opposite: it does not try to examine one worldview to find falsehoods, but instead integrates over all the parts of the worldview that Bostrom would scrutinize, to make a single clump of lots of parts of different worldviews. I think Bostrom may have literally never published a six-step argument of the form that you have, where it was meant to hold anything of weight in the paper or book, and I don't think he has ever done so while assigning each step a probability.

To be clear, probabilistic discussions are great. Talking about precisely how strong a piece of evidence is (is it 2:1, 10:1, 100:1?) helps a lot in noticing which hypotheses to even pay attention to. The suspicion I have is that this is fairly different from the kind of cognition Bostrom does when producing the sort of philosophical argumentation that yields simple arguments of world-shattering importance. I suspect you've set yourself a harder task than Bostrom ever has (a 6-step argument), and that it only seems easier because it's probabilistic instead of deductive, whereas in fact this removes most of the tools Bostrom was able to use to ensure he didn't take mis-steps.
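As a toy illustration of the worry (the numbers here are made up by me, not taken from the report): a conjunctive multi-step argument multiplies per-step probabilities, so even fairly confident steps compound into a much smaller conjunction, whereas odds-style updates of the 2:1 / 10:1 / 100:1 kind are about weighing evidence for a single hypothesis. A minimal sketch:

```python
# Toy numbers only; not Joe's actual estimates.

# Six-step conjunctive argument: assign each step probability 0.8.
step_probs = [0.8] * 6
conjunction = 1.0
for p in step_probs:
    conjunction *= p
print(f"P(all six steps hold) = {conjunction:.2f}")  # about 0.26

# Odds-form updating on a single hypothesis:
# start at 1:1, then apply evidence of strength 2:1, 10:1, 100:1.
odds = 1.0
for likelihood_ratio in (2, 10, 100):
    odds *= likelihood_ratio
posterior = odds / (1 + odds)
print(f"Posterior probability = {posterior:.4f}")  # about 0.9995
```

The point of the contrast is just that the first calculation's output is dominated by how well-calibrated each of the six step-probabilities is, which is exactly the part I'm suspicious of.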

But I am pretty interested in whether there are examples of great work using your methodology that inspired you when writing this up, or great works with nearby methodologies that feel similar to you. I'd be excited to read/discuss some.
