Request for comments: EA Projects evaluation platform

Jan_Kulveit

I'm a big fan of the idea of having a new EA projects evaluation pipeline. Since I view this as an important idea, I think it's important to get the plan to the strongest point that it can be. From my perspective, there are only a smallish number of essential elements for this sort of plan. It needs a submissions form, a detailed RFP, some funders, and some evaluators. Instead, we don't yet have these (e.g. detail re desired projects, consultation with funders). But then I'm confused about some of the other things that are emphasised: large initial scale, a process for recruiting volunteer-evaluators, and fairly rigid evaluation procedures. I think the fundamentals of the idea are strong enough that this still has a chance of working, but I'd much prefer to see the idea advanced in its strongest possible form. My previous comments on this draft are pretty similar to Oliver's, and here are some of the main ones:

This makes sense to me as an overall idea. I think this is the sort of project where if you do it badly, it might dissuade others from trying the same. So I think it is worth getting some feedback on this from other evaluators (BERI/Brendon Wong). It would also probably be useful to get feedback from 1-2 funders (maybe Matt Wage? Maybe someone from OpenPhil?), so that you can get some information about whether they think your evaluation process would be of interest to them, or what might make it so. It could also be useful to have unofficial advisors.

I predict the process could be refined significantly with ~3 projects.

You only need a couple of volunteers and you know perhaps half of the best candidates, so for the purpose of a pilot, did you consider just asking a couple of people you know to do it?

I think you should provide a ~800 word request for proposals. Then you can give a much more detailed description of who you want to apply. e.g. just longtermist projects? How does this differ from the scope of EA grants, BERI, OpenPhil, etc etc? Is it sufficient to apply with just an idea? Do you need a team? A proof of concept? etc etc etc.

This would be strengthened somewhat by already having obtained the evaluators, but this may not be important.

John_Maxwell

Since I view this as an important idea, I think it's important to get the plan to the strongest point that it can be.

It's also important not to let the perfect be the enemy of the good. Seems to me like people are always proposing volunteer-lead projects like this and most of them never get off the ground. Remember this is just a pilot.

I think this is the sort of project where if you do it badly, it might dissuade others from trying the same.

The empirical reality of the EA project landscape seems to be that EAs keep stumbling on the same project ideas over and over with little awareness of what has been proposed or attempted in the past. If this post goes like the typical project proposal post, nothing will come of it, it will soon be forgotten, and 6 months later someone will independently come up with a similar idea and write a similar post (which will meet a similar fate).

John_Maxwell

As a concrete example of this "same project ideas over and over with little awareness of what has been proposed or attempted in the past" thing, https://lets-fund.org is a fairly recent push in the "fund fledgling EA projects" area which seems to have a decent amount of momentum behind it relative to the typical volunteer-lead EA project. What are the important differences between Let's Fund and what Jan is working on? I'm not sure. But Let's Fund hasn't hit the $75k target for their first project, even though it's been ~5 months since their launch.

The EA Hotel is another recent push in the "fund fledgling EA projects" area which is struggling to fundraise. Again, loads of momentum relative to the typical grassroots EA project--they've bought a property and it's full of EAs. What are the relative advantages & disadvantages of the EA Hotel, Let's Fund, and Jan's thing? How about compared with EA Funds? Again, I'm not sure. But I do wonder if we'd be better off with "more wood behind fewer arrows", so to speak.

Jan_Kulveit

-2

On a meta-level

I'm happy to update the proposal to reflect some of the sentiments. Openly, I find some of them quite strange - e.g. it seems, coalescing the steps into one paragraph and assuming all the results (reviews, discussion, "authoritative" summary of the discussion) will just happen may make it look more flexible. Ok, why not.

Also it seems you and Oli seem to be worried that I want to recruit people who are currently not doing some high-impact direct work ... instead of just asking a couple of people around me, which would often mean people already doing impactful volunteer work.

Meta-point is, I'm not sure if you or Oli realize how big part of solving

new EA projects evaluation pipeline

is in consensus-building. Actually I think the landscape of possible ways how to do evaluations looks like in such a way that it is very hard to get consensus on what the "strongest form" is. I'm quite happy to create a bunch of proposals, e.g.

with removing final expert evaluation
removing initial reviews
removing public forum discussions
writing an unrealistic assumption that the initial reviews will take 15m instead of hours,
suggesting that the volunteers will be my busy friends (whose voluntary work does not count?)
emphasising public feedback more, or less
giving stronger or weaker voice to existing funders.

I have stronger preference for the platform to happen than for one option in any single of these choices. But what is the next step? After thinking about the landscape for a some time I'm quite skeptical any particular combination of options would not have some large drawback.

On the object level:

Re: funder involvement

Cross-posting from another thread

Another possible point of discussion is whether the evaluation system would work better if it was tied to some source of funding. My general intuition is this would create more complex incentives, but generally I don't know and I'm looking for comments.

I think it much harder to give open feedback if it is closely tied with funding. Feedback from funders can easily have too much influence on people, and should be very careful and nuanced, as it comes from some position of power. I would expect adding financial incentives can easily be detrimental for the process. (For self-referential example, just look on this discussion: do you think the fact that Oli dislikes my proposal and suggest LTF can back something different with $20k will not create at least some unconscious incentives?)

We had some discussion with Brendon, and I think his opinion can be rounded to "there are almost no bad projects, so to worry about them is premature". I disagree with that. Also, given the Brendon's angel group is working, evaluating and funding projects since October, I would be curious what projects were funded, what was the total amount of funding allocated, how many applications they got.

Based on what I know I'm unconvinced that Brendon or BERI should have some outsized influence how evaluations should be done; part of the point of the platform would be to serve broader community.

Brendon_Wong

We had some discussion with Brendon, and I think his opinion can be rounded to "there are almost no bad projects, so to worry about them is premature". I disagree with that.

I do not think your interpretation of my opinion on bad projects in EA is aligned with what I actually believe. In fact, I actually stated my opinion in writing in a response to you two days ago which seems to deviate highly from your interpretation of my opinion.

I never said that there are "almost no bad projects." I specifically said I don't think that "many immediately obvious negative EV projects exist." My main point was that my observations of EA projects in the entire EA space over the last five years do not line up with a lot of clearly harmful projects floating around. This does not preclude the possibility of large numbers of non-obviously bad projects existing, or small numbers of obviously bad projects existing.

I also never stated anything remotely similar to "to worry about [bad projects] is premature." In fact, my comment said that the EA Angel Group helps prevent the "risk of one funder making a mistake and not seeking additional evaluations from others before funding something" because there is "an initial staff review of projects followed by funders sharing their evaluations of projects with each other to eliminate the possibility of one funder funding something while not being aware of the opinion of other funders."

I believe that being attentive to the risks of projects is important, and I also stated in my comment that risk awareness could be of even higher importance when it comes to projects that seek to impact x-risks/the long-term future, which I believe is your perspective as well.

Also, given the Brendon's angel group is working, evaluating and funding projects since October, I would be curious what projects were funded, what was the total amount of funding allocated, how many applications they got.

Milan asked this question and I answered it.

Based on what I know I'm unconvinced that Brendon or BERI should have some outsized influence how evaluations should be done; part of the point of the platform would be to serve broader community.

I'm not entirely sure what your reasons are for having this opinion, or what you even mean. I am also not exactly sure what you define as an "evaluation." I am interpreting evaluations to mean all of the assessments of projects happening in the EA community from funders or somewhat structured groups designed to do evaluations.

I can't speak for BERI, but I currently have no influence on how evaluations should be done, and I also currently have no interest in influencing how evaluations should be done. My view on evaluations seems to align with Oliver Habryka's view that "in practice I think people will have models that will output a net-positive impact or a net-negative impact, depending on certain facts that they have uncertainty about, and understanding those cruxes and uncertainties is the key thing in understanding whether a project will be worth working on." I too believe this is how things work in practice, and evaluation processes seem to involve one or more people, ideally with diverse views and backgrounds, evaluate a project, sometimes with a more formalized evaluation framework taking certain factors into account. Then, a decision is made, and the process repeats at various funding entities. Perhaps this could be optimized by having argument maps or a process that involves more clearly laying out assumptions and assigning mathematical weights to them, but I currently have no plans to try to go to EA funders and suggest they all follow the same evaluation protocol. Highly successful for-profit VCs employ a variety of evaluation models and have not converged on a single evaluation method. This suggests that perhaps evaluators in EA should use different evaluation protocols since different protocols might be more or less effective with certain cause areas, circumstances, types of projects, etc.

John_Maxwell

I actually stated my opinion in writing in a response to you two days ago which seems to deviate highly from your interpretation of my opinion.

I think I've seen forum discussions where language has been an unacknowledged barrier to understanding in the past, so it might be worth flagging that Jan is from the Czech Republic and likely does not speak English as his mother tongue.

Brendon_Wong

Thanks for pointing that out! Jan and I have also talked outside the EA Forum about our opinions on risk in the EA project space. I’ve been more optimistic about the prevalence of negative EV projects, so I thought there was a chance that greater optimism was being misinterpreted as a lack of concern about negative EV projects, which isn’t my position.

Jan_Kulveit

My impression was based mostly on our conversations several months ago - quoting the notes from that time

lot of the discussion and debate derives from differing assumptions held by the participants regarding the potential for bad/risky projects: Benjamin/Brendon generally point out the lack of data/signal in this area and believe launching an open project platform could provide data to reduce uncertainty, whereas Jan is more conservative and prioritizes creating a rigorous curation and evaluation system for new projects.

I think it is fair to say you expected very low risk from creating an open platform where people would just post projects and seek volunteers and funding, while I expected with minimum curation this creates significant risk (even if the risk is coming from small fraction of projects). Sorry if I rounded off suggestions like "let's make an open platform without careful evaluation and see" and "based on the project ideas lists which existed several years ago the amount of harmful projects seems low" to "worrying about them is premature".

Reading your recent comment, it seems more careful, and pointing out large negative outcomes are more of a problem with x-risk/long-term oriented projects.

In our old discussions I also expressed some doubt about your or altruism.vc ability to evaluate x-risk and similar projects, where your recent post states that projects that impact x-risks by doing something like AI safety research has not yet applied to the EA Angel Group.

I guess part of the disagreement comes from the fact that I have focus on x-risk and the long-term future, and I'm more interested both in improving the project landscape in these areas, and more worried about negative outcomes.

If open platforms or similar evaluation process also accept mitigating x-risk and similar proposals, in my opinion, unfortunately the bar how good/expert driven evaluations you need is higher, and unfortunately signals like "this is a competent team" which VCs would mainly look at are not enough.

Because I would expect the long-term impact will come mainly from long-term, meta-, exploratory or very ambitious projects, I think you can be basically right about low obvious risk of all the projects historically posted on hackpad or proposed to altruism.vc, and still miss the largest term in the EV.

Milan asked this question and I answered it.

Thanks - both of that happened after I posted my comment, and also I still do not see the numbers which would help me estimate the ratio of projects which applied and which got funded. I take as mildly negative signal that someone had to ask, and this info was not included in the post, which solicits project proposals and volunteer work.

In my model it seems possible you have something like chicken-and-egg problem, not getting many great proposals, and the group of unnamed angels not funding many proposals coming via that pipeline.

If this is the case and the actual number of successfully funded projects is low, I think it is necessary to state this clearly before inviting people to work on proposals. My vague impression was we may disagree on this, which seems to indicate some quite deep disagreement about how funders should treat projects.

I'm not entirely sure what your reasons are for having this opinion, or what you even mean

The whole context was, Ryan suggested I should have sought some feedback from you. I actually did that, and your co-founder noted that he will try to write the feedback on this today or tomorrow, on 11th of Mar - which did not happen. I don't think this is large problem, as we had already discussed the topic extensively.

When writing it I was somewhat upset about the mode of conversation where critics do ask whether I tried to coordinate with someone, but just assume I did not. I apologize for the bad way it was written.

Overall my summary is we probably still disagree in many assumptions, we did invest some effort trying to overcome them, it seems difficult for us to reach some consensus, but this should not stop us trying to move forward.

Brendon_Wong

I think it is fair to say you expected very low risk from creating an open platform where people would just post projects and seek volunteers and funding, while I expected with minimum curation this creates significant risk (even if the risk is coming from small fraction of projects). Sorry if I rounded off suggestions like "let's make an open platform without careful evaluation and see" and "based on the project ideas lists which existed several years ago the amount of harmful projects seems low" to "worrying about them is premature".

The community has already had many instances of openly writing about ideas, seeking funding on the EA Forum, Patreon, and elsewhere, and posting projects in places like the .impact hackpad and the currently active EA Work Club. Since posting about projects and making them known to community members seems to be a norm, I am curious about your assessment of the risk and what, if anything, can be done about it.

Do you propose that all EA project leaders seek approval from a central evaluation committee or something before talking with others about and publicizing the existence of their project? This would highly concern me because I think it's very challenging to predict the outcomes of a project, which is evidenced by the fact that people have wildly different opinions on how good of an idea or how good of a startup something is. Such a system could be very negative EV by greatly reducing the number of projects being pursued by providing initial negative feedback that doesn't reflect how the project would have turned out or decreasing the success of projects because other people are afraid to support a project that did not get backing from an evaluation system. I expect significant inaccuracy from my own project evaluation system as well as the project evaluation systems of other people and evaluation groups.

Thanks - both of that happened after I posted my comment, and also I still do not see the numbers which would help me estimate the ratio of projects which applied and which got funded. I take as mildly negative signal that someone had to ask, and this info was not included in the post, which solicits project proposals and volunteer work.

In my model it seems possible you have something like chicken-and-egg problem, not getting many great proposals, and the group of unnamed angels not funding many proposals coming via that pipeline.

If this is the case and the actual number of successfully funded projects is low, I think it is necessary to state this clearly before inviting people to work on proposals. My vague impression was we may disagree on this, which seems to indicate some quite deep disagreement about how funders should treat projects.

I wrote about the chicken and the egg problem here. As noted in my comments on the announcement post, the angels have significant amounts of funding available. Other funders do not disclose some of these statistics, and while we may do so in the future, I do not think it is necessary before soliciting proposals. The time cost of applying is pretty low, particularly if people are recycling content they have already written. I think we are the first grantmaking group to give all applicants feedback on their application which I think is valuable even if people do not get funded.

The whole context was, Ryan suggested I should have sought some feedback from you. I actually did that, and your co-founder noted that he will try to write the feedback on this today or tomorrow, on 11th of Mar - which did not happen. I don't think this is large problem, as we had already discussed the topic extensively.

Ben commented on your Google Document that was seeking feedback. I wouldn't say we've discussed the topic "extensively" in the brief call that we had. The devil is in the details, as they say.

RyanCarey

This is an uncharitable reading of my comment in many ways.

First, you suggest that I am worried that you want to recruit people not currently doing direct work. All things being equal, of course I would prefer to recruit people with fewer alternatives. But all things are not equal. If you use people you know for the initial assessments, you will much more quickly be able to iron out bugs in the process. In the testing stages, it's best to have high-quality workers that can perceive and rectify problems, so this is a good use of time for smart, trusted friends, especially since it can help you postpone the recruitment step.

Second, you suggest that I am in the dark about the importance of consensus-building. But this assumes that I believe the only use for consultation is to reach agreement. Rather, by talking to the groups working in related spaces like BERI, Brendon, EA grants, EA funds, and donors, you will of course learn some things, and your beliefs will probably get closer. On aggregate, your process will improve. But also you will build a relationship that will help you to share proposals (and in my opinion funders).

Third, you raise the issue of connecting funding with evaluation. Of course, the distortionary effect is significant. I happen to think the effect from creating an incentive for applicants to apply is larger and more important, and funders should be highly engaged. But there are also many ways that you could have funders be moderately engaged. You could check what would be a useful report for them, that would help them to decide to fund something. You could check what projects they are more likely to fund.

The more strategic issue is as follows. Consensus is hard to reach. But a funding platform is a good that scales with the size of the network of applicants (and imo funders). Somewhat of a natural monopoly (although we want there to be at least a few funders.) You eventually want widespread community-support of some form. I think that as you suggest, that means we need some compromise, but I think it also weighs in favour of more consultation, and in favour of a more experimental approach, which projects are started in a simple form.

Jan_Kulveit

It is possible my reading of your post somewhat blended with some other parts of the discussion, which are in my opinion quite uncharitable reading of the proposal. Sorry for that.

Actually from the list, I talked about it and shared the draft with people working on EA grants, EA funds, and Brendon, and historically I had some interactions with BERI. What I learned is people have different priors over existence of bad projects, ratio of good projects, number of projects which should or should not get funded. Also opinions of some of the funders are at odds with opinions of some people I trust more than the funders.

I don't know, but it seems to me you are either a bit underestimating the amount of consultation which went into this, or overestimating how much agreement is there between the stakeholders. Also I'm trying to factor in the interests of the project founders, and overall I'm more concerned whether the impact in the world would be good, and what's good for the whole system.

Despite repeated claims the proposal is very heavy, complex, rigid, etc. I think the proposed project would be in fact quite cheap, lean, and flexible (and would work). I'm also quite flexible in modifying it in any direction which seems consensual.

D_M_x

I think it much harder to give open feedback if it is closely tied with funding. Feedback from funders can easily have too much influence on people, and should be very careful and nuanced, as it comes from some position of power. I would expect adding financial incentives can easily be detrimental for the process. (For self-referential example, just look on this discussion: do you think the fact that Oli dislikes my proposal and suggest LTF can back something different with $20k will not create at least some unconscious incentives?)

I'm a bit confused here. I think I disagree with you, but maybe I am not understanding you correctly.

I consider having people giving feedback to have 'skin in the game' to be important for the accuracy of the feedback. Most people don't enjoy discouraging others they have social ties with. Often reviewers without sufficient skin in the game might be tempted to not be as openly negative about proposals as they should be.

Funders instead can give you a strong signal - a signal which is unfortunately somewhat binary and lacks nuance. But someone being willing to fund something or not is a much stronger signal for the value of a proposal than comments from friends on a GoogleDoc. This is especially true if people proposing ideas don't take into account how hard it is to discourage people and don't interpret feedback in that light.

John_Maxwell

I consider having people giving feedback to have 'skin in the game' to be important for the accuracy of the feedback. Most people don't enjoy discouraging others they have social ties with. Often reviewers without sufficient skin in the game might be tempted to not be as openly negative about proposals as they should be.

Maybe anonymity would be helpful here, the same way scientists do anonymous peer review?

Jan_Kulveit

I'm not sure if we agree or disagree, possibly we partially agree, partially disagree. In case of negative feedback, I think as a funder, you are in greater risk of people over-updating in the direction "I should stop trying".

I agree friends and social neighbourhood may be too positive (that's why the proposed initial reviews are anonymous, and one of the reviewers is supposed to be negative).

When funders give general opinions on what should or should not get started or how you value or not value things, again, I think you are at greater risk of having too much of an influence on the community. I do not believe the knowledge of the funders is strictly better than the knowledge of grant applicants.

D_M_x

(I still feel like I don’t really understand where you’re coming from.)

I am concerned that your model of how idea proposals get evaluated (and then plausibly funded) is a bit off. From the original post:

hard to evaluate which project ideas are excellent , which are probably good, and which are too risky for their estimated return.

You are missing one major category here: projects which are simply bad because they do have approximately zero impact, but aren't particularly risky. I think this category is the largest of the the four.

Which projects have a chance of working and which don't is often pretty clear to people who have experience evaluating projects quite quickly (which is why Oli suggested 15min for the initial investigation above). It sounds to me a bit like your model of ideas which get proposed is that most of them are pretty valuable. I don't think this is the case.

When funders give general opinions on what should or should not get started or how you value or not value things, again, I think you are at greater risk of having too much of an influence on the community. I do not believe the knowledge of the funders is strictly better than the knowledge of grant applicants.

I am confused by this. Knowledge of what?

The role of funders/evaluators is to evaluate projects (and maybe propose some for others to do). To do this well they need to have a good mental map of what kind of projects have worked or not worked in the past, what good and bad signs are, ideally from an explicit feedback loop from funding projects and then seeing how the projects turn out. The role of grant applicants is to come up with some ideas they could execute. Do you disagree with this?

Jan_Kulveit

You are missing one major category here: projects which are simply bad because they do have approximately zero impact, but aren't particularly risky. I think this category is the largest of the the four.

I agree that's likely. Please take the first paragraphs more as motivation than precise description of the categories.

Which projects have a chance of working and which don't is often pretty clear to people who have experience evaluating projects quite quickly (which is why Oli suggested 15min for the initial investigation above).

I think we are comparing apples and oranges. As far as the output should be some publicly understandable reasoning behind the judgement, I don't think this is doable in 15m.

It sounds to me a bit like your model of ideas which get proposed is that most of them are pretty valuable. I don't think this is the case.

I don't have strong prior on that.

To do this well they need to have a good mental map of what kind of projects have worked or not worked in the past,...

From a project-management perspective, yes, but with slow and bad feedback loops in long-term, x-risk and meta oriented projects, I don't think it is easy to tell what works and what does not. (Even with projects working in the sense they run smoothly and are producing some visible output.)

Request for comments: EA Projects evaluation platform

Request for comments: EA Projects evaluation platform

Evaluation process

Why this particular evaluation process

Evaluators

Projects