I'm the Founder and Co-director of The Unjournal. We organize and fund public, journal-independent feedback, rating, and evaluation of hosted papers and dynamically presented research projects. We focus on work that is highly relevant to global priorities (especially in economics, social science, and impact evaluation), and we aim to encourage better research by making it easier for researchers to get feedback and credible ratings on their work.
Previously I was a Senior Economist at Rethink Priorities, and before that an Economics lecturer/professor for 15 years.
I'm working on efforts to improve EA fundraising and marketing; see https://bit.ly/eamtt
And on projects bridging EA, academia, and open science; see bit.ly/eaprojects
My previous and ongoing research focuses on the determinants and motivators of charitable giving (propensity, amounts, and 'to which cause?'), the drivers of and barriers to effective giving, and the impact of pro-social behavior and social preferences in market contexts.
Podcasts: "Found in the Struce" https://anchor.fm/david-reinstein
and the EA Forum podcast: https://anchor.fm/ea-forum-podcast (co-founder, regular reader)
Twitter: @givingtools
This seems a bit related to the “Pivotal questions”: an Unjournal trial initiative -- we've engaged with a small group of organizations and elicited some of these -- see here.
To highlight some that seem potentially relevant to your ask:
What are the effects of increasing the availability of animal-free foods on animal product consumption? Are alternatives to animal products actually used to replace animal products, and especially those that involve the most suffering? Which plant-based offerings are being used as substitutes versus complements for animal products and why?
Wellbeing measures: how to convert between DALY and WELLBY welfare measurements when assessing charities and interventions.
Is WELLBY the most appropriate (useful, reliable, ...) measure [for interventions that may have impacts on mental health]?
What is cell-cultured meat likely to cost, by year, as a function of the level of investments made?
How often do countries honor their (international) agreements in the event of large catastrophes (and what determines this?)
How probable is it that cell-cultured meat will gain widespread consumer acceptance, and to what timescale? To what extent will consumers replace conventional meat with cell-cultured meat?
How important is democracy for resilience against global catastrophic risk?
How generalizable is evidence on the effectiveness of corporate animal welfare outreach [in the North] to the Global South?
How much will the US government use subjective forecasting approaches (in the way the DoD does) in the next ~50 years?
Thanks for the thoughts. Note that I'm trying to engage/report here because we're working hard to make our evaluations visible and impactful, and this forum seems like one of the most promising interested audiences. But I'm also eager to hear about other opportunities to promote and get engagement with this evaluation work, particularly in non-EA academic and policy circles.
I generally aimed simply to summarize and synthesize what the evaluators had written and the authors' response, bringing in some specific examples that seemed relevant, and using quotes or paraphrases where possible. I generally didn't present these as my own opinions but rather as the author's and the evaluators'. I did, however, specifically give 'my take' in a few parts. If I recall my motivation correctly, I was trying to make this a little less dry to get a bit more engagement on this forum. But maybe that was a mistake.
And to this I added an opportunity to discuss the potential value of doing and supporting rigorous, ambitious, and 'living/updated' meta-analysis here and in EA-adjacent areas. I think your response was helpful there, as was the authors'. I'd like to see others' takes.
Some clarifications:
The i4replication group does put out replication papers/reports in each case, submits these to journals, and reports on the outcome on social media. But IIRC they only 'weigh in' centrally when they find a strong case suggesting systematic issues/retractions.
Note that their replications are not 'opt-in': they aimed to replicate every paper coming out in a set of 'top journals'. (And now they are moving towards a research agenda focused on a set of global issues like deforestation, but still not opt-in.)
I'm not sure that what works for them would work for us, though. It's a different exercise. I don't see an easy route to our evaluations getting attention through 'submitting them to journals' (which, naturally, would also run a bit counter to our core mission of moving research output and rewards away from 'journal publication' as a static output).
Also: I wouldn't characterize this post as 'editor commentary', and I don't think I have a lot of clout here. Also note that typical peer review is both anonymous and never made public. We're making all our evaluations public, but the evaluators have the option to remain anonymous.
But your point about a higher bar is well taken. I'll keep this under consideration.
A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors.
I appreciate the feedback. I'm definitely aware that we want to make this attractive to authors and others, both so that they submit their work and so that they engage with our evaluations. Note that in addition to asking for author submissions, our team nominates and prioritizes high-profile and potentially high-impact work, and contacts authors to get their updates, suggestions, and (later) responses. (We generally only require author permission to do these evaluations for early-career authors at a sensitive point in their careers.) We are grateful to you for having responded to these evaluations.
There are no incentives to participate.
I would disagree with this. We previously had author prizes (financial and reputational) focused on authors who submitted work for our evaluation, although these prizes are not currently active. I'm keen to revive them when the situation permits (funding and partners).
But there are a range of other incentives (not directly financial) for authors to submit their work, respond to evaluations and engage in other ways. I provide a detailed author FAQ here. This includes getting constructive feedback, signaling your confidence in your paper and openness to criticism, the potential for highly positive evaluations to help your paper's reputation, visibility, unlocking impact and grants, and more. (Our goal is that these evaluations will ultimately become the object of value in and of themselves, replacing "publication in a journal" for research credibility and career rewards. But I admit that's a long path.)
I did it because I thought it would be fun and I was wondering if anyone would have ideas or extensions that improved the paper. Instead, I got some rather harsh criticisms implying we should have written a totally different paper.
I would not characterize the evaluators' reports in this way. Yes, there was some negative-leaning language, which, as you know, we encourage the evaluators to tone down. But there were a range of suggestions (especially from Jané) which I see as constructive, detailed, and useful, both for this paper and for your future work. And I don't see this as them suggesting "a totally different paper." To a large extent they agreed with the importance of this project, with the data collected, and with many of your approaches. They praised your transparency. They suggested some different methods for transforming and analyzing the data and interpreting the results.
Then I got this essay, which was unexpected/unannounced and used, again, rather harsh language to which I objected. Do you think this exchange looks like an appealing experience to others? I'd say the answer is probably not.
I think it's important to communicate the results of our evaluations to wider audiences, and not only on our own platform. As I mentioned, I tried to fairly categorize your paper, the nature of the evaluations, and your response. I've adjusted my post above in response to some of your points where there was a case to be made that I was using loaded language, etc.
Would you recommend that I share any such posts with both the authors and the evaluators before making them? It's a genuine question (to you and to anyone else reading these comments) - I'm not sure of the correct answer.
As to your suggestion at the bottom, I will read and consider it more carefully -- it sounds good.
Aside: I'm still concerned with the connotation that replication, extension, and robustness checking are something to be relegated to graduate students (and not established researchers). This seems to diminish the value and prestige of work that I believe to be of the highest practical value for important decisions in the animal welfare space and beyond.
In the replication/robustness-checking domain, I think what i4replication.org is doing is excellent. They're working with everyone from graduate students to senior professors to do this work, and treating it as a high-value output meriting direct career rewards. I believe they encourage the replicators to be fair – neither excessively conciliatory nor harsh – and to focus on the methodology. We are in contact with i4replication.org and hoping to work with them more closely, with our evaluations and “evaluation games” offering grounded suggestions for robustness/replication checks.
I meant "constructive and actionable" In that he explained why the practices used in the paper had potentially important limitations (see here on "assigning an effect size of .01 for n.s. results where effects are incalculable")...
And he suggested a practical response, including a specific statistical package that could be applied to the existing data:
"An option to mitigate this is through multiple imputation, which can be done through the metansue (i.e., meta-analysis of non-significant and unreported effects) package"
In terms of the cost-benefit test, it depends on which benefit we are considering here. Addressing these concerns might indeed take months and cost hundreds of hours. It's hard to justify this in terms of current academic/career incentives alone, as the paper had already been accepted for publication. If this were directly tied to grants there might be a case, but as it stands I understand that it could be very difficult for you to take this further.
But I wouldn't characterize doing this as simply "satisfying two critics". The critiques themselves might be sound and relevant, and potentially impact the conclusion (at least in differentiating between "we have evidence the effects are small" and "the evidence is indeterminate", which I think is an important difference). And the value of the underlying policy question (~'Should animal welfare advocates be using/funding existing approaches to reducing meat consumption?') seems high to me. So I would suggest that the benefit exceeds the cost here on net, even if we might not have a formula for making it worth your while to make these adjustments right now.
I also think there might be value in setting an example/standard that, particularly for high-value questions like this, we strive for a high level of robustness, following up on a range of potential concerns and critiques, etc. I'd like to see these things as long-run 'living' projects that can be continuously improved, updated, and re-evaluated. The current research reward system doesn't encourage this, which is a gap we are trying to help fill.
Thanks for the detailed feedback, this seems mostly reasonable. I'll take a look again at some of the framings, and try to adjust. (Below and hopefully later in more detail).
the phrase "this meta-analysis is not rigorous enough". it seems this meta-analysis is par for the course in terms of quality.
This was my attempt to succinctly depict the evaluators' reports (not my own take) in a way the casual reader would be able to digest. Maybe this was rounding down too much, but not by a lot, I think. Some quotes from Jané's evaluation that I think are representative:
"Overall, aside from its commendable transparency, the meta-analysis is not of particularly high quality."
"Overall, the transparency is strong, but the underlying analytic quality is limited."
This doesn't seem to reflect 'par for the course' to me, but it depends on what the course is, i.e., what the comparison group is. My own sense/guess is that this is more rigorous and careful than most work in this area of meat-consumption interventions (and adjacent areas), but less rigorous than the meta-analyses the evaluators are used to seeing in their academic contexts and the practices they espouse. But academic meta-analysts will tend to focus on areas where they can find a proliferation of high-quality, more homogeneous research, not necessarily the highest-impact areas.
Note that the evaluators rated this at the 40th and 25th percentiles for methods, and at the 75th and 39th percentiles overall.
And the central claim of the meta-analysis doesn't seem like something either evaluator disputed (though one evaluator was hesitant).
To be honest, I'm having trouble pinning down what the central claim of the meta-analysis is. Is it a claim that "the main approaches being used to motivate reduced meat consumption don't seem to work", i.e., that we can bound the effects as very small at best? That's how I'd interpret the reporting of the pooled effect's 95% CI as a standardized mean effect between 0.02 and 0.12. I would say that both evaluators are, to some extent, disputing that claim.
However, the authors hedge this in places, and sometimes it sounds more like they're saying ~"even the best meta-analysis possible leaves a lot of uncertainty" ... an absence of evidence more than evidence of absence, and this is something the evaluators seem to agree with.
Finally, one piece of context that felt odd not to mention was the fundamental difficulty of finding an expert in both food consumption and meta-analysis.
That is/was indeed challenging. Let me try to adjust this post to note that.
a few editorial choices ... make the Forum post seem much more critical than the evaluation reports, which is a bit unusual since the Evaluators are the ones who are actually critiquing the paper.
My goal for this post was to fairly represent the evaluators' take, to provide insights to people who might want to use this for decision-making and future research, and to raise the question of standards in meta-analysis in EA-related areas. I will keep thinking about whether I missed the mark here. One possible clarification, though: we don't frame the evaluators' role as (only) looking to criticize or find errors in the paper. We ask them to give a fair assessment, evaluating its strengths, weaknesses, credibility, and usefulness. These evaluations can also be useful if they give people more confidence in the paper and its conclusions, and thus reason to update more on it for their own decision-making.
I'll frame it explicitly here: when we did one check and not another, or one search protocol and not another, the reason, every single time, is opportunity costs. When I write: "we thought it made more sense to focus on the risks of bias that seemed most specific to this literature," notice the word 'focus', which means saying no.
That is clearly the case, and I accept there are tradeoffs. But ideally I would have liked to see a more direct response to the substance of the points made by the evaluators, though I understand there are tradeoffs there as well.
In other words, because of opportunity costs, we are always triaging. At every juncture, navigating the explore/exploit dilemma requires judgment calls. You don't have to like that I said no to you, but it's not a false dichotomy, and I do not care for that characterization.
Perhaps 'false dichotomy' was too strong, given the opportunity costs (not an excuse: I got that from RoastMyPost's take on this). But as I understand it, there are clear rubrics and guidelines for meta-analyses like this. In cases where you choose to depart from standard practice, maybe it's reasonable to give a more detailed and grounded explanation of why you did so. And the evaluators did present very specific arguments for different practices you could have followed, and could still follow in future work. I think judgment calls based on experience get you somewhere, but it would be better to explicitly defend why you made a particular judgment call, and to respond to and consider the analytical points made by the evaluators. And ideally to follow up with the checks they suggest, although I understand that this is hard given how busy you are and the nature of academic incentives.
I hope I am being fair here; I'm trying to be even-handed and sympathetic to both sides. Of course, for this exercise to be useful, we have to allow for and permit constructive expert criticism, which I think these evaluations do indeed embody. I appreciate you having responded to these at all. I'd be happy to get others' opinions on whether we've been fair here.
To the second question of whether anyone will do this kind of extension work, I personally see it as a great exercise for grad students. I did all kinds of replication and extension exercises in grad school. A deep dive into a subset of the contact hypothesis literature that I did in a political psychology class in 2014, which started with a replication attempt, eventually morphed into The Contact Hypothesis Re-evaluated.
If a grad student wanted to do this kind of project, please be in touch, I'd love to hear from you.
I had previously responded that "casting this as 'for graduate students' makes it seem less valuable and prestigious," which I still stand by. But I appreciate that you adjusted your response to note "If a grad student wanted to do this kind of project, please be in touch, I'd love to hear from you," which I think helps a lot.
The point I was making -- perhaps preaching to the choir here:
These extensions, replications, and follow-up steps may be needed to make a large project deeply credible and useful, and to capture a large part of its value. Why not give equal esteem and career rewards for that work? The current system of journals tends not to do so (at least not in economics, the field I'm most familiar with). This is one of the things that we hope credible evaluation, separated from journal publication, can improve upon.
This does indeed look interesting, and promising. Some quick (maybe naive) thoughts on that particular example, at a skim.
The "cost of convincing researchers to work on it" Is uncertain to me. If it was already a very well-funded high-quality study in an interesting area that is 'likely to publish well' (apologies), I assume that academics would have some built-in 'publish or perish' incentives from their universities.
Certainly there is some trade-off here: investing resources (intellectual and time) in more careful, systematic, and robust meta-analysis of a large body of work of potentially varying quality and great heterogeneity comes at the cost of academics and interested researchers organizing better and more systematic new studies. There might be some middle ground where a central funder requires future studies to follow common protocols and reporting standards to enable better future meta-analysis (perhaps along with outreach to authors of past research to try to systematically dig out missing information).
Seems like there are some key questions here
Post roasted here on roastmypost (Epistemic Audit)
It gets a B-, which seems to be the modal rating.
Some interesting comments (going into far more detail than the summary below):
This EA Forum post announces Joe Carlsmith's career transition from Open Philanthropy to Anthropic while providing extensive justification for working at a frontier AI company despite serious safety concerns. The document demonstrates exceptional epistemic transparency by explicitly acknowledging double-digit extinction probabilities while defending the decision on consequentialist grounds. The analysis reveals sophisticated reasoning about institutional impact, though it contains notable tensions between stated beliefs (no adequate safety plan exists, no company should impose such risks) and actions (joining Anthropic anyway). The document's greatest epistemic virtue is its unflinching acknowledgment of catastrophic risks; its primary weakness is underexploring how individual rationalization might systematically lead safety-concerned researchers to converge on similar justifications for joining labs they believe pose existential threats.
Project Idea: 'Cost to save a life' interactive calculator promotion
What about making and promoting a ‘how much does it cost to save a life’ quiz and calculator?
This could be adjustable/customizable (in my country, around the world, for an infant/child/adult, counting ‘value-added life years’, etc.) … and we could try to make it go viral (or at least bacterial), as with the ‘how rich am I’ calculator. A rough sketch of the underlying logic is below.
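To make the idea concrete, here is a rough, hypothetical Python sketch of the calculator's core logic. All of the scenario names and numbers are made-up placeholders, not real estimates; a real tool would pull cost-effectiveness figures (e.g., GiveWell's published estimates) and country-level life tables.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    cost_per_life_saved: float        # USD; placeholder value, not a real estimate
    remaining_life_expectancy: float  # life-years gained per life saved; placeholder

# Hypothetical scenarios the user could toggle between (country, age group, ...).
SCENARIOS = {
    ("global", "child"): Scenario(cost_per_life_saved=5_000, remaining_life_expectancy=60),
    ("global", "adult"): Scenario(cost_per_life_saved=8_000, remaining_life_expectancy=35),
}

def cost_summary(region: str, age_group: str, donation: float) -> str:
    """Translate a donation amount into (placeholder) lives saved and life-years."""
    s = SCENARIOS[(region, age_group)]
    lives = donation / s.cost_per_life_saved
    life_years = lives * s.remaining_life_expectancy
    return (f"A ${donation:,.0f} donation ~ {lives:.2f} lives saved "
            f"(~{life_years:.0f} 'value-added' life-years).")

print(cost_summary("global", "child", donation=1_000))
```

The interactive front end (sliders for country, age group, and how to count life-years) would sit on top of something like this.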
The case
GiveWell has a page with a lot of technical details, but it's not compelling or interactive in the way I suggest above, and I doubt they market it heavily.
GWWC probably doesn't have the design/engineering time for this (not to mention refining it for accuracy and communication). But if someone else (UX design, research support, IT) could do the legwork, I think they might be very happy to host it.
It could also mesh well with academic-linked research, so I may have some ‘Meta academic support ads’ funds that could work with this.
Tags/backlinks (~testing out this new feature)
@GiveWell @Giving What We Can
Projects I'd like to see
EA Projects I'd Like to See
Idea: Curated database of quick-win, tangible, attributable projects