A simple, inexpensive, relatively easy step that EA organizations could take to improve their research quality is to submit their paper drafts to a paid peer review service at an academic publisher like Wiley. Wiley charges $200-300 for a 10-day turnaround. That’s assuming the draft is no more than 5,000 words — longer drafts cost more to review.
Wiley’s service is not the only one like this, or necessarily the best; it was just the first I came across. I also can’t personally attest to the quality of Wiley’s service (or any of the other similar services). I hope that at least one of these companies offers a good service, but I don’t know for sure that they do.
Another thought: it could potentially be disheartening to get this kind of feedback at the end of a research project, when the paper is almost ready to post online. So, maybe it would be even better to get input from experts in relevant fields at the earliest stage. Experts could review your research proposal and offer input, potentially saving you tons of time and heartache if you were about to make an avoidable error.
For example, if METR had gotten a research proposal for its AI time horizons work reviewed by some external experts, some avoidable errors in that work could potentially have been averted. Less encouragingly, but still importantly, if METR had submitted the draft of its paper on the time horizons work to a peer review service prior to posting it on its website, the paper could have better disclosed some of the errors and limitations.
Peer review of both research proposals and paper drafts would be useful for two major reasons. First, it would be intrinsically useful because it would lead to better research. If the point of research is to tell us the truth, and we want to know the truth, well, then, better research will tell us the truth better.
Second, it would be instrumentally useful. An important goal for many EA organizations is to persuade a broader community of people about something — experts, policymakers, regulators, the general public, potential recruits to the EA movement. Higher-quality research is more persuasive. It’s also a good way to earn credibility and trust. Low-quality research is unpersuasive, and can even persuade people in the opposite direction. (“If that’s the best you could come up with, surely your conclusions must be wrong!”) Publishing low-quality or fatally erroneous research also damages credibility and trust.
Assuming Wiley’s service is as useful as I hope, $200-300 and 10 days of waiting is a tiny cost compared to the intrinsic and instrumental value of doing better research.
Two potential subcultural stumbling blocks:
- There is a strong undercurrent in the EA community of opposition to mainstream institutions (mainstream journalism, mainstream academia, and mainstream, institutional science), and even to mainstream society and culture.
- Not unrelatedly, there is a strong desire in the EA community to treat the community as an enclave (or conclave!) rather than a part of the wider world, and for EA to rely only on itself for ideas, for input, and for intellectual evaluation.
I probably can’t convince anyone that these attitudes are wrong for intrinsic epistemic reasons. But maybe I can convince them that, in order to have a strong and durable influence on the wider world, it will be necessary for EA organizations to “play ball” and engage with the rest of the world on its terms.
The EA community has certain beliefs, particularly about how close the world is to creating AGI, that most experts, forecasters, policymakers, and members of the general public disagree with. Some EA organizations just want to do technical research and don’t need to worry about what anyone else thinks. But other organizations want to persuade the world of the danger.
Maybe some people feel cynical and don’t dare hope that the world could actually be persuaded on the basis of high-quality scientific evidence. Although scientific thinking and Enlightenment values are embattled, and there is a lot of misinformation out there, I still think scientific evidence matters a lot to a lot of people, including experts, policymakers, and the general public. The world is open to being persuaded. But you have to “play ball”.
(Related post here.)

I would have found this much more persuasive if you'd tried these services yourself and found them valuable. Without that, my median expectation is that they will do a worse job than Claude Opus 4.7.
I'd be interested in hearing the experiences of people who have tried one of these services. I hope they're good, but I don't know that they are. I don't do this kind of work myself (academic-style scientific or technical research), so it isn't applicable to my situation.
A digression on whether you should rely on Claude to do peer review. I found some funny and striking examples to demonstrate the perils of relying on LLM chatbots for this sort of thing:
These were cases where I suspected it would probably give ridiculously high probabilities, and I chose questions unflattering to EA because people in the EA community would be less likely to accept the chatbot's answers. I also asked it a flattering question though:
I tried the same prompt three times and ChatGPT gave probabilities of 3%, 0.1%, and 5%. Again, just ridiculously high probabilities.[1]
In the course of organically using ChatGPT and Google Gemini, I've also encountered tons of weird behaviours. There are the typical hallucinations and mistakes, of course, but there are also random typos (e.g. "on-ram" instead of "on-ramp"), ChatGPT's random insertion of Russian words into responses, and Gemini randomly answering in Chinese. GPT-5.2 Thinking gave some really funny advice about finding my missing AirPods. One of the craziest was when I asked GPT-5.4 Thinking (with "Extended thinking") to do a simple time zone conversion. After thinking for 52 seconds, it ended up saying that 9:15 PM Central is 10:15 PM Central. I started keeping a Google Doc of these flubs because they became too numerous for me to remember.
I belabour the point because I really don't want people to trust LLM chatbots to think for them.
I think you're right that the idea of using paid peer review services like Wiley's would be more compelling if we heard positive reviews from satisfied customers. This is worth looking into further.
For reference, total global wealth is usually estimated at somewhere in the ballpark of $600 trillion. Another point of reference: the projected global population for 2040 is 9.2 billion people. Multiply that by an upper-bound figure for the statistical value of a life, $15 million, and the statistical value of all human lives comes to $138 quadrillion. Still not even close to 1 quintillion.
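For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The inputs are the figures quoted in this footnote; the variable names are my own, and I'm assuming short-scale units (1 quadrillion = 10^15, 1 quintillion = 10^18).

```python
# Inputs quoted in the footnote (the $15 million figure is an upper bound).
projected_population_2040 = 9_200_000_000      # 9.2 billion people
value_of_statistical_life_usd = 15_000_000     # $15 million per life
total_global_wealth_usd = 600 * 10**12         # roughly $600 trillion

# Short-scale units (an assumption about which naming convention is meant).
QUADRILLION = 10**15
QUINTILLION = 10**18

# Statistical value of all human lives = population x value per life.
total_statistical_value_usd = (
    projected_population_2040 * value_of_statistical_life_usd
)

print(total_statistical_value_usd // QUADRILLION)   # 138 (quadrillion dollars)

# How many times larger 1 quintillion is than each reference point.
print(QUINTILLION / total_statistical_value_usd)    # roughly 7.2
print(QUINTILLION / total_global_wealth_usd)        # roughly 1667
```

So even the upper-bound statistical value of every human life falls about seven times short of 1 quintillion, and total global wealth falls short by a factor of well over a thousand.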
Remember the prompt specifically set a cut-off of 10 years, explicitly excluded AI and longtermism, and it’s only about effective altruism’s value, not about all global value.