All of david_reinstein's Comments + Replies

Thank you, this is the correct link: https://unjournal.pubpub.org/pub/evalsumleadexposure/

I need to check what's going on with our DOIs!

I think you can basically already do this in at least some online supermarkets, like Ocado in the UK:

https://www.ocado.com/categories/dietary-lifestyle-world-foods/vegan/213b8a07-ab1f-4ee5-bd12-3e09cb16d2f6?source=navigation  

Is that different from what you are proposing, or do you just propose extending it to more online supermarkets? 

 

3
Julia_Wise🔸
In the US, Instacart has a "dietary preferences" setting where you can opt to have more shown to you from categories like vegan, vegetarian, organic, etc. But when I tried it, it seemed to show me basically the same as usual.
2
Luzia
I'm not sure whether these have been improving a lot over time, but I feel like they usually miss a lot of items that are vegan? I was shopping with Ocado every week up until October last year and I never found the filter to be very good, so I'd still check ingredients myself.
6
Tom Cohen Ben-Arye
Great point—thank you for raising it. Yes, Ocado is one of the few strong real-world examples of what we’re proposing. Similarly, Albert Heijn, the largest supermarket chain in the Netherlands, offers a comparable vegan filter and is often cited as a key contributor to the country’s high adoption of plant-based diets. We summarized the Albert Heijn case here: https://docs.google.com/document/d/1Sc1Yun2HXjPx-7jJVrs-lKDzTrlI10Rk4k_5q4ZOs9M/edit The key point, though, is that cases like Ocado and Albert Heijn are exceptions, not the norm. Most online supermarkets lack the resources and incentives to systematically review and continuously update tens of thousands of SKUs for vegan status. Because this intervention has high potential impact but is rarely implemented, it’s exactly where an external, mission-driven actor can add the most value. The goal is to make what works in a few frontrunners common everywhere.

Community-powered aggregation (scalable with retailer oversight): To scale beyond pilots, vegan product data can be aggregated from the vegan community through a dedicated reporting platform. To ensure reliability for retail partners, classifications are assigned confidence scores based on user agreement, contributor reliability, and historical accuracy. Only high-confidence data is shared with supermarkets.

 

Couldn't this be automated? Perhaps with occasional human checks? Food products are required to list their ingredients so it should be pretty easy to classify. Or maybe I'm missing something. 
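To sketch the kind of automation I mean (everything here is illustrative: the blocklist, the ambiguous-ingredient list, and the confidence numbers are made up, and a real system would need a proper ingredient database, fuzzy matching, and human spot checks):

```python
# Toy vegan-status classifier from a product's ingredient list.
# The blocklist, ambiguous list, and confidence numbers are illustrative only.
NON_VEGAN = {"milk", "whey", "casein", "egg", "honey", "gelatin", "lard"}
AMBIGUOUS = {"mono- and diglycerides", "natural flavours", "vitamin d3"}

def classify(ingredients: list[str]) -> tuple[str, float]:
    """Return (label, confidence); confidence drops with ambiguous items."""
    items = [i.strip().lower() for i in ingredients]
    if any(i in NON_VEGAN for i in items):
        return "not vegan", 0.95
    n_ambiguous = sum(i in AMBIGUOUS for i in items)
    # Each ambiguous ingredient knocks confidence down; only high-confidence
    # classifications would be surfaced to retailers, as the proposal suggests.
    confidence = max(0.5, 0.9 - 0.15 * n_ambiguous)
    return "likely vegan", confidence

print(classify(["water", "soy protein", "natural flavours"]))
# -> ('likely vegan', 0.75)
```

The confidence-scoring step maps onto the "only high-confidence data is shared with supermarkets" idea in the quoted proposal; the hard part is presumably the ingredient database and the ambiguous cases, not the plumbing.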

I think it’s different in kind. I sense that I have valenced consciousness and I can report it to others, and I’m the same person feeling and doing the reporting. I infer you, a human, do also, as you are made of the same stuff as me and we both evolved similarly. The same applies to non-human animals, although it’s harder to be sure about their communication.

But this doesn’t apply to an object built out of different materials, designed to perform, improved through gradient descent, etc.

Ok some part of the system we have built to communicate with us ... (read more)

4
Toby Tremlett🔹
Cheekily butting in here to +1 David's point - I don't currently think it's reasonable to assume that there is a relationship between the inner workings of an AI system which might lead to valenced experience, and its textual output. For me, this is based on the idea that when you ask a question, there isn't a sense in which an LLM 'introspects'. I don't subscribe to the reductive view that LLMs are merely souped-up autocorrect, but they do have something in common. An LLM role-plays whatever conversation it finds itself in. They have long been capable of role-playing 'I'm conscious, help' conversations, as well as 'I'm just a tool built by OpenAI' conversations. I can't imagine any evidence coming from LLM self-reports which isn't undermined by this fact.

Thanks.

I might be obtuse here, but I still have a strong sense that there's a deeper problem being overlooked. Glancing at your abstract:

self-reports from current systems like large language models are spurious for many reasons (e.g. often just reflecting what humans would say)

we propose to train models to answer many kinds of questions about themselves with known answers, while avoiding or limiting training incentives that bias self-reports.

To me the deeper question is "how do we know that the language model we are talking to has access to the ... (read more)

5
Noah Birnbaum
I think one can reasonably ask this question of consciousness/welfare more broadly: how does one have access to their consciousness/welfare?  One idea is that many philosophers think one, by definition, has immediate epistemic access to their conscious experiences (though whether those show up in reports is a different question, which I try to address in the piece). I think there are some phenomenological reasons to think this.  Another idea is that we have at least one instance where one supposedly has access to their conscious experiences (humans), and it seems like this shows up in behavior in various ways. While I agree with you that our uncertainty grows as you get farther from humans (i.e. to digital minds), I still think you're going to get some weight from there.  Finally, I think that, if one takes your point too far (there is no reason to trust that one has epistemic access to their conscious states), then we can't be sure that we are conscious, which I think can be seen as a reductio (at least, to the boldest of these claims).  Though let me know if something I said doesn't make sense/if I'm misinterpreting you. 

The issue of valence — which things does an AI get pleasure/pain from, and how would we know? — seems to make this fundamentally intractable to me. “Just ask it?” — why would we think the language model we are talking to is telling us about the feelings of the thing having valenced sentience?

See my short form post

https://forum.effectivealtruism.org/posts/fFDM9RNckMC6ndtYZ/david_reinstein-s-shortform?commentId=dKwKuzJuZQfEAtDxP

I still don’t feel I have heard a clear convincing answer to this one. Would love your thoughts.

5
NickLaing
Of course there's lots of problems here (some which you outline well) but I think as AIs get smarter it may well be more accurate than with animals? At least they can tell you something, rather than us drawing long bows interpreting behavioral observations.
5
Noah Birnbaum
I agree this is a super hard problem, but I do think there are somewhat clear steps to be made towards progress (i.e. making self reports more reliable). I am biased, but I did write this piece on a topic that touches on this problem a bit that I think is worth checking out. 

Fair point, some counterpoints (my POV obviously not GiveWell's):

1. GW could keep the sheets as the source of truth, but maintain a tool that exports to another format for LLM digestion (a rough sketch of one such export step follows this list). Alternatively, at least commit to maintaining a sheets-based version of each model.

2. Spreadsheets are not particularly legible when they get very complicated, especially when the formulas in cells refer to cell references (B12^2/C13, etc.) rather than labeled ranges.

3. LLMs make code a lot more legible and accessible these days, and tools like Claude Code make it easy to create nice displays and interfaces for people to more clearly digest code-based models.
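On point 1, a rough sketch of the kind of export step I have in mind (the filename is a placeholder; it assumes the Sheet has been downloaded as .xlsx, and uses openpyxl to read formulas rather than computed values):

```python
# Rough sketch: flatten a workbook's cells and formulas into plain text
# that an LLM can digest, while the spreadsheet stays the source of truth.
# Assumes the Google Sheet has been downloaded as an .xlsx file.
from openpyxl import load_workbook

wb = load_workbook("cea_model.xlsx", data_only=False)  # keep formulas, not values
with open("cea_model_formulas.txt", "w") as out:
    for ws in wb.worksheets:
        for row in ws.iter_rows():
            for cell in row:
                if cell.value is not None:
                    # e.g. "Main!B12 = =B10*B11" or "Main!B10 = 0.37"
                    out.write(f"{ws.title}!{cell.coordinate} = {cell.value}\n")
```

A flat "coordinate = formula" listing like this is also a partial answer to point 2: an LLM can then be asked to rename the B12/C13-style references into labeled quantities.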

1
Brendan Phillips🔸
Thank you both! We are planning to keep spreadsheets as the primary format for our models (for transparency/simplicity reasons like you both noted). However, some way to convert spreadsheets to code for LLM digestion and potentially building web apps or running more complex uncertainty analyses would be valuable to us. Definitely not asking anyone to spend time on this for us! I was just wondering if anyone was aware of a good way to do the conversion.

This feels doable, if challenging. 

I'll try a bit myself and share progress if I make any. Better still, I'll try to signal-boost this and see if others with more engineering chops have suggestions. This seems like something @Sam Nolan and others (@Tanae, @Froolow, @cole_haus) might be interested in and good at. 

(Tbh my own experience was more in the other direction... asking Claude Code to generate the Google Sheets from other formats, because Google Sheets are familiar and easier to collaborate with. That was a struggle.)

2
Froolow
One of the big recurring arguments in pharma cost effectiveness modelling is whether to use databases and scripts with a proper statistical programming language like R, or whether to stick with Excel / Sheets for our models. The advantages of proper languages are manifold, including - as you've pointed out - that you can probably use LLMs on them more successfully to audit and augment your code. However, the advantage of spreadsheets is that they are extremely portable, meaning almost anyone can run them natively on their laptop and anyone can understand how they work if they want to change parameters. This matters a lot if transparency is a goal, and transparency is often such an overriding goal that we pick Excel over technically stronger languages. So I'd caution that in converting the existing models to databases / scripts you're actually making implicit decisions about the nature and audience of the model far beyond just whether they are legible to LLMs or not.  I mention this because I'm about to get nerd-sniped by the problem and want to make sure the limitations of the approach are well understood before I get sucked into it!

most likely because it can't effectively work with our multi-spreadsheet models.

My brief experience is that LLMs and Claude Code struggle a bit with data in Google Sheets in particular. Taking a first step to move the data (and formulas) into databases and scripts might help with this considerably. (And Claude Code should be very helpful with that.)
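For instance, a minimal sketch of that first step (tab and file names are hypothetical, and it assumes each tab has already been exported to CSV):

```python
# Sketch: load CSV exports of each tab into a local SQLite database.
# Assumes every row has the same number of columns as its header.
import csv
import sqlite3

conn = sqlite3.connect("cea_model.db")
for tab in ["parameters", "costs", "outcomes"]:  # hypothetical tab names
    with open(f"{tab}.csv") as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{h}" TEXT' for h in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{tab}" ({cols})')
    marks = ", ".join("?" * len(header))
    conn.executemany(f'INSERT INTO "{tab}" VALUES ({marks})', data)
conn.commit()
```

Once the data is in a local database, these tools can query and transform it directly instead of fighting the live Sheets interface.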

1
Brendan Phillips🔸
Thanks, David! I agree that capabilities for spreadsheets are not very strong at the moment. I've tried a few times to get Claude Code to help with converting our CEAs into databases and haven't been very successful as it commonly tries to take shortcuts or the context window runs out. If you (or anyone else) has advice on converting them, I would love to hear it. For context, our most complex CEAs (example) are where we'd get the most value and they're often 1,500+ lines and 10+ tabs, which is where we run into issues.

I agree there's a lot of diversity across non-profit goals and thus no one-size-fits-all advice will be particularly useful.

I suspect the binding constraint here is that people on nonprofit boards are often doing it as a very minor part-time thing, and while they may be directly aligned with the mission, they find it hard to prioritize this when there are other tasks and deadlines more directly in their face.

And people on non-profit boards generally cannot get paid, so a lot of our standard cultural instincts tell us not to put a high premium on this.

Of course the... (read more)

2
Davidmanheim
Mostly agree. I've been involved in local orgs a bit more than most people in EA, and grew up in a house where my parents were often serving terms on different synagogue and school boards, and my wife has continued her family's similar tradition - so I strongly agree that passionate alignment changes things - but even that rarely leads to boards setting the strategic direction. I think a large part of this is that strategy is hard, as you note, and it's very high context for orgs. I still wonder about who is best placed to track priority drift, and about how much we want boards to own the strategic direction; it would be easy, but I think very unhelpful, for the board to basically just do what Holden suggests, and only be in charge of the CEO - because a lot of value from the board is, or can be, their broader strategic views and different knowledge. And for local orgs, that happens much more, the leaders need to convince board members to do things or make changes, rather than doing it on their own and getting vague approval from the board. But, as a last point, it seems hard to do lots of this for small orgs. Overhead from the board is costly, and I don't know how much effort we want to expect.

Did a decent job for this academic paper, but I think it’s hampered by only having content from arXiv and various EA/tech forums. Still, it generated some interesting leads.

https://gistcdn.githack.com/daaronr/b9447c40a7a6b948f399073496f98c37/raw/scanner_elasticity_experts.html

Prompt:

... find the most relevant authors and work for Observational price variation in scanner data cannot reproduce experimental price elasticities https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4899765 -- we're looking for methodological experts to evaluate this for The U

... (read more)

Trying this out for various Unjournal.org processes (like prioritizing research, finding potential evaluators, linking research to pivotal questions) and projects (assessing LLM vs human research evaluations). Some initial forays (coming from a conversation with Xyra). I still need to human-check it. 

~prompt to Claude Code about @Toby_Ord and How Well Does RL Scale? 

``Toby Ord's writing -- what do the clusters look like? What other research/experts come closest to his post ....  https://forum.effectivealtruism.org/posts/Tysu... (read more)

1
Matrice Jacobine🔸🏳️‍⚧️
This was a linkpost, I didn't write that paper.

At the Unjournal we have a YouTube channel and I'm keen to produce more videos both about our process and the case for our model, and about the content of the research we evaluate and the pivotal questions we consider, which are generally EA-adjacent. This includes explainer videos, making-the-case videos, interviews/debates, etc. 

But as you and most organizations probably realize, it's challenging and very time-consuming to produce high-quality videos, particularly in terms of creating and synchronizing images, sound, and video editing, etc. Without ... (read more)

Should be fixed now, thanks. The problem was that I started by duplicating the first one and then adjusting the text, but the text only showed as changed on my end (NB: @EA Forum Team)

4
Toby Tremlett🔹
Thanks for flagging David - @Will Howard🔹 for vis (Will is building V2 of the poll feature soon)

Thanks -- that's not how it looked on my end, will try to adjust.

2
david_reinstein
Should be fixed now, thanks. The problem was that I started by duplicating the first one and then adjusting the text, but the text only showed as changed on my end (NB: @EA Forum Team)

Some notes/takes:

The Effective Giving/EA Marketing project was going fairly strong, making some progress but also facing some limitations. But I wouldn't take the ~shutdown/pause as strong evidence against this approach. I'd diagnose it as:

1. Some disruption from changes in emphasis/agenda at a few points in the project, driven by the changing priorities in EA at the time, first towards "growing EA rather than fundraising" (Let's stop saying 'funding overhang', etc.) and then somewhat back in the other direction after the collapse of FTX

2.  I got a gra... (read more)

Here's the Unjournal evaluation package

A version of this work has been published in the International Journal of Forecasting under the title "Subjective-probability forecasts of existential risk: Initial results from a hybrid persuasion-forecasting tournament"
 

We're working to track our impact on evaluated research (see coda.io/d/Unjournal-...). So we asked Claude 4.5 to consider the differences across paper versions, how they related to the Unjournal evaluator suggestions, and whether this was likely to have been causal.

See Claude's report here  ... (read more)

I've found it useful both for posts and for considering research and evaluations of research for Unjournal, with some limitations of course.

- The interface can be a little bit overwhelming, as it reports so many different outputs at the same time, some overlapping.

+ but it's already pretty usable and I expect this to improve.

+ it's an agent-based approach, so as LLMs improve you can swap in the new ones.

I'd love to see some experiments with directly integrating this into the EA forum or LessWrong in some ways, e.g. automatically doin... (read more)

In case helpful, the EA Market Testing team (not active since August 2023) was trying to push some work in this direction, as well as collaboration and knowledge-sharing between organizations. 

See our knowledge base (gitbook) and data analysis. (Caveat: it's not all SOTA in a marketing sense, and sometimes leans a bit towards the academic/scientific approach to this.) 

Happy to chat more if you're interested. 

1
Anna Pitner
This is really helpful! Thank you for sharing. I wasn’t familiar with this work before, and it looks genuinely very interesting. I’ve bookmarked the knowledge base and will likely come back to it as I continue thinking and writing about marketing in the EA ecosystem. I’d also be very happy to chat and learn more about what the team tried, what seemed promising, and where things got stuck. 

I think "an unsolved problem" could indicate several things. it could be

  1. We have evidence that all of the commonly tried approaches are ineffective, i.e., we have measured all of their effects and they are tightly bounded as being very small.

  2. We have a lack of evidence, thus very wide credible intervals over the impact of each of the common approaches.

To me, the distinction is important. Do you agree?
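To illustrate the distinction with invented numbers (both cases share a point estimate of zero; only the uncertainty differs):

```python
# Same point estimate (zero effect), very different epistemic situations.
from scipy.stats import norm

# Case 1: precisely estimated null -- effect tightly bounded near zero.
lo, hi = norm.interval(0.95, loc=0.0, scale=0.01)
print(f"Case 1 95% interval: ({lo:.2f}, {hi:.2f})")  # ~(-0.02, 0.02)

# Case 2: absence of evidence -- same estimate, huge uncertainty.
lo, hi = norm.interval(0.95, loc=0.0, scale=0.50)
print(f"Case 2 95% interval: ({lo:.2f}, {hi:.2f})")  # ~(-0.98, 0.98)
```

Only case 1 licenses "the commonly tried approaches don't work"; case 2 only licenses "we don't know yet."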

You say above

meaningful reductions either have not been discovered yet or do not have substantial evidence in support

But even "do not have substanti... (read more)

1
geoffrey
For what it's worth, I read that abstract as saying something like, "within the class of interventions studied so far, the literature has yet to settle onto any intervention that can reliably reduce animal product consumption by a meaningful amount, where meaningful amount might be a 1% reduction at Costco scale or long-term 10% reduction at a single cafeteria. The class of interventions being studied tends to be informational and nudge-style interventions like advertising, menu design, and media pamphlets. When effect sizes differ for a given type of intervention, the literature has not offered a convincing reason why a menu-design choice works in one setting versus another." Okay, now that I've typed that up, I can see why "unsolved problem" is unclear.  And I'm probably taking a lot of leaps of faith in interpretation here

Some of this may be a coordination issue. I wanted to proactively schedule more meetings at EAG Connect, but I generally found fewer experienced/senior people at key orgs on Swapcard relative to the bigger EAGs. And some that were there didn't seem responsive... as it's free and low-cost, there may also be people who sign up and then don't find the time to commit.

Sorry, I think we meant the same thing. I had a brain freeze. I think my brain got confused by the term “offline.”

2
gergo
No worries, trust me, there were plenty of times I had to go back and edit in similar circumstances! :))

Did you get the titles for offline and online reversed for the bullets at the top?

2
gergo
Thanks for checking! I believe no, people tend to book way more 1-1s at offline conferences.

If this post is indeed cutting-edge and prominent, I would be more surprised by the fact that there are not more 'quant' people reporting on this than by the fact that more philosophers are not working on AI x-risk related issues.

Unjournal.org is collaborating with this initiative for our Pivotal Questions projects:

 Is Cultured Meat Commercially Viable? Unjournal’s first proposed ‘Pivotal Question’ (& request for feedback) and 

"How much do plant-based products substitute for animal products and improve welfare?" – An Unjournal Pivotal Question (update: added polls) 

Aiming to integrate this with some of the questions in our community here  

Feedback on these questions and operationalizations is highly appreciated.

I made a similar argument a few years back, advocating that GiveWell should rank, rate, and measure charities beyond the absolute best/most measurable. 

A common response was that the evidence suggested the returns were so heavy-tailed... So moving money from ~ineffective charities (Make a Wish) to 'near-top' charities, or to mainstream charities operating in similar areas (say MSF vs. AMF) would have far less value than moving money from near-top to top charities. 

My counter-response was ... ~we don't have solid evidence that charities like MSF are... (read more)

As others note, the East Bay/Berkeley more or less hits the spot if you don't care about affordability.

(Would be nice if there were an affordable alternative though.)

It's hard for me to glean what the consensus is in this thread/on this issue. But if there seems to be a strong case that some outside scrutiny is needed, this might be something The Unjournal (Unjournal.org) could help with. Bringing "outside the EA bubble" academic expertise to weigh in is one of the key things we do.

We generally focus on economics and social science but we might be able to stretch to this. (Feel free to dm/suggest/ping me).

4
NickLaing
Hey @david_reinstein, appreciate that! Perhaps, though, given the apparent strong disagreement with my concerns about conflicts of interest / the same people managing a bunch of intertwined orgs, I'm not sure many others think there's a big issue here.

I like the post and agree with most of it, but I don't understand this point. Can you clarify? To me it seems like the opposite of this.

If EA organizations are seen promoting frugality, their actions could be perceived as an example of the rich promoting their own interests over those of the poor. This would increase the view that EA is an elitist movement.

4
James Brobin
I explain what I meant in this comment. I'll update the post to be more clear.

A quick ~testimonial. Abraham's advice was very helpful to us at Unjournal.org. As our fiscal sponsor was ending its operations, we needed to transition quickly. We were able to get a 501(c)(3) without a tremendous amount of effort, much quicker than anticipated. 

In retrospect, it would have been better to form a 501(c)(3) as soon as we had our first grant and had applied for a larger grant. It would have saved us a substantial amount of fees and allowed us to earn interest/investment income on the larger grant. And it's also easier to access tech discounts as a 501(c)(3) rather than as a fiscally sponsored organization. 
 

Enjoyed it, a good start.

I like the stylized illustrations, but I think a bit more realism (or at least detail) could be helpful. Some of the activities and the pain suffered by the chickens were hard to see.

The transition to the factory farm/caged-chicken environment was dramatic and had, I think, the impact you were seeking.

One fact-based question which I don't have the answer to -- does this really depict the conditions for chickens where the eggs are labeled as "pasture raised"? I hope so, but I vaguely heard that that was not a rigorously enforced label.

Here's some suggestions from 6 minutes of ChatGPT thinking. (Not all are relevant, e.g., I don't think "Probable Causation" is a good fit here.)

Do you see other podcasts filling the long-form, serious/in-depth, EA-adjacent/aligned niche in areas other than AI? E.g., GiveWell has a podcast, but I'm not sure it's the same sort of thing. There's also Hear This Idea, and Clearer Thinking or Dwarkesh Patel often cover relevant stuff. 

(Aside, was thinking of potentially trying to do a podcast involving researchers and research evaluators linked to The Unjournal; if I thought it could fill a gap and we could do it well, which I'm not sure of.) 

No, I really don't. Sometimes you see things in the same territory on Dwarkesh (which is very AI-focused) or Econtalk (which is shorter and less and less interesting to me lately). Rationally Speaking was wonderful but appears to be done. Hear This Idea is intermittent and often more narrowly focused. You get similar guests on podcasts like Jolly Swagman but the discussion is often at too low of a level, with worse questions asked. I have little hope of finding episodes like those with Hannah Ritchie, Christopher Brown, Andy Weber, or Glen Weyl anywhere el... (read more)

2
david_reinstein
Here's some suggestions from 6 minutes of ChatGPT thinking. (Not all are relevant, e.g., I don't think "Probable Causation" is a good fit here.)

This seems a bit related to the “Pivotal questions”: an Unjournal trial initiative -- we've engaged with a small group of organizations and elicited some of these -- see here.

To highlight some that seem potentially relevant to your ask:

What are the effects of increasing the availability of animal-free foods on animal product consumption? Are alternatives to animal products actually used to replace animal products, and especially those that involve the most suffering? Which plant-based offerings are being used as substitutes versus complements

... (read more)

Thanks for the thoughts. Note that I'm trying to engage/report here because we're working hard to make our evaluations visible and impactful, and this forum seems like one of the most promising interested audiences. But also eager to hear about other opportunities to promote and get engagement with this evaluation work, particularly in non-EA academic and policy circles.

I generally aim to just summarize and synthesize what the evaluators had written and the authors' response, bringing in what seemed like some specific relevant examples, and using quotes or... (read more)

A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors. 

I appreciate the feedback. I'm definitely aware that we want to make this attractive to authors and others, both to submit their work and to engage with our evaluations. Note that in addition to asking for author submissions, our team nominates and prioritizes high-profile and potentially high-impact work, and contacts authors to get their updates, suggestions, and (later) respons... (read more)

1
geoffrey
Yes. But zooming back out, I don't know if these EA Forum posts are necessary. A practice I saw at i4replication (or some other replication lab) is that the editors didn't provide any "value-added" commentary on any given paper. At least, I didn't see these in any tweets they did. They link to the evaluation reports + a response from the author and then leave it at that. Once in a while, there will be a retrospective on how the replications are going as a whole. But I think they refrain from commenting on any paper. If I had to rationalize why they did that, my guess is that replications are already an opt-in thing with lots of downside. And psychologically, editor commentary has a lot more potential for unpleasantness. Peer review tends to be anonymous so it doesn't feel as personal because the critics are kept secret. But editor commentary isn't secret... it actually feels personal, and editors tend to have more clout. Basically, I think the bar for an editor commentary post like this should be even higher than the usual process. And the usual evaluation process already allows for author review and response. So I think a "value-added" post like this should pass a higher bar of diplomacy and insight.

I meant "constructive and actionable" In that he explained why the practices used in the paper had potentially important limitations (see here on "assigning an effect size of .01 for n.s. results where effects are incalculable")...

And suggested a practical response including a specific statistical package which could be applied to the existing data:

"An option to mitigate this is through multiple imputation, which can be done through the metansue (i.e., meta-analysis of non-significant and unreported effects) package"

In terms of the cost-benefit test it dep... (read more)

3
Seth Ariel Green 🔸
David, there are two separate questions here: whether these analyses should be done, and whether I should have done them in response to the evaluations. If you think these analyses are worth doing, by all means, go ahead!

Thanks for the detailed feedback, this seems mostly reasonable. I'll take a look again at some of the framings, and try to adjust. (Below and hopefully later in more detail).

the phrase "this meta-analysis is not rigorous enough". it seems this meta-analysis is par for the course in terms of quality.

This was my take on how to succinctly depict the evaluators' reports (not my own take), in a way the casual reader would be able to digest. Maybe this was rounding down too much, but not by a lot, I think. Some quotes from Jané's evaluation that I think are r... (read more)

4
Seth Ariel Green 🔸
Hi David, To paraphrase Diddy's character in Get Him to the Greek, "What are you talking about, the name of the [paper] is called "[Meaningfully reducing consumption of meat and animal products is an unsolved problem]!" (😃)  That is our central claim. We're not saying nothing works; we're saying that meaningful reductions either have not been discovered yet or do not have substantial evidence in support. That's author, singular. I said at the top of my initial response that I speak only for myself. 

I'll frame it explicitly here: when we did one check and not another, or one search protocol and not another, the reason, every single time, is opportunity costs. When I write: "we thought it made more sense to focus on the risks of bias that seemed most specific to this literature," notice the word 'focus', which means saying no. 

That is clearly the case, and I accept there are tradeoffs. But ideally I would have liked to see a more direct response to the substance of the points made by the evaluators. But I understand that there are tradeoffs th... (read more)

7
geoffrey
Chiming in here with my outsider impressions on how fair the process seems, @david_reinstein. If I were to rank the evaluator reports, evaluation summary, and the EA Forum post in which ones seemed the most fair, I would have ranked the Forum post last. It wasn't until I clicked through to the evaluation reports that I felt the process wasn't so cutting. Let me focus on one very specific framing in the Forum post, since it feels representative. One heading includes the phrase "this meta-analysis is not rigorous enough". This has a few connotations that you probably didn't mean. One, this meta-analysis is much worse than others. Two, the claims are questionable. Three, there's a universally correct level of quality that meta-analyses should reach and anything that falls short of that is inadmissible as evidence. In reality, it seems this meta-analysis is par for the course in terms of quality. And it was probably more difficult to do given the heterogeneity in the literature. And the central claim of the meta-analysis doesn't seem like something either evaluator disputed (though one evaluator was hesitant). Again, I know that's not what you meant and there are many caveats throughout the post. But it's one of a few editorial choices that make the Forum post seem much more critical than the evaluation reports, which is a bit unusual since the Evaluators are the ones who are actually critiquing the paper. Finally, one piece of context that felt odd not to mention was the fundamental difficulty of finding an expert in both food consumption and meta-analysis. That limits the ability of any reviewer to make a fair evaluation. This is acknowledged at the bottom of the Evaluation Summary. Elsewhere, I'm not sure where it's said. Without that mentioned, I think it's easy for a casual reader to leave thinking the two Evaluators are the "most correct".

This does indeed look interesting, and promising. Some quick (maybe naive) thoughts on that particular example, at a skim.

  • An adaptive/reinforcement learning design could make a megastudy like this cheaper ... You end up putting more resources into the arms that start to become more valuable/where more uncertainty needs to be resolved (a toy sketch below).  
  • I didn't initially see how they did things like multiple hypothesis correction, although I'd prefer something like a Bayesian approach, perhaps with multiple levels of the model... effect category, specific i
... (read more)
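On the adaptive-design bullet above, a toy sketch of the allocation mechanic (Thompson sampling; the arm count, priors, and success rates are all invented for illustration, and a real megastudy would need a far more careful design):

```python
# Toy Thompson sampling over K hypothetical study arms with binary outcomes.
import numpy as np

rng = np.random.default_rng(0)
K = 5
true_rates = rng.uniform(0.05, 0.30, size=K)  # unknown in a real study
successes = np.ones(K)  # Beta(1, 1) priors on each arm's success rate
failures = np.ones(K)

for _ in range(10_000):
    # Sample a plausible rate per arm, then allocate to the best-looking arm.
    draws = rng.beta(successes, failures)
    arm = int(np.argmax(draws))
    reward = rng.random() < true_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward

print("pulls per arm:", (successes + failures - 2).astype(int))
# Allocation concentrates on the better arms as uncertainty resolves.
```

The point is just that allocation shifts toward the arms where the expected value (or value of information) is highest, which is where the cost savings come from.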
1
Seth Ariel Green 🔸
For what it's worth, I thought David's characterization of the evaluations was totally fair, even a bit toned down. E.g. this is the headline finding of one of them: David characterizes these as "constructive and actionable insights and suggestions". I would say they are tantamount to asking for a new paper, especially the excluding of small studies, which was core to our design and would require a whole new search, which would take months. To me, it was obvious that I was not going to do that (the paper had already been accepted for publication at that point). The remaining suggestions also implied dozens (hundreds?) of hours of work. Spending weeks satisfying two critics didn't pass a cost-benefit test.[1] It wasn't a close call. 1. ^ really need to follow my own advice now and go actually do other projects 😃

Post roasted here on roastmypost (Epistemic Audit)

It gets a B-, which seems to be the modal rating. 

Some interesting comments (going into far more detail than the summary below):

 

This EA Forum post announces Joe Carlsmith's career transition from Open Philanthropy to Anthropic while providing extensive justification for working at a frontier AI company despite serious safety concerns. The document demonstrates exceptional epistemic transparency by explicitly acknowledging double-digit extinction probabilities while defending the decision on con

... (read more)

Still loving this, hosted a good set of short- and long-term EA stays. But the 'add review' function is still not working. That would add a lot of value to this. Can someone look into it?

https://coda.io/d/EA-Houses_dePaxf_RJiq/Add-review_su_RF7Tc#Reviews_tuLcDk6R/r8

 

I ran this through QURI's RoastMyPost.org, and it gave a mixed but fairly positive assessment (something like 68/100).
Full assessment here (multiple agents/tools). 

The epistemic checker and the fact checker seem particularly useful.

The main limitations seem to be:

  1. Strong claims and major recommendations without corresponding evidence and support
  2. Vagueness/imprecise definitions (this is partially my own take, partially echoed by RoastMyPost – e.g., it's hard for me to grok what these new cause areas are; some are very much shorthand.)
     

    I meant to po

... (read more)

There's recently been increased emphasis on "principles-first" EA, which I think is great. But I worry that in practice a "principles-first" framing can become a cover for anchoring on existing cause areas, rather than an invitation to figure out what other cause areas we should be working o

I don't quite see the link here. Why would principles-first be a cover for anchoring on existing cause areas? Is there a prominent example of this?
