Seth Ariel Green 🔸

Research Scientist @ Humane and Sustainable Food Lab
1515 karma · Joined · Working (6-15 years) · New York, NY, USA
setharielgreen.com

Bio

I am a Research Scientist at the Humane and Sustainable Food Lab at Stanford.

How others can help me

The lab I work at is seeking collaborators! More here.

How I can help others

If you want to write a meta-analysis, I'm happy to consult! I think I know something about what kinds of questions are good candidates, what your default assumptions should be, and how to delineate categories for comparisons.

Comments

It's an interesting question. 

From the POV of our core contention -- that we don't currently have a validated, reliable intervention to deploy at scale -- whether this is because of absence of evidence (AoE) or evidence of absence (EoA) is hard to say. I don't have an overall answer, and ultimately both roads lead to "unsolved problem."

We can cite good arguments for EoA (these studies are stronger than the norm in the field but show weaker effects, and that relationship should be troubling for advocates) or AoE (we're not talking about very many studies at all), and ultimately I think the line between the two is in the eye of the beholder.

Going approach by approach, my personal answers are:

  1. choice architecture is probably AoE; it might work better than expected, but we just don't learn very much from two studies (I am working on something about this separately).
  2. the animal welfare appeals are more AoE, especially those from animal advocacy orgs.
  3. social psych approaches I'm skeptical of, but there weren't a lot of high-quality papers, so I'm not so sure (see here for a subsequent meta-analysis of dynamic norms approaches).
  4. I would recommend health appeals for older folks and environmental appeals for Gen Z. So there I'd say we have evidence of efficacy, but expect effects to be on the order of a few percentage points.

Were I discussing this specifically with a funder, I would say, if you're going to do one of the meta-analyzed approaches -- psych, nudge, environment, health, or animal welfare, or some hybrid thereof -- you should expect small effect sizes unless you have some strong reason to believe that your intervention is meaningfully better than the category average. For instance, animal welfare appeals might not work in general, but maybe watching Dominion is unusually effective. However, as we say in our paper, there are a lot of cool ideas that haven't been tested rigorously yet, and from the point of view of knowledge, I'd like to see those get funded first.
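To put a little arithmetic behind that advice (a toy illustration of my own, not anything from the paper, with every number invented): "expect the category average unless you have a strong reason" amounts to shrinking an optimistic guess for a new intervention toward the meta-analytic mean, with the weight on your guess scaling with how much evidence actually backs it.

```r
# Toy numbers, not from the paper: forming an expectation for a new
# intervention by shrinking an optimistic guess toward the category average.
category_mean <- 0.02  # ~2 percentage points: the meta-analyzed category average
my_guess      <- 0.10  # optimistic guess for the new intervention (e.g., Dominion)
evidence      <- 0.2   # weight on the guess: 0 = no supporting evidence, 1 = strong

expected_effect <- evidence * my_guess + (1 - evidence) * category_mean
expected_effect  # 0.036 -- still only a few percentage points
```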

Hi David,

> To be honest I'm having trouble pinning down what the central claim of the meta-analysis is.

To paraphrase Diddy's character in Get Him to the Greek: "What are you talking about? The name of the [paper] is called '[Meaningfully reducing consumption of meat and animal products is an unsolved problem]'!" (😃) That is our central claim. We're not saying nothing works; we're saying that meaningful reductions either have not been discovered yet or do not have substantial evidence in support.

> However the authors hedge this in places

That's author, singular. I said at the top of my initial response that I speak only for myself. 

When pushed, I say I am "approximately vegan" or "mostly vegan," which typically just gets shortened to "vegan," and most people don't push. If a vegan gives me a hard time about the particulars, which essentially never happens, I stop talking to them 😃

IMHO we would benefit from a clear label for folks who aren't quite vegan but who only seek out high-welfare animal products; I think pasturism/pasturist is a possible candidate.

Love talking nitty gritty of meta-analysis 😃 

  1. IMHO, the "math hard" parts of meta-analysis are figuring out what questions you want to ask, what sensible inclusion criteria look like, and what statistical models are appropriate. Asking how much time this takes is the same as asking: where do ideas come from?
  2. The "bodybuilding hard" part of meta-analysis is finding literature. The evaluators didn't care for our search strategy, which you could charitably call "bespoke" and uncharitably call "ad hoc and fundamentally unreplicable." But either way, I read about 1000 papers closely enough to see if they qualified for inclusion, and then, partly to make sure I didn't duplicate my own efforts, I recorded notes on every study that looked appropriate but wasn't. I also read, or at least read the bibliographies of, about 160 previous reviews. Maybe you're a faster reader than I am, but ballpark, this was 500+ hours of work.
  3. Regarding the computational aspects, the git history tells the story, but specifically making everything computationally reproducible, e.g. writing the functions, checking my own work, setting things up to be generalizable -- a week of work in total? I'm not sure.
  4. The paper went through many internal revisions and changed shape a lot from its initial draft when we pivoted in how we treated red and processed meat. That's hundreds of hours. Peer review was probably another 40-hour workweek.
  5. As I reread reviewer 2's comments today, it occurred to me that some of their ideas might be interesting test cases for what Claude Code is and is not capable of doing. I'm thinking particularly of trying to formally incorporate my subjective notes about uncertainty (e.g. the many places where I admit that the effect size estimates involved a lot of guesswork) into some kind of... supplementary term for how much weight an estimate should get in meta-analysis (see the rough sketch after this list). Like maybe I'd use Wasserstein-2 distance, as my advisor Don recently proposed? Or Bayesian meta-analysis? This is an important problem, and I don't consider it solved by RoB2 or whatever, which means that fixing it might be, IDK, a whole new paper which takes however long that does? As my co-authors Don and Betsy & co. comment in a separate paper on which I was an RA:
    > Too often, research syntheses focus solely on estimating effect sizes, regardless of whether the treatments are realistic, the outcomes are assessed unobtrusively, and the key features of the experiment are presented in a transparent manner. Here we focus on what we term landmark studies, which are studies that are exceptionally well-designed and executed (regardless of what they discover). These studies provide a glimpse of what a meta-analysis would reveal if we could weight studies by quality as well as quantity. [the point being, meta-analysis is not well-suited for weighting by quality.]
  6. It's possible that some of the proposed changes would take less time than that. Maybe risk-of-bias assessment could be knocked out in a week? But it's been about a year since the relevant studies were in my working memory, which means I'd probably have to re-read them all, and across our main and supplementary datasets, that's dozens of papers. How long does it take you to read dozens of papers? I'd say I can read about 3-4 papers a day closely if I'm really, really cranking. So in all likelihood, yes, weeks of work, and that's weeks where I wouldn't be working on a project about building empathy for chickens. Which admittedly I'm procrastinating on by writing this 500+ word comment 😃
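Since I pointed to a sketch above, here it is: a minimal, made-up illustration (mine, not from our repo, and not the Wasserstein-2 idea Don proposed) of the crudest version of the fix -- scaling the usual inverse-variance weights by a subjective confidence score for how much guesswork went into each estimate. metafor's rma() accepts user-supplied weights, so the mechanics are trivial; the part I don't consider solved is justifying the scores themselves.

```r
# Toy sketch, not from the paper: downweighting guess-heavy estimates
# in a random-effects meta-analysis via user-supplied weights.
library(metafor)

# Made-up data: effect sizes, sampling variances, and a subjective
# confidence score in (0, 1] for how much guesswork went into each estimate.
dat <- data.frame(
  yi   = c(0.05, 0.20, -0.02, 0.10),
  vi   = c(0.010, 0.030, 0.015, 0.020),
  conf = c(1.0, 0.4, 0.8, 0.6)  # 1 = fully reported, 0.4 = heavy guesswork
)

# Standard random-effects model with the usual inverse-variance weighting
standard <- rma(yi, vi, data = dat, method = "REML")

# Same model, but with the inverse-variance weights scaled by confidence
weighted <- rma(yi, vi, data = dat, method = "REML",
                weights = dat$conf / dat$vi)

summary(standard)
summary(weighted)
```

Obviously this just relocates the subjectivity into the conf column, which is why I suspect doing it properly is a whole separate paper.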

David, there are two separate questions here: whether these analyses should be done, and whether I should have done them in response to the evaluations. If you think these analyses are worth doing, by all means, go ahead!

A final reflective note: David, I want to encourage you to think about the optics/politics of this exchange from the point of view of prospective Unjournal participants/authors. There are no incentives to participate. I did it because I thought it would be fun and I was wondering if anyone would have ideas or extensions that improved the paper. Instead, I got some rather harsh criticisms implying we should have written a totally different paper. Then I got this essay, which was unexpected/unannounced and used, again, rather harsh language to which I objected. Do you think this exchange looks like an appealing experience to others? I'd say the answer is probably not.

A potential alternative: I took a grad school seminar where we replicated and extended other people's papers. Typically the assignment was to do the robustness checks in R or whatever, and then the author would come in and we'd discuss. It was a great setup. It worked because the grad students actually did the work, which gave authors an incentive to participate. The co-teachers also pre-selected papers that they thought were reasonably high-quality, and I bet that if they got a student response like Matthew's, they would have counseled the student to be much more conciliatory, to remember that participation is voluntary, to think through the risks of making enemies (as I counseled in my original response), etc. I wonder if something like that would work here too. Like, the expectation is that reviewers will computationally reproduce the paper, conduct extensions and robustness checks, ask questions if they have them, work collaboratively with authors, and then publish a review summarizing the exchange. That would be enticing! Instead what I got here was like a second set of peer reviewers, and unusually harsh ones at that, and nobody likes peer review.

It might be the case that meta-analyses aren't good candidates for this kind of work, because the extensions/robustness checks would probably also have taken Matthew and the other responder weeks, e.g. a fine end-of-semester project for class credit but not a very enticing hobby.

Just a thought.  

For what it's worth, I thought David's characterization of the evaluations was totally fair, even a bit toned down. E.g. this is the headline finding of one of them:

> major methodological issues undermine the study's validity. These include improper missing data handling, unnecessary exclusion of small studies, extensive guessing in effect size coding, lacking a serious risk-of-bias assessment, and excluding all-but-one outcome per study.

David characterizes these as "constructive and actionable insights and suggestions". I would say they are tantamount to asking for a new paper, especially regarding the exclusion of small studies: excluding them was core to our design, and reversing that would require a whole new search, which would take months. To me, it was obvious that I was not going to do that (the paper had already been accepted for publication at that point). The remaining suggestions also implied dozens (hundreds?) of hours of work. Spending weeks satisfying two critics didn't pass a cost-benefit test.[1] It wasn't a close call.

  1. ^

     I really need to follow my own advice now and go actually do other projects 😃

@geoffrey We'd love to run a megastudy! My lab put in a grant proposal with collaborators at a different Stanford lab to do just that, but we ultimately went in a different direction. Today, however, I generally believe that we don't even know what the right question to ask is -- though if I had to choose one, it would be: what ballot initiative does the most for animal welfare while also getting the highest levels of public support? E.g., is there some other low-hanging fruit equivalent to "cage free," like "no mutilation," that would be equally popular? But in general I think we're back to the drawing board in terms of figuring out what study we want to run and getting a version of it off the ground, before we start thinking about scaling up to tens of thousands of people.

@david_reinstein, I suppose any press is good press, so I should be happy that you are continuing to mull on the lessons of our paper 😃 but I am disappointed to see that the core point of my responses is not getting through. I'll frame it explicitly here: when we did one check and not another, or one search protocol and not another, the reason, every single time, is opportunity costs. When I say "we thought it made more sense to focus on the risks of bias that seemed most specific to this literature," I am using the word 'focus' deliberately, in the sense of "focus means saying no," i.e. 'we are always triaging.' At every juncture, navigating the explore/exploit dilemma requires judgment calls. You don't have to like that I said no to you, but it's not a false dichotomy, and I do not care for that characterization.

To the second question of whether anyone will do this kind of extension work: I personally see it as a great exercise for grad students. I did all kinds of replication and extension work in grad school. A deep dive into a subset of the contact hypothesis literature that I did in a political psychology class in 2014, which started with a replication attempt, eventually morphed into The Contact Hypothesis Re-evaluated. If you, a grad student, want to do this kind of project, please be in touch; I'd love to hear from you. (I'd recommend starting by downloading the repo and asking Claude Code about robustness checks that do and do not require gathering additional data.)

That's interesting, but not what I'm suggesting. I'm suggesting something that would, e.g., explain why you tell people to "ignore the signs of my estimates for the total welfare" when you share posts with them. That is a particular style, and it says something about whether one should take your work in a literal spirit or not, which falls under the meta category of why you write the way you write; and to my earlier point, you're sharing this suggestion here with me in a comment rather than in the post itself 😃 Finally, the fact that there's a lot of uncertainty about whether wild animals have positive or negative lives is exactly the point I raised about why I have trouble engaging with your work. The meta post I am suggesting would, by contrast, motivate and justify this style of reasoning as a whole, rather than providing a particular example of it. The post you've shared is a link in a broader chain. I'm suggesting you zoom out and explain what you like about this chain and why you're building it.

By all means, show us the way by doing it better 😃 I'd be happy to read more about where you are coming from; I think your work is interesting, and if you are right, it has huge implications for all of us.
