
The Institute for Replication is an attempt to replicate social-science papers at scale to see which results actually hold up under scrutiny. In principle, replication is a very resource‑efficient way to gather additional evidence: you can replicate a high‑profile finding in a tiny fraction of the time it took to produce the original paper.

I’m personally totally on board with this. I love the idea. It’s also the kind of thing that naturally attracts philanthropic funding. As far as I can tell, the Institute for Replication has received grants from Coefficient Giving and possibly other philanthropic funders in the past.

I think structured programs that generate more replications are a fruitful way forward—but the Replication Games didn’t work out for me, and could perhaps be massively improved upon. I would’ve been more productive just replicating on my own at home.

My best guess is that this outcome wasn’t inevitable; it was mostly driven by organizational issues (especially around team formation and follow‑through) that might be fixable.

How the Games work

Replication Games are essentially replication hackathons. You apply to participate, and organizers place participants into small teams based on practical constraints—things like methods experience, preferred software (R/Stata/Python), and the kind of empirical work people feel comfortable doing. Teams are typically assigned a paper from a curated list where replication seems feasible (e.g., the paper is influential and the underlying data/code are available). Sometimes participants suggest papers themselves, but the event usually works best when papers are screened in advance for tractability.

Before the in‑person event, teams do prep work: reading the paper, checking the available materials, setting up the computing environment, and (ideally) sketching out a plan (which robustness checks could be done, how to divide tasks). Then the team meets in person for an intensive work session—often a full day—focused on getting as far as possible and, not least, having fun: reproducing the headline results, documenting where code breaks, and running sensible robustness checks or alternative experiments where appropriate.

Because it’s rarely possible to finish everything during the event itself, there’s usually a post‑game phase where the team cleans up code, writes up a standardized replication report (what was attempted, what worked, what didn’t, why), and submits it to the organizers. The Institute for Replication coordinates the overall process, and there’s typically a local organizer (not from the Institute, but from the hosting university or institution) handling the venue, food, and the practicalities.

Why the Replication Games did not work out (for me)

In my experience, the work achieved wasn’t worth the effort that was put in. If I could go back, I would choose a different approach—and I think that’s mostly due to organizational hiccups that could potentially be eliminated.

The main issue at the Replication Games seems to be group dynamics. Everyone knows the free‑rider problem from group projects; the Replication Games aren’t unique here. But the format aggravates the problem: What usually deters free‑riding is continued dependence on each other—repeated interaction, reputational consequences, and clear accountability. The Games often have the opposite structure: you’re thrown together with a bunch of strangers (great for meeting people; poor for incentives) who you might never work with again, for what is essentially unpaid work, with benefits (if any) shared across many. Even motivated people—me included—can drift into “I’ll do my part, but I’m not going to carry this whole thing” mode.

This is a selective, highly personal take. That said, conversations with other replicators (including people from other groups) suggested my feelings weren’t rare. At dinner, people from other groups reported similar issues; someone even said, “I’m really not looking forward to finishing it.”

My experience

I went into the Games hoping for networking, learning, results, and fun. And to be clear: the overall concept is well designed to deliver those things, and communication about what to expect was generally clear.

First game: I was placed in a group that intended to work with a proprietary software package I had zero experience with and zero intention of learning (SPSS). On top of that, as the game approached, my teammates weren’t moving toward a decision on what paper to replicate or how we’d do it. It quickly became clear that this wouldn’t work for me, and my best option was to pull out at the last minute and not attend. I incurred real costs, and I didn’t replicate anything. This also made me wonder whether the matching problem—replicators to groups, groups to papers—is so complex that it might benefit from giving replicators more choice up front.

Second game: Of my four team members, one showed up extremely well prepared; one showed up completely unprepared; one was added on short notice (so couldn’t prepare); and one didn’t show up. Still, we got to work. I was in a genuine state of flow: the room was vibrant with scientific bliss, I met new people, and I felt like I was contributing to an awesome scientific endeavor. I worked with full concentration for half a day. Lunch and the local organization were awesome!

Then the flow got punctured: an Institute member came over and told us there’d been a mistake during the paper assignment. Another group—sitting a few metres away from us—had accidentally been assigned the same paper. They’d noticed the error a couple of hours earlier, but hoped we’d take different approaches so the outputs could be merged into one larger replication. Unfortunately, both groups tweaked the experiment in exactly the same way, so a lot of what we’d done was suddenly redundant. 

Because my experience with the research management of the Replication Games has been so negative, I no longer trust the process and likely won’t attend again, even though I would really enjoy the work.

Tentative suggestions

If the key problem is group dynamics, then fixing it means changing incentives and accountability—not just tweaking logistics. One approach could be to create stronger “repeated interaction” or future‑opportunity effects. For example: an enticing, perhaps prestigious, replication event that’s only open to teams who reliably completed high‑quality replications. If your replication is poor (as judged by reviewers) or simply unfinished, you don’t get invited. That said, this needs care: you don’t want to create incentives to distort replication results: replicators might try to “break” a high‑profile paper by doing something unnecessarily adversarial in the analysis rather than aiming for a fair test.

But—unless I’ve misunderstood—Replication Games are explicitly meant not to recur in the same places. That makes it unlikely you’ll run into the same people again, which worsens the limited‑repeated‑interaction problem. For me, getting together with others who are passionate about replication was a key motivation to attend. Of course, I could replicate alone, or with people I already know—but then the question becomes: why attend a replication game at all? At that point, I might as well just start on my own.

Our team noted that the Replication Games included minimal follow‑up—not even a “thank you for participating, here are the next steps” email. I don’t know whether the Institute for Replication systematically gathers feedback, or even encourages all participants to submit reports, but I received nothing; that seems like low‑hanging fruit.

I guess that replicating empirical science papers is hard, and it’s a learning process for everyone involved. Happy replicating!
