[July 30 update]: We have an answer regarding the sample size if the intra-cluster correlation coefficient is assumed to be zero. This sample size calculator can be used.


EA Cameroon needs advice on sample size and cluster inclusion for a statistical impact evaluation of their COVID-19 project. The project should ideally start toward the end of the week.

Data should be gathered before and after the main part of the project (after one month).

The idea is to count how many persons out of a certain number wear a face covering, and to record how long this counting took. This information can be used as a proxy for preventive measures and social distancing.

I would like to ask about the sample size and inclusion of clusters. There are 180 000 persons in the campaign area and 6 villages/parts. Volunteers would prefer not to travel to all 6 campaign villages, and even less to an equal number of non-campaign villages, as the non-intervention communities are distant.

Different languages are spoken in the 6 parts, but the campaigning will include all of these languages. Otherwise, the parts are similar. Since little information is currently broadcast, the campaign may increase the share of persons wearing a face covering from 50% to at least 60% (or an equivalent relative increase of 20% from another baseline). Could only, e.g., 3+3 villages be included? 6 intervention + 3 non-intervention? How important, in terms of statistical power, is it to include all clusters and an equal number of non-intervention clusters? How many persons should be observed at each place?

I will appreciate any replies.

2 Answers

Hey, thank you for the work you are doing! Here are my thoughts (I'm an economist at IDinsight and work on this type of research):

  • If you want to understand the impact of your program, I don't recommend doing an RCT at this stage. This seems like a very small pilot and you won't have enough power / sample size to detect an effect (more on this below). You should only consider running an RCT if and when you plan to scale this up to a sufficient size.
  • Instead, what I advise is trying to understand and improve your impact by doing a small-sample survey plus qualitative research. E.g. when you go to a village, talk to locals to understand their current knowledge, attitudes, and behavior around COVID (what knowledge they lack, what attitudes need to change, what rumors are around etc.), so you can better design your messages. Ideally capture a good representation of different types of people in the community, not just leaders but also relatively marginalized groups; you could do a rigorous sampling, but I'm not sure that's realistic or worthwhile at this stage given the trouble it involves. Also ask them what kind of information campaign would engage them, and after you run your program ask how they felt: whether they liked it, whether they found it useful, what they learned, what they'd do differently etc. You can also contact them some time later to ask whether they observe any behavioral change among people in the community (this is better than asking what they themselves do, because of social desirability bias).

More technical details:

Since you're doing a clustered RCT (treatment is at the village level, and the outcomes of people within a village are likely positively correlated), you'll need a larger sample size than if you were doing an individual-level RCT (for the math, see section 4.2 of this -- generally a great resource for RCT design). You can do a power calculation for a clustered randomized controlled trial, e.g. using Stata's "power twomeans" command. One parameter that's missing is the intraclass correlation (the correlation among individuals within a treatment unit). However, since your number of clusters is SO small (3 and 3), when I try to do this calculation in Stata with any reasonable assumption, Stata says you cannot have enough power (assuming you want all the standard settings: 80% power, 5% significance level etc.). That's why I recommend not doing an RCT unless you have a program at scale.
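To make the role of the intraclass correlation concrete, here is a rough sketch in Python (not the Stata calculation mentioned above). It assumes a normal-approximation formula for comparing two proportions, equal cluster sizes, and the standard design effect 1 + (m - 1) × ICC; the function names and the ICC values tried are illustrative assumptions, not figures from the project.

```python
import math
from statistics import NormalDist


def n_per_arm_individual(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size for an individual-level comparison of two
    proportions (normal approximation, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2


def people_per_cluster(n_individual, clusters_per_arm, icc):
    """People to observe in each cluster so that the effective sample size
    k*m / (1 + (m - 1)*icc) reaches n_individual.
    Returns None when no cluster size is large enough."""
    denom = clusters_per_arm - n_individual * icc
    if denom <= 0:
        return None  # power target unreachable with this few clusters
    return math.ceil(n_individual * (1 - icc) / denom)


# Scenario from the question: 50% -> 60% mask wearing, 3 + 3 villages.
n = n_per_arm_individual(0.50, 0.60)
print("individual-level n per arm:", math.ceil(n))

# Hypothetical ICC values, chosen only for illustration.
for icc in (0.0, 0.005, 0.01, 0.05):
    print("ICC =", icc, "-> people per village:",
          people_per_cluster(n, clusters_per_arm=3, icc=icc))
```

Under these assumptions, the required number of observations per village grows quickly with the ICC, and once the ICC exceeds roughly 3 divided by the individual-level sample size, no number of observations per village reaches 80% power. With so few clusters a t-based calculation would be even less favorable, so treat this as an optimistic sketch.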

If you don't already have it, I would strongly recommend getting a copy of Gerber & Green's Field Experiments. I would also very strongly recommend that you (or EA Cameroon) engage an experimental methodology expert for this project, rather than pose the question on the forum (I am not such an expert).

It is very difficult to address all of these questions in a broad way, since the answers depend on:

  • The smallest effect size you would hope to observe
  • Your available resources
  • The population within each cluster
  • The total population
  • Your analysis methodology

I'm a little confused about the setup. You say that there are 6 groups, so how would it be possible to have "6 intervention + 3 non-intervention"? Sorry if I'm misunderstanding.

In general, and particularly in this context, it makes sense to split your clusters evenly between treatment and control. For a fixed total number of clusters, this is the setup that minimizes the standard error of the difference between groups. When the standard error is larger, smaller effect sizes are harder to detect. The smaller the number of clusters in your control group, for example, the larger the effect size would have to be for you to make a statistically defensible claim.
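As a small illustration of why the even split is best, the sketch below compares the possible splits of 6 clusters. It assumes equal outcome variance and equal cluster sizes in both arms, so that the standard error of the difference in cluster-level means is proportional to sqrt(1/k_treat + 1/k_control); the function name is just for illustration.

```python
import math


def relative_se(k_treat, k_control):
    """Standard error of the treatment-control difference, up to a common
    factor, when both arms share the same cluster-level variance."""
    return math.sqrt(1 / k_treat + 1 / k_control)


total_clusters = 6
splits = [(k, total_clusters - k) for k in range(1, total_clusters)]

for k_treat, k_control in splits:
    print(f"{k_treat} + {k_control}: relative SE = "
          f"{relative_se(k_treat, k_control):.3f}")

# The split with the smallest relative standard error.
best_split = min(splits, key=lambda s: relative_se(*s))
print("best split:", best_split)
```

The more lopsided the split, the larger the standard error, which is the sense in which a small control group pushes up the effect size you need in order to detect anything.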

With such a small number of clusters, effect sizes would have to be very large in order to be statistically distinguishable from zero. If indeed 50% of the population in these groups is already masked, 6 clusters may not be enough to see an effect.

Can we get some clarification on some of your questions? Particularly:

How important, in terms of statistical power, is it to include all clusters

If you have only 6 to choose from, then the answer is "very important." But I'm not sure this is the sense in which you mean it.

How many persons should be observed at each place?

My inclination here is to say "as many as possible." But this is constrained by your resources and your method of observation. Can you say more about the data collection plan?