[July 30 update]: We have an answer regarding the sample size if the intra-cluster correlation coefficient is assumed to be zero. This sample size calculator can be used.
EA Cameroon needs statistical impact evaluation sample size and cluster inclusion advice for their COVID-19 project. The project should ideally start toward the end of the week.
Data should be gathered before and after the main part of the project (after one month).
The idea is to count how many persons, out of a fixed number observed, wear a face covering, and to record how long the counting took. This information can be used as a proxy for preventive measures and social distancing.
I would like to ask about the sample size and the inclusion of clusters. There are 180 000 persons in the campaign area, which is divided into 6 villages/parts. Volunteers would prefer not to travel to all 6 campaign villages, and even less so to an equal number of non-campaign villages, as the non-intervention communities are distant.
Different languages are spoken in the 6 parts, but the campaigning will include all of these languages. Otherwise, the parts are similar. Since little information is currently broadcast, the campaign may increase the share of persons wearing a face covering from 50% to at least 60% (or by an equivalent relative increase of 20% from another baseline). Can only some villages be included, e.g. 3 intervention + 3 non-intervention? Or 6 intervention + 3 non-intervention? How important is it, in terms of statistical power, to include all clusters and an equal number of non-intervention clusters? How many persons should be observed at each place?
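For reference, here is a minimal sketch of the textbook normal-approximation sample size for comparing two proportions, applied to the 50% to 60% change mentioned above. It ignores clustering entirely (that is, it assumes an intra-cluster correlation of zero) and assumes a two-sided 5% significance level and 80% power, which are conventional defaults rather than values agreed for this project. The function name `n_per_arm` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Persons to observe in each arm to detect p1 vs p2.

    Normal approximation for two independent proportions.
    Assumes NO clustering (intra-cluster correlation = 0).
    alpha and power are assumed defaults, not project values.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Baseline 50% wearing a face covering, hoped-for 60% after the campaign.
print(n_per_arm(0.50, 0.60))  # 385
```

So, under these assumptions, roughly 385 observed persons in the intervention area and 385 in the non-intervention area would be needed if the villages did not differ from one another at all. Any between-village differences would push this number up.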
I will appreciate any replies.
Thank you. I was not able to get (a PDF of) Field Experiments, but I downloaded "Field Experimental Designs for the Study of Media Effects," also co-authored by Green. They point to "robust cluster standard errors" for estimating the "individual-level average treatment effect" (p. 172).
To answer your points:
I meant 6 groups in the intervention area, and some number of groups (e.g. 3 or 6) in the non-intervention area.
OK. So 3 intervention clusters and 3 non-intervention clusters are better than 6 intervention clusters and 3 non-intervention clusters, but 6+6 may be necessary? Would the answer depend on the intra-cluster correlation coefficient (ρ)? Perhaps the texts that discuss clustering in general assume relatively large between-cluster variability and low within-cluster variability (so a high ρ). However, in this study, how people respond to the messaging may not depend much on their 'cluster assignment,' but much more on their individual characteristics, which, on average, may be comparable across the clusters and the studied population.
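To make the dependence on ρ concrete, here is a rough sketch using the standard design effect, 1 + (m − 1)ρ, where m is the number of persons observed per cluster. It assumes equal cluster sizes, the same number of clusters k in each arm, and takes 385 persons per arm as the unclustered requirement (the normal-approximation figure for 50% vs 60% at a two-sided 5% level and 80% power). The ρ values tried are purely illustrative, not estimates for these villages, and the function names are made up.

```python
from math import ceil


def design_effect(m, rho):
    """Variance inflation from observing m persons in each cluster."""
    return 1 + (m - 1) * rho


def persons_per_cluster(n_unclustered, k, rho):
    """Smallest m such that k clusters of size m match n_unclustered
    independent observations, i.e. k*m / design_effect(m, rho) >= n.

    Solving k*m = n*(1 + (m - 1)*rho) gives m = n*(1 - rho) / (k - n*rho).
    Returns None when k <= n*rho: no cluster size is large enough.
    """
    denom = k - n_unclustered * rho
    if denom <= 0:
        return None
    return ceil(n_unclustered * (1 - rho) / denom)


N_UNCLUSTERED = 385  # assumed per-arm requirement without clustering

for rho in (0.0, 0.01, 0.05):   # illustrative values only
    for k in (3, 6):            # clusters per arm
        m = persons_per_cluster(N_UNCLUSTERED, k, rho)
        print(f"rho={rho}, clusters per arm={k}: persons per cluster={m}")
```

Under these assumptions, 3+3 villages suffice only if ρ is essentially zero (about 129 persons per village). Already at ρ = 0.01, three villages per arm cannot reach the target however many persons are observed, while 6+6 would need about 178 per village. This simple formula also tends to be optimistic when the number of clusters is this small, so it is better read as a lower bound.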
I should ask EA Cameroon about the possibility of different average responses in different villages.
Do you know of any online sample size calculator that includes clusters?