MR

Mario Reutter

2 karmaJoined Sep 2022

Comments
1

Unbalanced samples are not a problem per se. You can run into a problem of representation/generalization for the smaller sample but this argument is independent of balancing and only has to do with small sample sizes.

@david_reinstein made an excellent point about heteroscedasticity / variance. To factor this into your original post: You want to optimize the cost-effectiveness of the precision of your group-level difference score. This is achieved by minimizing the standard errors (SE) of the group-level estimates of each sample, which are just the standard deviations (SD) divided by the square root of the respective observations. So your term would expand to: 
Control-to-treat-ratio = sqrt(treatment_cost/control_cost) * control_SD/treatment_SD.
The problem, in practice, is that you usually know the costs a priori but not the SDs. If variances are not equal, however, I would agree with @david_reinstein that the treatment group will more likely show greater variance on your outcome variable (if control group has more variance, I would rather reconsider the choice of the outcome variable).

If you want to read more about the concept of precision and its relation to statistical power (also cf. the paper that @Karthik Tadepalli cited), we just put together a preprint here that is supposed to double as a teaching ressource: https://doi.org/10.31234/osf.io/m8c4k (introduction and discussion will suffice since the middle part focusses on biological/neuroscientific measurements that have vastly different properties than, e.g., questionnaire data).
Here is the glossary that is mentioned in the  paper: https://osf.io/2wjc4
And here is the associated Twitter post with some digest about the most important insights: https://twitter.com/bioDGPs_DGPA/status/1616014732254756865