
Abstract: 

Across the world, millions of people suffer from mental health conditions, yet our understanding of these issues is built on unequal data. The digital traces used to study behaviour are heavily skewed toward WEIRD populations: Western, Educated, Industrialized, Rich, and Democratic. Now imagine reading a scientific paper about a condition your sibling is struggling with. Your sibling is from the Global South, has limited formal education, lives in poverty, and comes from a region rarely visible in academic research. They are, in every sense, the opposite of WEIRD, and when you look at the sample, someone like them simply does not exist in it. Once might be an oversight. Twice, a limitation. But when most studies in most journals reflect the same narrow slice of humanity, you have to ask: does this science actually apply to me?

This is not a new problem. Representation gaps exist everywhere (e.g. education, politics, media). However, in healthcare the stakes are different. When research does not reflect the full diversity of human experience, treatments may not work as well, diagnoses may be missed, and entire communities are left navigating their health with guidance that was never designed for them. This matters beyond fairness. It is not about representation for its own sake but about scientific validity. Skewed data produces skewed conclusions, and skewed conclusions in mental health cause real harm (Hargrove et al., 2020). These biases persist in part because unrepresentative samples are routinely accepted on the basis of practical constraints, such as research timelines, funding, or recruitment difficulty, rather than scientific adequacy.

This article does not propose building a new system from scratch. Instead, it proposes developing a Global Social Media Data Biobank on top of existing infrastructure: a voluntary, opt-in platform where individuals donate anonymised digital behavioural data for mental health research. By drawing on existing advances in data security and portability, and by working through trusted global networks such as the Rotary Peace Centers, this framework could facilitate trust-building, cross-cultural participation, and the ethical scaling of genuinely representative mental health data.

Problem: 

Everything you interact with online forms a combined record: the posts you like, the pages you follow, the content you engage with, your screen time, and your scrolling behaviour. Each interaction leaves behind data that reveals what captures your attention, what influences your thinking, and how your opinions take shape (Boudlaie et al., 2019). This behavioural data is increasingly valuable for mental health research, particularly in understanding conditions such as addiction, body dysmorphia, depression, and anxiety. In many cases, it can complement or even outperform traditional self-report scales, and it may be most powerful when combined with them.

However, a significant epistemic bottleneck arises when research exploring social media use and digital engagement relies on small, narrowly defined participant samples drawn primarily from WEIRD populations (Pitesa & Gelfand, 2023). As Burns et al. (2019) note, a large proportion of neuroscience research is conducted on WEIRD samples, making it difficult to determine whether observed neuropsychological patterns are truly universal or instead culturally specific.

Why does this happen? The reasons include convenience, limited funding, time constraints, and the structural difficulty of recruiting diverse global populations. While funding bodies sometimes require demographic diversity, these requirements are often weakly enforced or reduced to formal compliance rather than meaningful representativeness. As a result, even well-intentioned reforms leave significant gaps unresolved. This suggests that instead of only trying to fix sampling at the level of individual studies, we may need to rethink the underlying data infrastructure itself and explore alternative ways of diversifying the evidence base.

The Limitations of Existing Regional "Black Boxes"

Researchers have increasingly recognised the value of social media data, especially as they face growing pressure to produce evidence that can meaningfully inform policy and practice while access to high-quality data remains limited. In response, promising initiatives have emerged, including the Smart Data Donation Service in the UK (SDR UK) and the Cambridge Digital Mental Health Group. However, these efforts still face significant limitations in scope, scale, and representativeness.

For instance, SDR UK positions itself as the sole UK organisation making the full range of smart data safely accessible to the research community. It also presents itself as a national leader in smart data infrastructure, supported by multiple university-based data services and corporate partnerships. However, the platform focuses on building an ecosystem across governments, corporate partners, and academics, so it addresses multiple audiences at once. Furthermore, although trust is emphasized, participants, who are likely the target audience, cannot monitor how their data is used or assess the exact studies referenced on the site. The ecosystem is also regionally fragmented, anchored to UK universities and corporate agreements, which constrains both demographic diversity and global applicability.

The Digital Mental Health Group Lab at the University of Cambridge, in turn, primarily provides a window into an active research lab rather than a public-facing platform for data donation or large-scale engagement. Publicly available materials provide little detail on specific studies, datasets, or tangible outcomes, and there is no mechanism for individuals to contribute data directly or track research progress. The group's work demonstrates that research is happening and highlights important questions in digital mental health, but it remains a lab-based effort.

A Bold Proposal for a Global Social Media Data Biobank Centering the Global South:

Although, in principle, one could work more closely with existing funding streams to encourage stricter standards and better enforcement, this approach would require coordinating across thousands of funders, and while potentially valuable in the long term, it is unlikely to fully eliminate persistent loopholes and inconsistencies. Alternatively, one could partner with existing initiatives discussed earlier in this article and help them address current limitations. However, this proposal is explicitly motivated by a broader gap that none of these efforts currently resolve: a truly global, representative infrastructure for social media data donation and mental health research.

Therefore, this proposal envisions the creation of a global, opt-in Social Media Data Biobank, designed as a secure, public-interest infrastructure where individuals can voluntarily donate anonymized digital behavioral traces for independent scientific research. Unlike existing UK initiatives, which are largely regional, fragmented, or lab-focused, this Biobank is conceived as truly global, inclusive, and participatory, addressing gaps in transparency, scale, and public engagement.

A distinctive feature of the initiative is its integration with the Rotary Peace Centers network, which spans seven international hubs across Asia, Africa, the Americas, and Europe. Established in 2002, the Rotary Peace Centers have trained over 1,800 fellows from more than 115 countries, providing fully funded academic fellowships in peacebuilding, conflict resolution, and development at premier universities worldwide (Hazlehurst, 2010). By drawing on this long-standing, globally distributed ecosystem, the Biobank could reach populations often excluded from digital research, producing datasets that reflect diverse cultures, socioeconomic contexts, and digital practices.

Importantly, the platform would aim to support two pathways for data contribution. First, individuals could opt in to donate personal digital traces. Second, researchers affiliated with the Rotary Peace Centers who conduct fieldwork could upload anonymized datasets collected in their studies. The Biobank would also feature a continuously updated, transparent roster of ongoing research, showing which studies are active in each region, what questions they address, and the outcomes generated. This approach helps mitigate the WEIRD bias, provides longitudinal continuity, and builds a globally representative research resource.

The proposed Biobank would prioritize participant and contributor empowerment, giving donors and researchers clear visibility into how their data is used and which studies draw on their contributions, as well as access to research outputs. This approach would contrast with existing platforms that operate largely under corporate agreements or lab-specific agendas, where public and researcher influence is limited and datasets are often restricted to narrow, unrepresentative populations.
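To make the transparency idea concrete, the roster and the donor-facing view could be sketched roughly as follows. This is only an illustrative data model, not a design the proposal specifies: the class names (`Study`, `Biobank`), the field names, and the `donor_token` identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """One roster entry (hypothetical fields, for illustration only)."""
    study_id: str
    region: str
    question: str
    status: str = "active"
    outputs: list = field(default_factory=list)  # e.g. links to reports


@dataclass
class Biobank:
    """Minimal in-memory sketch of a public roster plus an access log."""
    studies: dict = field(default_factory=dict)     # study_id -> Study
    access_log: list = field(default_factory=list)  # (study_id, donor_token)

    def register(self, study: Study) -> None:
        self.studies[study.study_id] = study

    def record_access(self, study_id: str, donor_token: str) -> None:
        # Only registered studies may draw on donated data.
        if study_id not in self.studies:
            raise KeyError(f"unknown study: {study_id}")
        self.access_log.append((study_id, donor_token))

    def roster(self, region: str) -> list:
        """Studies currently active in a region (the public roster view)."""
        return [s for s in self.studies.values()
                if s.region == region and s.status == "active"]

    def studies_using(self, donor_token: str) -> list:
        """Studies that have drawn on one donor's data (the donor view)."""
        ids = dict.fromkeys(sid for sid, tok in self.access_log
                            if tok == donor_token)
        return [self.studies[sid] for sid in ids]
```

A donor holding a pseudonymous token could then call `studies_using(token)` to see which studies drew on their contribution, while anyone could call `roster(region)` to see what is active in a given region. A real system would need persistent storage, authentication, and audited logging, none of which this sketch attempts.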

Evaluated through the ITN framework, this Biobank would be a tractable, large-scale intervention. It would provide global, diverse, and transparent data, supporting research that informs policy, digital mental health, and other pressing challenges in ways current regional or lab-based initiatives cannot.

Proposal Limitations:

This proposal is not without limitations, as is the case with any large-scale global initiative. Questions of privacy, consent, and cross-cultural data governance remain inherently complex, even within a voluntary opt-in framework, particularly when dealing with sensitive behavioural data across different legal and cultural contexts. In addition, there is a persistent risk that participation would still skew toward more digitally connected and higher-income populations, despite efforts to broaden reach through networks such as the Rotary Peace Centers. Practical challenges also remain around integrating and standardising data across platforms and countries, and the model ultimately depends on varying degrees of cooperation from private technology companies, whose incentives may not always align with open scientific infrastructure.

However, these limitations are not ignored but partially mitigated through the design of the Biobank itself. By embedding transparency into the system, prioritising clear opt-in consent, leveraging established global networks to improve representativeness, and using a flexible infrastructure capable of accommodating heterogeneous data sources, the proposal aims to reduce, though not fully eliminate, these structural challenges.

The ITN Framework: Importance, Tractability and Neglectedness: 

To evaluate the impact of this proposal, we can apply the core EA framework of Importance, Tractability, and Neglectedness to assess whether this is a high-impact cause area.

Importance: 

Improving the epistemic infrastructure of behavioural science is highly important because digital behaviour shapes mental health, cooperation, and responses to global risks at population scale (Ghani et al., 2019). In EA terms, small improvements in the quality of evidence can substantially increase the impact achieved per unit of funding.

Tractability: 

This intervention is tractable because the core technical and legal components already exist. Secure storage, anonymization, and controlled access infrastructures are well established in medical biobanks. In parallel, data portability rights in many jurisdictions allow individuals to obtain copies of their personal digital records, creating a feasible pathway for voluntary data donation. The proposed Social Media Biobank builds on these foundations and extends them globally.
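As a rough illustration of how an exported record might be prepared before donation, the sketch below removes direct identifiers, replaces the account name with a keyed hash, and coarsens the timestamp. It is a minimal example under stated assumptions: the field names (`username`, `email`, `display_name`, `timestamp`) are hypothetical, real platform exports differ, and this is pseudonymization rather than full anonymization, since behavioural patterns can still re-identify people.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical field names; real platform exports use their own schemas.
DIRECT_IDENTIFIERS = {"username", "email", "display_name"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of one exported record with direct identifiers removed."""
    # A keyed hash (HMAC-SHA256) gives a stable token per donor that cannot
    # be recomputed from the username without the secret key.
    donor_token = hmac.new(
        secret_key, record["username"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:16]

    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["donor_token"] = donor_token

    # Coarsen the timestamp to the hour to reduce re-identification risk.
    ts = datetime.fromisoformat(record["timestamp"])
    cleaned["timestamp"] = ts.replace(minute=0, second=0, microsecond=0).isoformat()
    return cleaned
```

For example, a record such as `{"username": "alice", "email": "a@example.org", "action": "like", "timestamp": "2024-03-05T14:37:22"}` would come back without the username and email, with a 16-character `donor_token`, and with the timestamp rounded down to `2024-03-05T14:00:00`. Where the key is held, and by whom, is a governance decision that the sketch leaves open.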

Neglectedness: 

This issue is neglected because responsibility for building global behavioural data infrastructure falls between institutional mandates. Governments often lack jurisdiction over privately held platform data, companies face few incentives to enable independent access beyond compliance requirements, and academic funding structures typically reward novel findings rather than the slow, collective work of building shared datasets.

A New Paradigm for Doing Good?

In conclusion, investing in a Global Social Media Data Biobank is an investment in better knowledge. This initiative cultivates a commitment to orienting toward truth and updating our views based on empirical evidence rather than defending existing ideas. Just as we have normalized the donation of money, time, blood, and organs to help others, donating digital data for public-interest research may soon feel just as natural. By focusing on high-leverage infrastructure and acknowledging the trade-offs of limited resources, this proposal aims to ground our efforts to help others in evidence that is both representative and actionable, transforming everyday social media activity (e.g. every like, share, or post) into a force for collective insight and global impact.

 

* This post was published as part of the Effective Altruism Cambridge Project-Based Fellowship. Learn more here: https://www.eacambridge.org/

 

Works Cited

Boudlaie, H., Nargesian, A., & Keshavarz Nik, B. (2019). Digital footprint in Web 3.0: Social media usage in recruitment. AD-minister, 139–156.

Burns, S. M., Barnes, L. N., McCulloh, I. A., Dagher, M. M., Falk, E. B., Storey, J. D., & Lieberman, M. D. (2019). Making social neuroscience less WEIRD: Using fNIRS to measure neural signatures of persuasive influence in a Middle East participant sample. Journal of Personality and Social Psychology, 116, e1–e11. https://doi.org/10.1037/pspa0000144

Ghani, N. A., Hamid, S., Hashem, I. A. T., & Ahmed, E. (2019). Social media big data analytics: A survey. Computers in Human Behavior, 101, 417–428.

Hazlehurst, D. (2010). A brief history of the Rotary Centres for International Studies Program. Journal of Conflictology.

Hargrove, T. W., Halpern, C. T., Gaydosh, L., Hussey, J. M., Whitsel, E. A., Dole, N., Hummer, R. A., & Harris, K. M. (2020). Race/ethnicity, gender, and trajectories of depressive symptoms across early- and mid-life among the Add Health cohort. Journal of Racial and Ethnic Health Disparities, 619–629. https://doi.org/10.1007/s40615-019-00692-8

Pitesa, M., & Gelfand, M. J. (2023). Going beyond Western, educated, industrialized, rich, and democratic (WEIRD) samples and problems in organizational research. American Psychological Association. 
