
Existing Data and Research Problems

 

Since November 2025, I have been building a periodically updated global panel dataset on artificial intelligence (AI). As a quantitative social and health data scientist and applied policy researcher transitioning into AI safety and AI societal impact research, I was disappointed to find that global panel data on AI are scattered. Without a centralised source of global panel data on AI, researchers and data scientists have no easy access to comprehensive AI datasets for research. As things currently stand, different institutions publish their global AI data reports or datasets on their own websites for public download, while other organisations present their internal AI data on interactive dashboards without allowing public downloads at all. I knew I could do something about this: make global panel data on AI more centralised, standardised, and curated, and ready for public access and download.

 

The other issue I have noticed since the beginning of 2025 is the lack of non-academic, non-paywalled publications that exclusively address AI in society. While some academic publications do so, such as Oxford Intersections: AI in Society and AI & SOCIETY, I have been unable to find non-paywalled equivalents outside academia. Therefore, in November 2025, I decided to build my own site that exclusively presents non-academic, non-paywalled articles on AI societal impacts to both the professional AI safety research community and the general public.

 

The Global AI Dataset (GAID) Project

 

By the end of December 2025, I had little idea where addressing the above two data and research problems would lead my work. All I knew was that once these gaps had been addressed, I would, sooner or later, have a clearer picture of how to scale up. In December 2025, in the midst of software-engineering a web app that hosts non-academic, non-paywalled articles on AI societal impact and data-engineering version 1 of a global panel dataset on AI, I decided to dub the entirety of my work the Global AI Dataset (GAID) Project. In this article, I would like to present what the GAID Project is about and how, as of writing this post, it has been designed as a milestone-based project. I will explain all milestones (or phases) of the GAID Project that I have already completed, as well as those I plan to develop and deliver in the coming months.

 

On Harvard Dataverse, a free, open-source, web-based repository managed by the Institute for Quantitative Social Science (IQSS) at Harvard University, I describe the GAID Project (https://dataverse.harvard.edu/dataverse/gaidproject) as a comprehensive, longitudinal research repository designed to track the multi-dimensional evolution of AI across over 200 countries and territories. The GAID aims to bridge the gap between fragmented raw data and high-integrity academic research by unifying, centralising, curating, and standardising global panel data on AI, allowing researchers, data scientists, and policy professionals to observe the global trajectory of the AI revolution.

 

While that description on Harvard Dataverse outlines the main purpose of my work, the GAID Project, as currently planned, goes well beyond global panel AI data curation, compilation, and documentation. In this article, I explain each of Phases 0–4 of the GAID Project: I have already completed and delivered the outputs of Phases 0–2, and I will spend the coming months working on Phases 3–4.

 

Phase 0: Building a Web App, AI in Society

 

I engineered this web app, AI in Society (https://aiinsocietyhub.com/), in December 2025 for multiple reasons. One of the primary reasons, as indicated at the beginning of this article, is the lack of non-academic, non-paywalled publications that exclusively address AI in society. To give some background about myself, I have 10 years of training (PhD, MSc, and BA) in quantitative sociology and social epidemiology. I have also been trained in economics (especially the relationships between human capital and labour market participation), geopolitics (especially China–Hong Kong and China–Southeast Asia relations), gender studies (with a specific focus on child sexual abuse, gender-based violence, gender inequalities, and women's empowerment), human, international, and sustainable development (in alignment with the values of the United Nations' Sustainable Development Goals), and public policy.

In early 2025, when I was developing multiple original research papers on AI societal, economic, and geopolitical impacts (all published, as of writing this article), I realised two problems. The first was that, when searching for potential journal outlets for my work, only a handful of academic publications exclusively covered AI-in-society topics. As the influence of AI grows exponentially (I don't have the data to back this up, but I reckon it is growing much faster than social media or the dot-coms did decades ago), I believe there is an increasing need for non-academic researchers and the general public to gain access to data-driven, evidence-based, and narratively presented in-depth analyses of AI societal impacts.

 

The second problem was the inconvenience of having to download datasets from multiple AI-focused databases, manually merge them into one, and only then carry out econometric analysis for original research. That process was time-consuming and labour-intensive. In the AI era, the fashionable terms are automation, efficiency, and productivity; data analysis that still demands heavy manual input from AI researchers and data scientists is, by that standard, distinctly user-unfriendly. The initial design of the GAID Project was therefore to address these two problems.
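To make the second problem concrete, below is a minimal sketch (in Python with pandas, the stack the GAID Project is built on) of the kind of manual merge involved. The file names, column names, and long country-year-metric layout are all my assumptions for illustration, not the actual schemas of these databases.

```python
import pandas as pd

# Hypothetical file names; in reality each source exports its own schema.
sources = {
    "ai_index": "stanford_ai_index.csv",
    "oecd_ai":  "oecd_ai_observatory.csv",
    "girai":    "global_index_responsible_ai.csv",
}

frames = []
for name, path in sources.items():
    df = pd.read_csv(path).rename(columns=str.lower)
    df["source"] = name
    # Assume each export can be coerced into a long country-year-metric layout.
    frames.append(df[["iso3", "year", "metric", "value", "source"]])

# Stack the harmonised sources, then pivot to one row per country-year
# so the panel is ready for econometric analysis.
panel = pd.concat(frames, ignore_index=True)
wide = panel.pivot_table(index=["iso3", "year"],
                         columns="metric", values="value").reset_index()
```

Every step above (schema inspection, renaming, reshaping) has to be repeated whenever any source updates, which is exactly the overhead a centralised, pre-harmonised dataset removes.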

 

Our world is at a stage, as of writing this article, where global AI players are working on the societal and economic integration of AI technology. Tech giants in the US, while chasing continual gains in compute efficiency, are placing more emphasis on how ever-advancing AI technology can be translated into positive societal and economic returns over time. Meanwhile, China's AI Plus (AI+) initiative, launched in 2024, has been strategically focused on a 10-year plan to fully integrate the country's advanced AI technology across as many industries as possible. This growing awareness of the importance of optimising societal and economic impacts, rather than solely prioritising compute efficiency in pursuit of artificial general intelligence (AGI), supports my decision to build a site that addresses AI societal impacts for a wider audience.

 

Therefore, when engineering my web app, AI in Society, I decided to add an Articles section that features non-academic, non-paywalled articles on AI societal impacts. Given my data science expertise, I expect most of the articles shared periodically to be data-driven, though some, where applicable, might be theoretically or methodologically focused. I also decided to build a curated opportunities board. In the early 2020s, when I was still a PhD student, I spent much of my time browsing the internet for pre-doctoral and postdoctoral fellowships and funding opportunities. As we know, scientists spend a large share of their time searching for and writing grants rather than doing actual research, so finding eligible funding opportunities that fit our expertise is a big deal. For AI funding opportunities, while some established sites, such as the EA Opportunities Board, the 80,000 Hours Job Board, and AISafety.com, constantly feature new AI-focused fellowships and grants, many opportunities never appear on them. I therefore engineered my curated AI Opportunities Board page to share AI fellowships and funding opportunities that I am aware of but that may or may not be featured on those established sites.

 

The other, and more important, reason why I engineered my web app, AI in Society, is that I believe there is a need for me to establish my own site to host the deliverables of my GAID Project. As I mentioned, my GAID Project is milestone-based (which means it is ever-scaling). Therefore, rather than hosting my GAID Project deliverables across different online platforms, it is much easier for me to build my own site so that any future deliverables of the GAID Project can be directly featured on the centralised web app, AI in Society.

 

Phases 1–2: Compiling, Curating, and Documenting GAID Datasets

 

To further benefit the AI research community, between November and December 2025, I spent weeks data-engineering version 1 of the GAID dataset. As mentioned, it is very researcher-unfriendly to manually identify and download AI-focused datasets, merge them in a software package, and carry out data cleaning and standardisation before any analysis can begin. A publicly accessible global panel dataset covering AI across different domains would substantially shorten the time researchers, data scientists, and policy teams need to conduct AI research and evaluate AI impacts. Therefore, in November 2025, I identified three of the arguably most comprehensive global AI databases, namely Stanford's AI Index, OECD.ai (the AI Policy Observatory), and the Global Index on Responsible AI, and set out to compile, clean, standardise, and document their public-access data as a new dataset for public use.

 

I finished engineering the version 1 GAID dataset and published it on Harvard Dataverse in late December 2025 (https://doi.org/10.7910/DVN/QYLYSA). It is a longitudinal panel dataset providing a comprehensive, harmonised overview of the global AI landscape. This curated, compiled, and documented dataset covers 214 unique countries and territories, from 1998 to 2025, across eight AI domains, including economy, policy, and governance. I applied a total of 123 cleaning and deduplication steps to optimise the data integrity of this version 1 dataset, which can be immediately ingested in, for example, R, Stata, Python, and SPSS for statistical analysis. For my GAID datasets, including version 1, I deliberately include only country-level data; regional data (such as Europe or Asia) and city- or state-level data (such as California or New York) are excluded. By country-level data, I mean data from any place with an official three-letter International Organisation for Standardisation (ISO3) identifier, which is assigned to countries, dependent territories, and special areas worldwide. For example, Hong Kong has its own country-level ISO3 identifier, independent of China's, so data from Hong Kong are included in my GAID datasets.
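As a minimal sketch of how the country-level rule plays out in practice, the snippet below loads a hypothetical CSV export of version 1 and filters on an assumed iso3 column; the actual file formats and variable names are documented in the codebook.

```python
import pandas as pd

# Hypothetical file and column names; see the published codebook for the
# actual variable names and file formats.
gaid = pd.read_csv("gaid_v1.csv")

# Keep only observations carrying a valid three-letter ISO3 code, mirroring
# the country-level inclusion rule (regional and sub-national rows excluded).
gaid = gaid[gaid["iso3"].str.fullmatch(r"[A-Z]{3}", na=False)]

# Example: Hong Kong has its own ISO3 code (HKG), independent of China's (CHN),
# so its panel can be extracted directly.
hkg = gaid[gaid["iso3"] == "HKG"].sort_values("year")
```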

 

In total, the version 1 dataset has over 24,000 unique metrics. Their definitions can be found in the codebook published alongside the dataset at https://doi.org/10.7910/DVN/QYLYSA. Researchers, data scientists, and policy teams who would like to use the version 1 GAID dataset for AI research should feel free to consult the corresponding 186-page codebook, which details how each unique metric is measured and defined.

 

Publishing my web app, AI in Society, and the version 1 dataset marked the completion of Phase 0 and Phase 1 of my milestone-based GAID Project, respectively. From 26 December 2025, I spent roughly three weeks data-engineering, documenting, and publishing the version 2 dataset (https://doi.org/10.7910/DVN/PUMGYU) on Harvard Dataverse, which constitutes Phase 2 of the GAID Project. The version 2 dataset is a significant expansion of version 1: I integrated, standardised, and carefully cleaned high-fidelity AI indicators from eight additional premier AI databases and websites. These eight additional data sources are: (1) MacroPolo Global AI Talent Tracker, (2) UNESCO Global AI Ethics and Governance Observatory, (3) IEA's Energy and AI Observatory, (4) Epoch AI, (5) Tortoise Media - The Global AI Index, (6) WIPO (World Intellectual Property Organisation) - AI Patent Landscapes, (7) Coursera - Global Skills Report (AI & Digital Skills), and (8) World Bank - GovTech Maturity Index (GTMI).

 

Data from these eight additional sources were collected either by direct ingestion or by web-scraping, as applicable. Like version 1, the version 2 dataset is optimised for immediate statistical analysis in, for example, R, Stata, Python, and SPSS. Version 2 expands version 1 on every dimension: it contains almost 26,000 unique metrics (vs. over 24,000 in version 1), covers 227 unique countries and territories (vs. 214), all with existing ISO3 codes, and spans 20 AI domains (vs. eight) from 1998 to 2025. Overall, version 2 is a far more comprehensive and in-depth global panel dataset on AI than version 1.
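For readers curious how the two collection paths differ in practice, here is a minimal sketch with placeholder URLs rather than the actual endpoints of the eight sources; the web-scraping path assumes a source that exposes a plain HTML table.

```python
import pandas as pd

# Path 1 - direct ingestion: several sources publish downloadable CSVs, which
# pandas can read straight from a URL (placeholder URL, not a real endpoint).
ingested = pd.read_csv("https://example.org/ai-indicators.csv")

# Path 2 - web-scraping: where a source only exposes an HTML table, pandas can
# lift it directly (requires the lxml parser; again, a placeholder URL).
tables = pd.read_html("https://example.org/global-ai-rankings")
scraped = tables[0].rename(columns=str.lower)
```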

 

Version 2 was published on Harvard Dataverse in mid-January 2026. Since it is the most up-to-date and comprehensive GAID dataset as of writing this article, I recommend that researchers, data scientists, and policy teams conducting AI research use this version for statistical analysis. When doing so, please consult the accompanying 200-page codebook (at https://doi.org/10.7910/DVN/PUMGYU) to understand how the unique metrics, across all 20 domains, are measured and defined.

 

Phases 3–4: The Global AI Bias Audit—An Automated Evaluation and Interpretability Dashboard for Foundation Models and AI Agents

 

While finishing the version 2 GAID dataset, I spent the past weeks designing the scale-up phases (i.e. Phases 3–4) of my GAID Project. I dub this scale-up project "The Global AI Bias Audit—An Automated Evaluation and Interpretability Dashboard for Foundation Models and AI Agents". It is designed to deliver the following two milestones:

 

  • Phase 3: Engineering an interactive dashboard, hosted as a separate page on my AI in Society web app, which interactively and programmatically presents national profiles and data visualisations from my version 2 GAID dataset.
  • Phase 4: Stress-testing foundation models against my ground-truth GAID dataset for AI safety, fairness, and readiness, and addressing digital colonialism in AI-driven decision-making.

 

Description & Aims

 

This project aims to establish an automated AI Eval and interpretability dashboard built upon my GAID (1998–2025), of which wave 1, versions 1 and 2, was published on Harvard Dataverse, as discussed above. The dashboard will be hosted on my software-engineered web app, AI in Society. As of today, generative AI models lack a proactive validation mechanism to ensure their outputs are factually grounded and free from geographical bias. This project addresses that interpretability gap by transforming the GAID into an automated benchmarking ecosystem for global AI researchers and policymakers. I built both my GAID dataset and my web app, AI in Society, in Python. I am updating the version 2 dataset to include composite AI indices (based on the GAID data) for global AI readiness, fairness, and safety. I will then extend my Python scripts so that the interactive dashboard uses the structure of the GAID indices to programmatically audit foundation models, quantifying the discrepancy between model-generated assessments and my AI index scores.
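As a minimal sketch of the audit logic, under my own assumptions rather than the project's finalised design: the index values below are illustrative, and ask_model is a placeholder for whichever provider API the pipeline ends up calling.

```python
from statistics import mean

# Illustrative composite scores keyed by ISO3 code; real GAID values differ.
gaid_index = {"HKG": 72.4, "KEN": 41.9, "BRA": 55.0}

def ask_model(model_name: str, prompt: str) -> float:
    """Placeholder for a provider API call (e.g. via the OpenAI, Anthropic,
    or Google client libraries); it should parse a numeric estimate from
    the model's reply."""
    raise NotImplementedError

def audit(model_name: str) -> float:
    """Mean absolute error of one model's estimates against the GAID index."""
    errors = []
    for iso3, truth in gaid_index.items():
        prompt = (f"On a 0-100 scale, estimate the composite AI readiness of "
                  f"the country with ISO3 code {iso3}. Reply with a number only.")
        errors.append(abs(ask_model(model_name, prompt) - truth))
    return mean(errors)
```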

 

This technical project has three objectives:

  • Developing a visual interpretability layer: I will refine my existing Python scripts to generate high-fidelity, interactive visualisations (e.g., geographical heatmaps and radar charts) of all 227 unique countries and territories across the 20 GAID domains. This provides national profiles for all countries, serving as a visual ground truth against which large language models' outputs can be compared (see the sketch below).
  • Engineering an automated AI Eval pipeline: I will build a Python-based testing framework that programmatically evaluates the factual reliability of generative AI models via their APIs. The engine will task generative AI models with estimating AI safety, fairness, and readiness for specific countries and will automatically calculate error metrics (e.g., mean absolute error) by comparing model responses to the GAID standards.
  • Quantifying and explaining geographical bias: I will launch the interactive dashboard that visualises model performance across different socioeconomic tiers, to see whether the discrepancy between models' assessments and the ground-truth GAID data and index scores is larger for less developed countries than for their wealthier counterparts. Using interpretability techniques, the project will further identify specific domains (e.g., energy, talent, ethics) where models consistently fail to align with the ground-truth data, exposing the systemic hallucinations that compromise AI safety, fairness, and readiness in the Global South and non-Western democratic societies.
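For the first objective, here is a minimal sketch of how a geographical heatmap could be generated from GAID-style data, assuming Plotly; the scores below are illustrative, not real GAID values.

```python
import pandas as pd
import plotly.express as px

# Hypothetical composite scores; real GAID index values differ.
df = pd.DataFrame({"iso3": ["USA", "CHN", "HKG", "NGA"],
                   "readiness": [88.0, 84.5, 72.4, 35.1]})

# Plotly interprets `locations` as ISO3 codes by default (locationmode="ISO-3"),
# which matches GAID's country identifiers.
fig = px.choropleth(df, locations="iso3", color="readiness",
                    title="Composite AI readiness (illustrative values)")
fig.write_html("readiness_heatmap.html")  # embeddable on the dashboard page
```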

 

Goals

 

The project is strategically designed to advance our collective AI safety progress by making generative models more reliable, trustworthy, and customised for real-world governance. It directly addresses the research priority of Responsible AI by moving from my recently completed work on static data curation to the scale-up phase of automated AI Eval and interpretation. The project aims to satisfy three goals:

 

  • Achieving responsible AI and bias mitigation: I will build a bias audit infrastructure that uses longitudinal ground-truth data from my GAID dataset to quantify geographical hallucinations. Such work provides a rigorous technical framework to identify and mitigate systemic inaccuracies in how foundation models represent the Global South.
  • Designing innovative methodology by engineering a Python-based AI Eval pipeline: I will provide a scalable methodology for testing the factual reasoning of AI agents. Instead of building simplistic benchmarks, this project aims to deliver an evaluation-as-a-service platform, where model reliability is continuously verified against the high-scale, cross-domain, and yearly-updated GAID dataset.
  • Offering transparency and interpretability: The interactive dashboard utilises the 20 domains of GAID ground-truth data to explain why a generative AI model fails, locating specific knowledge gaps in areas such as energy infrastructure or ethical governance.

 

Timeline and Deliverables

 

The feasibility of this project is underpinned by my published foundational work: the GAID wave 1 (versions 1 and 2) dataset and the software-engineered AI in Society web app. The primary technical hurdles, namely data acquisition, cleaning, and the development of the core Python scripts, were resolved in the completed phases, so the implementation of this scale-up project is realistic and focused on engineering rather than data collection. Below are the three milestones I aim to reach:

 

  • Coming months 1–4: Developing the Python-based visualisation backend to transform existing structured, clean data and corresponding composite AI index scores into interactive national profiles.
  • Coming months 5–8: Implementing the AI Eval engine, leveraging my expertise in Python and API integration to automate the stress-testing of foundation models (such as Gemini, GPT-5, Claude).
  • Coming months 9–12: Large-scale auditing and bias reporting on my interactive dashboard, as well as in a technical report (published on arXiv) and a conference presentation (targeting NeurIPS 2027).

 

Note: A new wave of data from all sources will be programmatically extracted at the end of each year for the GAID dataset, so the interactive dashboard will receive periodic data updates and deliver living benchmarks.

 

Impact Assessment

 

The primary impact is the establishment of a global standard for auditing the reliability of generative AI across different domains (e.g., policy and governance). By providing what is, to my knowledge, the first automated tool, hosted as an interactive dashboard, that quantifies geographical hallucination, this project enables developers, researchers, and policymakers to identify where models fail the Global South or non-Western democratic societies, helping prevent digital colonialism in AI-driven decision-making.

 

This project provides scientific impact by introducing new interpretability benchmarks through a statistically rigorous system that quantifies model errors across 20 domains. It also offers economic and policy impact: the interactive dashboard helps governments and industry professionals verify the safety, fairness, and readiness of AI agents before deployment in global markets. The project supports, for example, the UK's leadership in AI safety by providing a diagnostic tool that ensures AI-driven monitoring is factually accurate and geographically inclusive. It is also sustainable: the automated pipeline, with annually updated global panel GAID data, ensures the tool remains relevant as AI strategies evolve. I will open-source the AI Eval Python scripts to foster a collaborative ecosystem where developers and researchers can contribute to a more equitable global AI landscape.

 

Collaborations across Disciplines and Sectors

 

This scale-up project is interdisciplinary, bridging computational data science, AI interpretability, and international political economy and governance. It fosters cross-sector collaboration between academia and intergovernmental monitoring bodies. As the version 2 GAID dataset was developed by ingesting and web-scraping data feeds from 11 authoritative sources such as OECD.ai, WIPO, and UNESCO, the project connects academic benchmarking with the practical needs of global governance organisations. The AI Eval engine is designed to be used by developers to stress-test their generative AI models for factual accuracy and bias. Not only will I publish a technical report unveiling the white-box details of the Python-based AI Eval pipeline, but I will also disseminate outputs demonstrating how the GAID ground truth can improve model fine-tuning for global applications (through a technical paper on arXiv, a presentation targeting NeurIPS 2027, and public posts on the Effective Altruism Forum, LessWrong, and the AI Alignment Forum). Furthermore, I will facilitate knowledge exchange by sharing outputs with the wider AI safety research community within and beyond academia. This ensures the technical insights gained from the pipeline lead to better-informed regulations and more reliable AI tools globally.

 

Responsible AI

 

This project aligns with the core AI safety principle of responsible AI through the design and implementation of the Python-based AI Eval pipeline. This project is designed to mitigate systemic bias. The automated bias audit infrastructure will be presented in a leaderboard as part of the interactive dashboard, which quantifies geographical hallucinations and identifies precisely where foundation models fail to accurately represent the Global South and non-Western democratic societies.
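As a minimal sketch of how such a leaderboard could be computed, under my own assumptions about the audit output: the per-country errors and income tiers below are illustrative, and bias_gap is one hypothetical bias signal, not a finalised metric.

```python
import pandas as pd

# Hypothetical per-model audit output; all numbers are illustrative only.
results = pd.DataFrame({
    "model":   ["model_a", "model_a", "model_b", "model_b"],
    "tier":    ["high", "low", "high", "low"],
    "abs_err": [2.3, 15.1, 4.0, 6.2],
})

# Leaderboard: overall mean absolute error plus a geographical-bias gap
# (low-income MAE minus high-income MAE; a larger gap signals stronger bias).
per_tier = results.pivot_table(index="model", columns="tier",
                               values="abs_err", aggfunc="mean")
leaderboard = pd.DataFrame({
    "mae":      results.groupby("model")["abs_err"].mean(),
    "bias_gap": per_tier["low"] - per_tier["high"],
}).sort_values("mae")
print(leaderboard)
```

On this illustrative data, model_a is more accurate for high-income countries, yet model_b ranks higher overall and shows a far smaller bias gap, which is exactly the kind of pattern the leaderboard is meant to surface.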

 

Also, my project goes beyond traditional benchmarking. Based on my 20-domain global panel country-level GAID dataset, this project will establish a scalable methodology for testing the factual interpretability and reasoning fidelity of AI agents against high-scale, cross-domain evidence (meaning data from my GAID dataset). By delivering a rigorous, evidence-based diagnostic tool, this project ensures that we can develop generative AI that is not only high-performing but factually grounded and geographically equitable.

 

Equality, Diversity, and Inclusion

 

First, the primary objective of this project is to bridge the interpretability gap that leaves non-Western societies vulnerable to biased AI-driven decision-making. By setting an objective to quantify geographical hallucination, the research design includes the 227 unique countries and territories from the GAID dataset as equal subjects of study, rather than focusing on the high-resource AI ecosystems of the Global North alone. Such an approach ensures that the technical definitions of AI safety, fairness, and readiness are inclusive of, for example, diverse socioeconomic circumstances, energy constraints, and policy and ethical frameworks found across the Global South.

 

Second, the methodology of the AI Eval pipeline is built to detect and expose systemic bias. The methodology for stress-testing foundation models and AI agents includes a stratified analysis across different socioeconomic tiers, ensuring that the evaluation of reasoning fidelity is not biased toward countries with high data density. Also, by integrating 20 domains (e.g. talent, ethics, energy, policy, and governance), the methodology acknowledges that AI readiness is intersectional: the project audits whether a model's bias in metrics from a single domain is compounded by a lack of understanding of factors from other domains in developing countries. Furthermore, to promote inclusion within the developer and researcher community, the Python scripts for the AI Eval pipeline will be open-sourced, allowing developers, researchers, and policymakers from low-resource institutions to utilise high-level interpretability tools that are often locked behind proprietary paywalls.
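To illustrate the stratified, intersectional audit described above, here is a minimal two-way breakdown, assuming hypothetical audit records for a single model (all values illustrative):

```python
import pandas as pd

# Hypothetical audit records for one model; values are illustrative only.
audit = pd.DataFrame({
    "tier":    ["high", "high", "low", "low"],
    "domain":  ["energy", "talent", "energy", "talent"],
    "abs_err": [2.0, 3.1, 16.4, 12.8],
})

# Two-way stratification (income tier x GAID domain): shows whether errors in
# different domains compound for lower-income countries.
mae = audit.pivot_table(index="tier", columns="domain",
                        values="abs_err", aggfunc="mean")
print(mae)  # rows: income tiers; columns: GAID domains
```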

 

Third, my GAID dataset itself is an exercise in inclusive data gathering with global representation. Unlike many AI benchmarks that only cover OECD countries, GAID programmatically synthesises data for 227 unique countries and territories globally. By ingesting data from 11 diverse global AI databases and websites, such as UNESCO and WIPO, this project ensures that the ground truth is not derived from a single Western perspective but from a collection of international monitoring bodies. Moreover, the global panel nature of the country-level data (1998–2025) allows for an inclusive understanding of how AI safety, fairness, and readiness have evolved differently across regions over time, helping ensure that researchers, developers, and policymakers who use the GAID dataset or this project's outputs (i.e. the interactive dashboard) do not overlook the progress of emerging economies.

 

Fourth, the reporting phase of this project is designed to be accessible and transparent to a global audience of stakeholders. The interactive dashboard will be hosted on my non-paywalled AI in Society web app, with high-fidelity visualisations such as geographical heatmaps, ensuring that the outputs are interpretable for researchers and policymakers without a technical or computational data science background. I engineered the web app as an interactive, non-paywalled site whose design avoids static presentation and optimises reader-friendliness.

 

Also, the expected outputs will be disseminated across both highly visible technical academic platforms (such as NeurIPS and arXiv) and public-facing forums for the AI safety research community (such as the Effective Altruism Forum, LessWrong, and the AI Alignment Forum) by the end of the scale-up project. This multi-tiered reporting ensures that insights into geographical bias reach both the researchers and developers of AI agents and the policy-making community.

 

Finally, the project's reporting will rank model performance by country-income tier (i.e. high-income, upper-middle-income, lower-middle-income, and low-income countries). This creates a public record of which AI agents are failing non-Western democratic societies and acts as a diagnostic and corrective measure, providing the evidence needed to advocate for more equitable AI development and deployment.

 

To Wrap Up

 

So far, I have self-funded the completion and delivery of Phases 0–2 of the GAID Project. This week, I began submitting funding applications to support the scale-up (Phases 3–4) of my work. Like everyone else working on responsible AI, AI safety, AI alignment, and related fields, I don't have an upfront answer to how my ever-scaling work will benefit humanity. I simply started the work, reached a small milestone, scaled the project up, and will repeat that process until I gain a clearer picture and a more solid footing on how to contribute to building positive, constructive, and responsible AI.

 

Over the past few weeks, I have also been thinking about Phase 5 of my GAID Project. As of writing this article, Phase 5 is still at the concept stage: the idea is to build an AI forecasting model based on my ground-truth, yearly updated GAID dataset(s), ideally hosted on my AI in Society web app too. I will continue to consolidate my thinking and come up with a more concrete design for Phase 5 while working on Phases 3–4.
