Note: This is my first-ever post. I would like to share my work, and I am happy to hear constructive feedback.
Part 1: Global AI Dataset on AI Impacts
I finished my PhD in (Quantitative) Sociology in February 2025. Since then, I have been transitioning into AI safety research. The transition has not been as smooth as my once-romanticised expectations suggested. There is a small but growing number of AI projects I would like to design, experiment with, and build. I spent months applying for funding within the Effective Altruism community, believing I would begin my proposed AI projects after securing it. With no prior work experience or portfolio in AI beyond my academic research papers, I quickly learnt that funding rejection after rejection is just part of the game. Instead of endlessly waiting for funding, I simply started building. One of those projects, and the focus of this post, involves compiling and documenting a global AI dataset as open-access data infrastructure: a centralised data source that helps researchers and professionals worldwide conduct AI research.
- My published dataset, Wave 1 (Version 1), on Harvard Dataverse:
Hung, J. (2025). "Global Artificial Intelligence Indicator Database (GAID), 1998–2025", https://doi.org/10.7910/DVN/QYLYSA, Harvard Dataverse, V1, UNF:6:AmkTCFMfV271VIT5pWC3ug== [fileUNF].
Wave 1 (Version 1) of my compiled and documented global AI dataset is sourced from publicly accessible raw data from Stanford's AI Index, the Global Index on Responsible AI (GIRAI), and OECD.ai (AI Policy Observatory), streamed via its API. Below is a brief summary of the dataset:
- Dataset: MASTER_AI_DATA_COMPILATION_FINAL.csv
- Total Rows: 251,676
- Total Data Points: 2,372,436
- Total Unique Metrics: 24,323
- Year Range: 1998–2025
- Unique Countries: 214
- ISO3 Coverage: 100.0%
- Version: 8.28 (Reader-Friendly Optimisation)
- Last Updated: 26th December 2025
- Domains: Diversity, Economy, Education, Global AI Vibrancy Tool, Policy and Governance, Public Opinion, Research and Development, Responsible AI, Other(s)
I expect this to be an ongoing, yearly project (i.e., Wave 1, Wave 2, Wave 3, etc.) where I plan to update the dataset (.csv), Python script (.py), and codebook (.pdf) by programmatically extracting the latest annual data at the end of every year. Currently, I am working on a Wave 1 (Version 2) compiled and documented dataset by web scraping and/or extracting extra data from eight additional databases/websites, as follows:
- MacroPolo Global AI Talent Tracker: AI talent migration, university rankings, and publication trends by country (to understand human capital behind AI development).
- UNESCO Global AI Ethics and Governance Observatory: AI ethics and governance (e.g., ethical and social dimensions of AI).
- IEA's Energy and AI Observatory: AI's energy consumption and its impact on the energy sector.
- Epoch AI: AI models' compute and training trends (e.g., technical progress of AI models as the supply side of AI data).
- Tortoise Media - The Global AI Index: Commercial and innovative competitiveness (pillars: infrastructure (e.g., electricity, internet), operating environment (e.g., public opinion, data privacy laws), commercial (number of start-ups, private funding, etc.)).
- WIPO (World Intellectual Property Organization) - AI Patent Landscapes: How many AI-related patents are filed per country, by sub-field (e.g., computer vision, NLP).
- Coursera - Global Skills Report (AI & Digital Skills): AI skills, machine learning, and data science.
- World Bank - GovTech Maturity Index (GTMI): e.g., the Digital Citizen Engagement Index and the GovTech Enablers Index.
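For readers curious about the mechanics, the yearly update step described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline, and it assumes hypothetical column names (`iso3`, `year`, `metric`, `value`) that may differ from the real GAID schema:

```python
import csv

# Hypothetical key columns -- the real GAID schema may differ.
KEY_COLS = ("iso3", "year", "metric")

def merge_waves(master_rows, new_rows):
    """Merge a new wave of records into the master compilation.

    Rows are keyed on (iso3, year, metric); a newer record for the
    same key replaces the older one, so re-running a yearly
    extraction stays idempotent.
    """
    merged = {tuple(r[c] for c in KEY_COLS): r for r in master_rows}
    for r in new_rows:
        merged[tuple(r[c] for c in KEY_COLS)] = r
    return list(merged.values())

def load_rows(path):
    """Read a compiled wave from CSV into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def save_rows(path, rows, fieldnames):
    """Write the merged compilation back out to CSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Keying on (country, year, metric) is one simple way to keep multi-source, multi-wave panel data deduplicated as new annual extractions are folded in.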
As I am tied up with DeSci work, research-paper writing, and PI grant applications, as well as a deadline to submit the full manuscript of my latest book to the acquisitions editor, all within the next 2–3 weeks, I expect the compilation (cleaning, standardising, and structuring) and documentation of the Wave 1 (Version 2) dataset to be delayed, with the Harvard Dataverse deposit pushed to mid-to-late January 2026.
Part 2: AI Indices for AI Safety and X-Risk Mitigation
As time goes on, I keep having new ideas (and questions) about how to expand the project's impact so that it fully aligns with my research interests. As mentioned above, since February 2025 I have been working to transition into an AI safety research career. It is awesome to work on AI's societal impacts, the area addressed by Wave 1 (Version 1) of my compiled and documented global AI dataset and by some of my recently published academic research papers.
That said, to expand the project's direction from AI societal impacts to AI safety alignment, measurement, and evaluation, I plan to create different AI metrics based on Wave 1 (Version 2) of my dataset. After several nights of thought, I realised that the project already includes data from (1) GIRAI, (2) the UNESCO Global AI Ethics and Governance Observatory, and (3) OECD.ai. This means my dataset is not only a compilation of data on AI impacts across domains but also contains valuable structured panel data on (a) responsible AI, (b) human rights and AI, (c) AI ethics, and (d) AI policy and governance. Using these data, along with other national-level data on AI impacts across domains, I can build index scores for all available countries.
Table 1: Some Early/Rough Thoughts
| Category | Data Source | X-Risk Measurement (The Index) |
|---|---|---|
| Governance Gap | National AI Policies & Regulations | The Alignment Preparedness Index: I can measure the gap between a country's compute power and its safety-specific legal frameworks. |
| Transparency | Explainability & Open-Sourcing | The Black-Box Concentration Index: I can track how much global AI capability is shifting into closed-source, un-auditable "black-box" systems. |
| Compute Control | Technical Standards & Procurement | The Proliferation Risk Index: e.g., I can map the global spread of hardware capable of training frontier models without safety oversight. |
Table 1 contains early thoughts on some indices I could build from the data available in my compiled and documented dataset. In two of my recent research papers, I designed a composite AI data infrastructure index and a composite healthcare infrastructure index, respectively, following data normalisation, so I have ample experience designing and building composite indices.
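As a rough illustration of the normalisation-then-aggregation approach mentioned above, here is a minimal composite-index sketch using equal weights and min-max scaling. The indicator names are made up for illustration, not actual GAID metrics, and real index construction would involve deliberate choices about weighting and imputation:

```python
def min_max(values):
    """Min-max normalise a list of raw values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant indicator: no spread
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def composite_index(data):
    """data: {country: {indicator: raw_value}}, sharing one
    indicator set. Returns {country: equal-weighted score in [0, 1]}.
    """
    countries = sorted(data)
    indicators = sorted(next(iter(data.values())))
    # Normalise each indicator across countries, then average
    # the normalised values per country.
    normalised = {
        ind: min_max([data[c][ind] for c in countries])
        for ind in indicators
    }
    return {
        c: sum(normalised[ind][i] for ind in indicators) / len(indicators)
        for i, c in enumerate(countries)
    }
```

Equal weighting is only the simplest baseline; expert-informed or data-driven weights (e.g., from principal component analysis) are common alternatives in composite-index work.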
I recently built a web app named AI in Society featuring non-paywalled articles on AI impacts, an AI safety opportunities board, and the highest-rated and bestselling books on AI. Once Wave 1 (Version 2) of my dataset (along with all indices) is built and deposited on Harvard Dataverse, I plan to add a new page to AI in Society that serves as an interactive dashboard for the general public, researchers, and professionals to view all country-level AI safety index scores (each with a description to justify the score).
As a quantitative social data scientist of 11 years (both training and practice), I have used many national panel datasets whose research teams built publicly accessible websites solely to store and document their multi-wave panel projects. I intend to be more creative in data presentation, visualisation, and dissemination by building an interactive dashboard rather than a static webpage to host the project.
As always, once I finish a stage of any project (AI-related or not), I come away with more ideas and questions about the scale-up phases. These are the ideas I have as of publishing this post. I would like to finish compiling and documenting Wave 1 (Version 2) and build the interactive dashboard as soon as possible. However, given my other deadlines and commitments, I expect Wave 1 (Version 2) to be deposited on Harvard Dataverse by the end of January 2026, and the interactive dashboard on my site, AI in Society, to be delivered by mid-February 2026 at the earliest. Note: this timeline is contingent on my availability over the coming weeks.
This is my first-ever post, cross-posted on LessWrong, the AI Alignment Forum, and the EA Forum. I hope you enjoy reading about my completed and ongoing work. If you have any constructive feedback, please feel free to share it; I apologise in advance that I may only check the comments periodically (every few days if I am occupied with other deadlines). Since this is my first-ever post, please also feel free to comment if you know of other platforms where I should share this work-in-progress idea.
And a Happy New Year to everyone.
