
I work at Our World in Data, where we try to make research and data on the world's largest problems more accessible and understandable.

I attended EA Global this past weekend, where I received very interesting input from many lovely people on potential improvements. But I thought it'd also be worth asking here to get wider feedback. I'm interested in all the following:

  • Low-hanging 'data fruits': simple datasets or charts that you know to be readily available somewhere and that would add significant value, but that aren't already listed here.

  • High-hanging fruits: things we could add to the website in the medium term with a lot more work (new subjects, larger datasets, data that needs a lot of cleaning, etc.)

  • Imaginary fruits: what you'd like to see on OWID in your wildest dreams (e.g. global population projections to the year 10,000 under various scenarios).

Thank you!




Embed forecasts in your pages.

Work with Metaculus to show forecasts of future values alongside past values.

It would be useful for animal advocates to have your figures for population sizes in terms of numbers of individual animals, not just weight. I'm thinking pages like:

https://ourworldindata.org/mammals

https://ourworldindata.org/birds

https://ourworldindata.org/fish-and-overfishing

Pages for invertebrates (wild and farmed) would be nice, too!

+1 as well. I would emphasize that the number of animals alive at any given time is significantly more important than the number slaughtered, as many animals die prior to slaughter.

+1 to reporting numbers of animals instead of tonnage or biomass. The OWID meat and dairy production page does have a "numbers of animals slaughtered" section, so it would be great for that to be expanded both to other large numbers of animals (like various fish species, crustaceans, invertebrates) and beyond slaughter (such as alive at any one time).

Here are some articles with sources of such data. I haven't really looked into how hard they would be to maintain and update. It is biased towards data collected by Rethink Priorities staff because it was the easiest I had to hand, but hopefully others can add anything major I missed:

... (read more)

Collaborate with Jaime Sevilla on datasets for various values related to size, performance, training expense, etc. of large machine learning models. 

Having high quality data on this which one knows is going to be maintained makes it much easier to elicit forecasts about these topics, and eventually resolve those forecasts and keep track of track-records, and I know that Jaime has been working on this.

We now have a first chart based on their pre-print here: Estimated computation used in large training runs of AI systems

NunoSempere
Wohoo, nice!

I would really like to see this!

Create a page on biological weapons. This could include, for instance,

  1. An overview of offensive BW programs over time (when they were started, stopped, funding, staffing, etc.; perhaps with a separate section on the Soviet BW program)
  2. An overview of different international treaties relating to BW, including timelines and membership over time (i.e., the Geneva Protocol, the Biological Weapons Convention (BWC), Australia Group, UN Security Council Resolution 1540)
  3. Submissions of Confidence-Building Measures in the BWC over time (including as a percentage of the # of BWC States Parties and split into publicly accessible and restricted-access submissions)
  4. A graph that visually compares the funding and # of staff in international organizations for the bioweapons regime compared to chemical and nuclear weapons (e.g., the BWC Implementation Support Unit compared to OPCW for chemical and the IAEA and CTBTO PrepCom for nuclear)
  5. (Perhaps include an overview on the global proliferation of high-biosafety labs, e.g. see Global Biolabs)
  6. (Perhaps include a section on how technological advancements may affect the BW threat, e.g., include a graph on the Carlson curve (Moore's law but for DNA sequencing))

(This does sound useful, though I'd note this is also a relatively sensitive area and OWID are - thankfully! - a quite prominent site, so OWID may wish to check in with global catastrophic biorisk researchers regarding whether anything they'd intend to include on such a page might be best left out.)

Not much yet, but on (5) we now have this world map: Number of biosafety level 4 facilities

Datasets on philanthropic funding.

How much are big donors giving, and where? Could there be an easily searchable database of projects?

I like this idea. 

I tried to figure out how much funding there is for nuclear risk stuff, but it seems like the best source of data is the Peace and Security Funding Map's tracking of spending on "Preventing and Mitigating Conflict > Nuclear Issues" (select "List"), and many of the grants it tracks really aren't about nuclear weapons issues but just happen to use the term "nuclear". These include medical research grants that use the term "nuclear" in a totally different sense and biosecurity-related grants to the Nuclear Threat Initiative. They also don't... (read more)

Just want to say as an EA researcher your website is an absolute godsend.

Glad to hear that :)

Thanks for making this post! I think OWID is excellent, and I'm really excited that you're interested in making it even more useful to the EA community.

One thing I'd note: I expect OWID might be able to get funding from the EA community for high-impact moves, if such funding would be helpful. See List of EA funding opportunities and Things I often tell people about applying to EA Funds

(Let me know if there's a way I can be helpful in working out what funders might be most relevant, what sort of funding proposals might make sense, etc.)

I know you've been campaigning for open access climate data - I'd be really excited for that. My policy team found your work on access to electricity pretty interesting.

A few things that jump to mind:

  • Data on the development of EA-related fields (e.g. growth of AI safety/alignment as an academic discipline, including things like funding, number of publications, number of faculty/graduate students, etc.)
  • Data on the history of philanthropy (e.g. how much have private philanthropists spent over the years, and on what?)

Data on housing and transportation:

  • Housing costs and affordability measures in various countries and metropolitan areas
  • The costs to build transportation infrastructure in various countries and metropolitan areas (available at transitcosts.com), preferably split by mode of transportation
  • Data on the economic, environmental, and health benefits (or drawbacks) of transit-oriented development and denser housing
  • Data on the mix of transportation modes used in various metropolitan areas

Info on nuclear yields

(This is a bit niche, but might also be quick to produce and could fit into the existing nuclear weapons article.)

As far as I’m aware, there’s no compilation of information on the yields of various states’ arsenals that rivals the compilations of information on numbers of warheads created by groups such as the Federation of American Scientists. (Though I didn’t look very hard, so please tell me if I’m wrong! In any case, OWID-style visualisations would be handy too.)

I think that creating such a compilation would be at least somewhat useful (though I’m not sure how useful), e.g. for forecasting future changes. 

More specifically, I’d like to be able to easily find answers to questions like:

  • What is the current total yield of global nuclear arsenals?
  • What is the largest yield warhead currently in any state’s arsenal?
  • What is the current total yield of various specific countries’ nuclear arsenals (especially Russia and the US)?
  • What is the largest-yield warhead that various specific countries currently have?
  • What is the mean and median yield of warheads in their arsenals?
  • How have each of those things changed over time?

I expect I could find answers or form educated guesses on each of those questions with some work, but it’d be nice to have the info compiled and organised already.

Scraps of info that I happen to already have include:

  • The Nuclear Notebook column includes information on how many nuclear weapons of various types each country is estimated to have and what these weapons’ estimated yields are
    • Someone could run the simple calculations - inputting assumptions and educated guesses where required, e.g. where Nuclear Notebook declines to estimate a particular number - to gather info relevant to the above questions
      • Including adding things up across countries
    • Someone could also look at previous editions of Nuclear Notebook to see how these things have changed over time (though I think only over the span of a few years)
  • The Nuclear Notebook column includes information on when various countries’ arsenal size peaked and when their total yield peaked
    • If I recall correctly, total yield always peaks earlier
    • I could share my notes on this if people are interested
    • Perhaps one could consult Nuclear Notebook’s sources for these claims and thereby compile datasets and graphs relevant to the above questions?
  • Turco et al. wrote in 1983 that “A review of the world's nuclear arsenals (20-24) shows that the primary strategic and theater weapons amount to ~12,000 megatons (MT) of yield carried by ~17,000 warheads.”
    • That implies a mean yield at that time of 700kt
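The mean-yield figure above is simple division, sketched below. The two inputs are the numbers quoted from Turco et al.; the variable names are just illustrative.

```python
# Figures quoted from Turco et al. (1983) in the comment above.
total_yield_mt = 12_000  # approximate total yield, in megatons
warheads = 17_000        # approximate number of warheads

# 1 megaton = 1,000 kilotons
mean_yield_kt = total_yield_mt * 1_000 / warheads

print(f"Mean yield: roughly {mean_yield_kt:.0f} kt per warhead")
```

This comes out at a little over 700 kt, matching the rounded figure in the bullet. The same calculation could be repeated per country and per year if the Nuclear Notebook numbers were compiled into a table.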

(I originally proposed people on Metaculus create this, but I expect that won't happen and OWID seems ideally suited for this.)

Thanks for asking!

On some of your graphs, eg https://ourworldindata.org/grapher/gdp-per-capita-maddison-2020, you have a box you can tick to get "relative change". On other graphs, eg https://ourworldindata.org/grapher/children-per-woman-un?tab=chart&time=1950..2015&country=OWID_WRL~HUN, you don't have that box. You can force the chart to do this by adding "?stackMode=relative" to the URL, but that is annoying and hard to remember. Please add the box to all graphs.
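Until there is a tick box everywhere, the URL workaround can be scripted. This is only a rough sketch: the one fact taken from the comment above is that appending `stackMode=relative` switches a chart to relative change; the helper name `with_relative_change` is made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_relative_change(url):
    """Return the chart URL with stackMode=relative set, keeping other parameters."""
    parts = urlsplit(url)
    # Keep existing parameters (tab, time, country, ...) in their original order.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["stackMode"] = "relative"
    # safe="~" keeps the "~" separator in country lists readable.
    return urlunsplit(parts._replace(query=urlencode(query, safe="~")))


print(with_relative_change(
    "https://ourworldindata.org/grapher/children-per-woman-un"
    "?tab=chart&time=1950..2015&country=OWID_WRL~HUN"
))
```

A browser bookmarklet would probably be more convenient in practice, but the logic would be the same: parse the query string, set one parameter, and rebuild the URL.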

If you generate a graph like https://ourworldindata.org/grapher/children-per-woman-un?tab=chart&time=2008..2015&country=HUN~AUT~CZE~SVK~POL~UKR~HRV~SRB , it's hard to see what's going on, because all of the action is crammed into a tiny part of the graph - in this case between 1.3 and 1.6 children. I would be interested in either having it autozoom to the part where things are happening, or at least have an option to zoom into that part. Maybe this already exists and I am just missing it.

Another thing that would be neat (though a lot of work for maybe not much gain) would be the ability to graph arithmetic combinations of series, eg the fertility rate of Hungary minus the fertility rate of Austria, over time.
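For anyone who wants this today, downloaded chart data can be combined by hand. The sketch below assumes each series has already been loaded as a year-to-value mapping; the function name and the numbers are placeholders for illustration, not real fertility data.

```python
def series_difference(a, b):
    """Return a[year] - b[year] for every year present in both series."""
    shared_years = sorted(a.keys() & b.keys())
    return {year: round(a[year] - b[year], 2) for year in shared_years}


# Placeholder values only, NOT real data.
hungary = {2008: 1.3, 2009: 1.4, 2010: 1.2}
austria = {2009: 1.5, 2010: 1.4, 2011: 1.6}

print(series_difference(hungary, austria))
```

Only years present in both series are kept, which sidesteps the question of how to treat gaps. A built-in version would need to decide whether to interpolate missing years instead.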

Civilizational collapse data

  • Lifespans of civilizations/societies across history
  • How this differs based on various factors (e.g., maybe societies used to last longer or for less time? maybe it differs by region, by how many other societies border them, or by stage of technological development?)
  • "cause of death" (e.g., invasions, internal breakdown, ecological issues, ...)

Anders Sandberg at FHI looked into this to some extent (see a talk here and slides here). I could connect you with him if that'd be helpful.

Some other global catastrophic risk researchers may also have looked into this somewhat, e.g. perhaps Luke Kemp, Luisa Rodriguez, Haydn Belfield, or Karim Jebari. Again, I could probably provide intros if helpful.

I imagine various people outside the global catastrophic risk community have also looked into this somewhat, e.g. Jared Diamond and Peter Turchin.

This has been recently brought up again, alongside individual species extinctions.

Funding by cause: cancer, heart disease, deworming, etc.

Here are a few things I'd like to see:

First, working with Hamish Huggard to port over some of the data from effectivealtruismdata.com (as Nathan Young suggests). In particular, I think it would be useful to have a better impression of how EA and EA-adjacent philanthropic money gets spent.

Second, some charts covering long-run trends, such as: GDP over time starting around 0AD, world temperature or CO2 concentration since the end of the last ice age, agricultural production over time, energy consumption per capita, or population over time over millennia (sorry if you already have some of this). Obviously (with the exception of the climate stuff) the data is very sparse on this, but I am pro "here's a reasonable guess and here's how wide our uncertainties are" over "we're not entirely sure so we're going to say nothing". And I trust OWID can do an excellent job at communicating the uncertainties and interpretational difficulties involved.

Third, maybe a page on space. For instance, number of launches over time, number of satellites in orbit over time, amount and size distribution of space debris, space debris incidents over time, cost of launching a kilogram into orbit over time. In particular, both the UN and the ESA have really detailed datasets for potential launches over time / objects in space graphs, but the UN one doesn't have an API so needs to be scraped, and I haven't seen people present either dataset in an accessible way.

Update: we now have a couple of charts on outer space objects

  • Cumulative number of objects launched into outer space (line, bar, map)
  • Yearly number of objects launched into outer space (line, bar)
finm
Thanks so much!

It would be nice to have more datasets that try to go way back before 1800 -- like those used by this OpenPhil report, or books like "Why The West Rules", "Secular Cycles", etc. Here is a link to a pdf with all the figures in "Why The West Rules", although they are mostly maps. I like graphs 3.1, 3.7, and 9.3.

As a stretch goal, once you have a bunch of super-long-ago data, it would be sweet to be able to graph the data not just linearly in time, but also along various warped scales so that instead of equal intervals representing equal years, equal intervals represent:

Size of different communities. 

I think it's underrated how much identification affects decision-making. I'd love some graphs of how the number of people who identify with different monikers changes over time.

eg:
- socialist, weeb, christian, protestant, goth, EA etc etc etc

Hi Ed!

One thing that falls potentially into all three categories of difficulty is food stocks/reserves, which is an issue with high relevance to exposure to shocks and food insecurity, but really hard to track. 

It's a tricky issue, but could really help many researchers inside and outside of EA to improve!

A few issues we have found which would be very useful to see developed are:

The USDA PSD and FAOSTAT both have estimates for crop-year-end stocks, but as crop years do not line up, effective stocks are higher than this figure. These results are based on a few methodologies, but do not match reality exactly, and are better for globally traded crops. 

Reconciling and improving on these estimates is possible, but requires detailed trade data or insider data, which is often very commercially sensitive. Big traders (ABCD companies/COFCO) would know this, but they do not disclose.

Stocks can be reasonably accurate at a global level and when averaged over a period of time, however fluctuations in demand, smuggling and delays in data releases mean they can be hard to track on a country by country basis for the poorest. These are often the countries we care most about for food insecurity.

Stocks in strategic reserves, private reserves and simply in transit can be difficult to divide out. In some cases this suggests chunks of available stocks are missing, or that stocks would not be available to the market if classed as "private" when actually state controlled.

How much negative carbon (carbon offsets) is being produced under different accreditation standards. I.e., carbon offsets always have to be accredited, so it's not just a question of how many there are, but of how much different orgs count them as.

Maybe just take Carbon Plan data and put it in an easier to read form.

Can you clarify what this means, for the benefit of all forum readers? I figured it has something to do with carbon offsets.

Nathan Young
Thanks for asking.

I think it would be very beneficial to take advantage of complementary software like Anki.

I'm wondering if it would be useful to track data on national legislatures (or maybe just heads of state) worldwide? This could include:

  • Demographic information on politicians around the world (ie. educational backgrounds, political orientations, religious identities, genders, races, ages)
  • The electoral history of politicians around the world
  • If possible, the voting history of politicians (or legislatures as a whole) on cause areas relevant to EA

I'm not sure how feasible this is, but I imagine it could help EAs think more concretely about where they're likely to find support for different advocacy efforts.

Better tools for simple comparisons of different datasets and generating custom charts. For example, there have been a number of times when I wanted per-capita data but could only find charts for total, or vice versa. (This should be a low-priority request since it's primarily a convenience issue.)

I previously tried to think of "active grantmaking" ideas (here), i.e. things I might want EA funders to fund but for which no application has yet been made. Some of these were people/orgs I thought do cool work and so might be worth funding for something, and some of these were potentially cool ideas I might want some person/org to do. 

Here's one of the (rough and vague!) ideas I had:

Fund OWID for something

  • What sort of things might I want them to use money for?
    • Just expand/scale in general?
    • Do something analogous to how Vox made a new “vertical” for Future Perfect?
      • Like a new department or focus area
      • Sketch of what this could look like:
        • One of the buttons on the bar at the top of the OWID that says the name of some broad topic area relevant to EA, or something vaguer like Future Perfect
        • The stuff in that area is more EA-relevant than average, and has a similar theme or angle or something. e.g., maybe it's all focused on things relevant to x-risks
        • At least one OWID staff member is primarily focused on producing that sort of content.
        • It's still the same sort of content as OWID's regular stuff.
      • E.g., they don't have a finished page on nuclear weapons, and I don't think they have ones on bioweapons or AI. I want them to have that. 
      • We could either ask them to make those things specifically, or ask them to set up something like how Future Perfect works within Vox that will regularly produce that sort of thing.
    • Do work on specific topics?
  • Open questions:
    • What sorts of restricted funding, advice, or encouragement would they be open to?
      • On the 80k podcast, Roser indicated they much preferred people to give OWID unrestricted funding and let OWID use their own judgement
        • And I got the impression that maybe in general they might not be open to restricted funding
        • But maybe they'd be more open to it from EA sources when we do have a really good rationale and it roughly aligns with OWID’s own vision?
    • Is this better than trying to facilitate the creation of new things kind-of like Our World in Data?

Another idea I had was funding the creation of "new things kind-of like Our World in Data". (This is discussed briefly in this comment thread.) 

I think the key bottlenecks to this are (1) a clearer sense of precisely what kind of org/project we'd want and (2) people who are willing and suited to making that happen.

I guess OWID could help facilitate that by suggesting types of OWID-like projects that would be great but for whatever reason might be better done elsewhere instead/as well, suggesting people who might be great at doing that, and/or agreeing... (read more)

Automated Local Regression Discontinuity Design Discovery

Automated discovery of outliers in multicountry datasets (i.e. where you can see where your country is 3 SDs away from the mean).
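As a rough illustration of the outlier idea, here is a minimal sketch that flags entries more than 3 standard deviations from the cross-country mean. The function name, the threshold default, and the sample values are all made up for illustration; a real version would need to handle skewed indicators, where a z-score cut-off is a poor fit.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=3.0):
    """Return {name: z_score} for entries more than `threshold` SDs from the mean."""
    numbers = list(values.values())
    mu = mean(numbers)
    sigma = pstdev(numbers)  # population SD across all entries
    if sigma == 0:
        return {}  # every entry is identical, so nothing can be an outlier
    z_scores = {name: (value - mu) / sigma for name, value in values.items()}
    return {name: z for name, z in z_scores.items() if abs(z) > threshold}


# Placeholder values only, NOT real data: twenty identical entries and one extreme one.
sample = {f"country_{i:02d}": 10.0 for i in range(20)}
sample["country_x"] = 100.0

print(find_outliers(sample))
```

Note that with very few countries no entry can ever reach 3 SDs, so a check like this is only meaningful on reasonably large multicountry datasets.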

Maybe a page on AI capabilities and progress could be useful to explain to people why there is a chance that very powerful AI appears this century? For instance, one graph that I'd love to see would be when we expected a breakthrough and when it actually happened, things on the scaling of models and the scaling of performance, the evolution of the use of AI in industry, the evolution of investment in AI, the evolution of the distribution of funds or researchers between academia and the private sector, etc. I'm not sure that all of that is tractable, but that would be great!

Data on trends in the influence of geopolitical "great powers," possibly going back to the beginning of civilization.

Ray Dalio has attempted to quantify and visualize something similar: https://www.linkedin.com/pulse/big-cycles-over-last-500-years-ray-dalio

Collaborate or merge with New Things Under the Sun, a website that compiles social science research on innovation.

I would be especially interested in pieces on:

  • What types of innovation are out there, who is innovating (large companies, startups, universities, individuals, etc.), and ways to measure innovation
  • What we know about the effects of patent laws and other innovation policies on economic and social outcomes

I'm not sure if you're still actively monitoring this post, but the Wikipedia page on the Lead-crime hypothesis (https://en.wikipedia.org/wiki/Lead%E2%80%93crime_hypothesis) could badly use some infographics!! My favorite graph on the subject is this one (from https://news.sky.com/story/violent-crime-linked-to-levels-of-lead-in-air-10458451; I like it because it shows this isn't just localized to one area), but I'm pretty sure it's under copyright unfortunately.

Hi Ed,

Here are some imaginary fruit:

1. At the Happier Lives Institute we would be very interested to see something like a global burden of disease except for suffering. What are the largest sources of unhappiness across the world? 

2. OWID summarizing the results of studies on important topics. That is, collecting and visualizing meta-analytic information for important topics from databases like AidGrade or MetaPsy.
