(Cleaned) EA Forum data for your interest and enjoyment

Part 1: data

A few weeks ago, Jacques Thibs scraped the forum and put some data in the EA Twitter group for anyone to check out.

Long story short, I’ve cleaned the data a bit and figured some other users here might like to check it out! The data files are:

1. Original raw data (175 MB, .jsonl)

2. Original data, but as a .csv (~800 MB, don’t ask me why it’s larger)

3. My cleaned version (~90 MB, .csv)

Note: the cleaned version loses some information (like the text of each comment). For the full data, check out #1 or #2!

4. Small sample version of #3 (.csv, 41 KB; you can open this one in Excel, unlike the others, at least on my computer)

100 random posts and the first 100 characters of text in each

For all of these, each row (observation) is one forum post, and each column (variable) is some bit of information about that post, like its author, text, or date of publication.
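Because each row is one post, a small sample like #4 boils down to drawing rows at random and cutting each post's text short. Here is a minimal Python sketch, assuming the posts are already loaded as a list of dicts with a `text` key. The function name `sample_posts` and the toy rows are hypothetical, and the actual sample was made in R and may have been built differently.

```python
import random


def sample_posts(posts, n=100, max_chars=100, seed=None):
    """Return n randomly chosen posts with their text cut to max_chars characters.

    Rows are drawn without replacement; if there are fewer than n posts,
    all of them are returned (in random order).
    """
    rng = random.Random(seed)  # pass a seed for a reproducible sample
    chosen = rng.sample(posts, min(n, len(posts)))
    sample = []
    for post in chosen:
        row = dict(post)  # copy so the full data set is left untouched
        row["text"] = (row.get("text") or "")[:max_chars]
        sample.append(row)
    return sample


# Usage with made-up rows standing in for forum posts (one dict per post)
posts = [
    {"title": f"Post {i}", "author": f"user{i % 7}", "text": "lorem ipsum " * (i % 30)}
    for i in range(250)
]
small = sample_posts(posts, n=100, max_chars=100, seed=42)
print(len(small), max(len(p["text"]) for p in small))
```

Sampling without replacement means no post shows up twice, and fixing the seed lets someone else regenerate the exact same sample.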

I’m posting the cleaned version because the raw data .csv has ~21,000 variables, some of which have rather unfriendly names like

comments.comments.comments.comments.comments.date_published...66

(variable 626), or

comments.comments.comments.comments.comments.comments.comments.comments.comments.comments.comments.comments.comments.omega_karma...1031

(variable 20,112)
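Names like these are what you get when nested JSON (posts containing comments containing replies) is flattened into one wide table: every level of replies adds another `comments.` prefix, and every comment's fields need their own columns. Below is a minimal Python sketch of that flattening, assuming a made-up post structure with `comments` and `date_published` keys. The functions `flatten` and `build_table` are hypothetical, and the `...N` de-duplication scheme is a simplified assumption, not necessarily what the scraper or the R tools did. One plausible side effect: every row carries a cell, mostly empty, for every one of the ~21,000 columns, which may be part of why the CSV is bigger than the JSONL.

```python
def flatten(node, path="", out=None):
    """Flatten nested dicts/lists into one dict of column name -> value.

    Nested keys are joined with dots, so a reply to a comment lands under
    'comments.comments.<field>'. List positions are not part of the name,
    so a name that is already taken gets a '...N' suffix (N = 2, 3, ...).
    This suffix scheme is an illustrative assumption.
    """
    if out is None:
        out = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flatten(value, f"{path}.{key}" if path else key, out)
    elif isinstance(node, list):
        for item in node:
            flatten(item, path, out)  # list items share their parent's path
    else:
        name, n = path, 1
        while name in out:
            n += 1
            name = f"{path}...{n}"
        out[name] = node
    return out


def build_table(posts):
    """One row per post; columns are the union of every post's flattened names."""
    rows = [flatten(p) for p in posts]
    columns = list(dict.fromkeys(name for row in rows for name in row))
    # Every row gets a cell for every column, so most cells end up empty
    return columns, [[row.get(c, "") for c in columns] for row in rows]


# Made-up example: one post with a comment, a reply, and a second comment
post = {
    "title": "Example post",
    "date_published": "2022-05-01",
    "comments": [
        {"date_published": "2022-05-02", "comments": [{"date_published": "2022-05-03"}]},
        {"date_published": "2022-05-04", "comments": []},
    ],
}
print(list(flatten(post)))
# → ['title', 'date_published', 'comments.date_published',
#    'comments.comments.date_published', 'comments.date_published...2']
```

A single deeply nested thread adds columns for every post in the table, which is how the column count can climb into the tens of thousands.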

Call for proposals

Oh yeah, if there are any questions you think I should look into using this data, I'd love to know! I'm half-decent at data cleaning and reshaping in R, econometrics in R and Stata, and data visualization, and not a whole lot else.

Thanks for reading and I look forward to whatever insights or visualizations others generate!

Note/disclaimer

Part 1: data

Part 2: charts

Monthly EA forum activity over time

January 2013-May 2022

Zoomed in to exclude the last ~2022 spike

Most prolific Forum authors

Showing full data

Zoomed in because Aaron Gertler is a God

Interactive chart on popular tags

Call for proposals