I scraped all public "Effective Altruists" Goodreads reading lists

by MaxRa · 4 min read · 23rd Mar 2021 · 13 comments



A couple of weeks ago I mentioned the idea of scraping the reading lists of the members of the Effective Altruists Goodreads group. The initial motivation was the worry that EAs might be reading too many of the same books, and that we might improve on this by finding out which books are read relatively little compared to how many EAs proclaim that they want to read them. I got some positive feedback and got to work. Besides helping a little with improving our exploration of literature, I think the results also serve as an interesting survey of the reading behavior of EAs, though we should keep in mind a possible selection bias towards EAs and EA-adjacent people who share their reading behavior on Goodreads.

For those who don’t know Goodreads, it’s a social network where you can share ratings and reviews of the books you’ve read, and organize books in shelves like “I have read this!” or “I want to read this!”. It’s quite fun, many EAs are on there, and I wholeheartedly recommend joining.

In total, there were 349 people in the Effective Altruists Goodreads group (333 in the first version of this post), and 275 of them (originally 257) had their privacy settings set to completely public, allowing anyone to inspect their reading lists even without being logged in. I checked the Goodreads scraping rules and was good to go.

Before you continue, I invite you to predict the following:

  • 3 of the 10 most read books, other than Doing Good Better
  • a book that relatively many EAs want to read, but few have actually read

Finally, if you have any further ideas for analysis, leave a comment and I’ll be happy to see what I can do. If you want access to the csv file or the Python script I used, I uploaded them here. In this screenshot you see the types of data I have.

Most read books

Here are the books that our community has already explored a bunch. I would not have expected 1984 and Superintelligence to make it to the Top 5. HPMOR being the least read Harry Potter novel is a slight disappointment.

 

Most planned to read

Many classics are on people’s “I want to read this!” lists, and perhaps overall slightly lengthier and more difficult books. That said, Superforecasting is not too long, very readable, and in my opinion excellent, so feel free to read this one.

 

Highest planned to read / have read ratio

These are the books that might be more useful to be read by more EAs, as many say they want to read them, but in proportion the fewest people have actually read them. Of course, there are good reasons why some of those books are read less, e.g. some of them, like The Rise and Fall of American Growth, Probability Theory or The Feynman Lectures on Physics would take me enormously more time to read compared to, say, 1984 (which still took me, a relatively slow reader, something on the order of 10 to 20 hours). Also, the vast majority of the books in this list have only been read by one person, so a score of 11 can be interpreted as one person having read the book and 11 people wanting to read it. Additionally, as of now this list excludes books that have never been read by any EA, as the ratio would be infinite. For those books, see the next section.
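To make the ratio concrete, here is a minimal Python sketch of how it could be computed. It is an illustration rather than the actual script: the `(title, shelf)` pair layout, the function name `want_to_read_ratios`, and the sample titles are hypothetical, and the shelf labels `read` and `to-read` are assumed to be how the scraped data marks the two shelves.

```python
from collections import Counter


def want_to_read_ratios(shelf_entries, min_reads=1):
    """Rank books by (number of people planning to read) / (number who have read).

    shelf_entries: iterable of (title, shelf) pairs, one per person and book.
    Books with fewer than `min_reads` reads are left out; the threshold is
    forced to at least 1 because the ratio is undefined for zero reads.
    """
    min_reads = max(min_reads, 1)
    read = Counter()
    planned = Counter()
    for title, shelf in shelf_entries:
        if shelf == "read":
            read[title] += 1
        elif shelf == "to-read":
            planned[title] += 1

    ratios = {
        title: planned[title] / read[title]
        for title in planned
        if read[title] >= min_reads
    }
    # Highest ratio first
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: one reader and eleven would-be readers give a score of 11.
entries = (
    [("Book A", "read")]
    + [("Book A", "to-read")] * 11
    + [("Book B", "read")] * 2
    + [("Book B", "to-read")] * 6
    + [("Book C", "to-read")] * 4  # never read, so excluded from the ratio list
)
ranking = want_to_read_ratios(entries)
ranking_two_reads = want_to_read_ratios(entries, min_reads=2)
```

Raising `min_reads` to 2 filters out the many books whose score rests on a single reader.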

 

If we only allow books with at least 2 reads, we get this list:

Most commonly planned to read books that have not been read by anyone yet

I’ll consider it a big success of this project if some people will have read Energy and Civilization next time I check. (This originally said Julia Galef's The Scout Mindset, which one person has since read.)

 

Highest rated books

Here are the highest rated of all books that were read at least 10 times. Not too many surprises here, EAs know what's good!

Lowest rated books

Here is the same for the lowest rated books. Before any fandom feels too ostracized (speaking as somebody who absolutely loved the Eragon saga), I should inform you about how Goodreads suggests the rating system be used:

1 star:  “didn’t like it”
2 stars: “it was ok”
3 stars: “liked it”
4 stars: “really liked it”
5 stars: “it was amazing”

 

Most fringy books that EAs read

Here are the books read by more than 10 EAs that have the fewest reviews on Goodreads.

 

Final Thoughts

I worry these lists might give a false impression of how diverse the reading of EAs really is. Just for a taste, in total there are 59,986 different books on EAs’ reading lists (minus books that appear under multiple titles in Goodreads), and we have already read 26,753 of those.

How to explore even better

The idea of reading more books that appear in high proportions in EAs’ to-read lists might improve the exploration of the literature a little bit, but it won’t help with exploring literature that hasn’t made it onto the radar of at least some EAs. An additional approach could be to compare the reading lists of EAs with reading lists of other relevant communities and see where we might be able to explore ideas they find useful. Offhand, I can’t think of relevant communities that would be identifiable on Goodreads, unfortunately.

Another idea would be to follow or befriend a bunch of EAs on Goodreads and slightly lean towards reading books that none of them have read yet. For those books, it would be especially useful if you left some notes, either on the forum or in a Goodreads review.

Next steps for coordinating our literary exploration

  • Join Goodreads, and send me a friend request.
  • Let's coordinate the exploration! Leave a comment if one of the 'neglected' books appeals to you and you'd like to read it and maybe write up some notes about it. (H/T Gavin)

Acknowledgements

Thanks to Jasper Götting, Tilman Räuker and Gavin Leech for reading the draft and sharing ideas.


13 comments

I’ll consider it a big success of this project if some people will have read Julia Galef's The Scout Mindset next time I check.

It's not out yet, so I expect you will get your wish if you check a bit after it's released :) 

Love this!

My recollection is that when you create a Goodreads account, you are asked whether you have read a few classics (like 1984, and possibly Thinking, Fast and Slow).

This can create a selection bias, as the user will say 'yes' even if they read the book years ago. They're less likely to go back, however, and add other books they read years ago.

Possible that this explains the 'most read' books?

Some of the books are also mandatory reading in American public school education, or mandatory reading in other countries/institutions.

This is brilliant!

I think we can actually do an explicit expected-utility and value-of-information calculation here:

  • Let one five-star book = one util.
  • Each book's quality can be modelled as a rate $p$ of producing stars.
  • The star rating you give a book is the sum of 5 Bernoulli trials with rate $p$.
  • The book will produce $p$ utils of value per read in expectation.
  • To estimate $p$, sum up the total stars awarded $s$ and total possible stars $n$.
  • The probability distribution of $p$ is then $\mathrm{Beta}(s + 1,\ n - s + 1)$ (assuming uniform prior for simplicity).
  • For any pair of books, we can compute the probability that book 1 is more valuable than book 2 as $P(p_1 > p_2) = \int_0^1 f_1(x)\,F_2(x)\,dx$, where $f_1$ is the density of book 1's distribution and $F_2$ is the cumulative distribution function of book 2's.
  • Let's say there's a prescribed EA reading list.
  • Let people who encounter the list be probabilistic automata.
  • These automata start at the top of the list, then iteratively either: 1) read the book they are currently looking at, 2) move down to the next item on the list, 3) quit.
  • Intuitively, I think this process will result in books being read geometrically less as you move down the list.
  • For simplicity, let's say the first book is guaranteed to be read, the next book has a 50% chance of being read, then 25%, ..., and the $i$-th book has a $2^{-i}$ chance of being read (with $i$ starting at zero).
  • The expected value of the list is then $\sum_{i \ge 0} 2^{-i}\,\mathbb{E}[p_i]$, where $p_i$ is the rate of the $i$-th book on the list.
  • To calculate the value of information for reading a given book, you enumerate all the possible outcomes (one-star, two-stars, ..., five-stars), calculate the probability of each one, look at how the rankings would change, and re-calculate the expected value of the list. Multiply the expected values by the probabilities et voilà.
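The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a tested analysis: the function names and the example star counts are hypothetical, the list is assumed to be ordered by posterior mean, the pairwise probability is estimated by Monte Carlo sampling rather than computed exactly, and a zero-star outcome is included because the sum of 5 Bernoulli trials allows it.

```python
import math
import random

STARS = 5  # each rating is modelled as 5 Bernoulli trials


def posterior(stars, possible):
    """Beta(a, b) parameters for a book's rate under a uniform prior."""
    return stars + 1, possible - stars + 1


def posterior_mean(stars, possible):
    a, b = posterior(stars, possible)
    return a / (a + b)


def prob_better(book1, book2, draws=20_000, seed=0):
    """Monte Carlo estimate of the probability that book1's rate exceeds book2's."""
    rng = random.Random(seed)
    a1, b1 = posterior(*book1)
    a2, b2 = posterior(*book2)
    wins = sum(rng.betavariate(a1, b1) > rng.betavariate(a2, b2) for _ in range(draws))
    return wins / draws


def list_value(books):
    """Expected utils per visitor: the i-th ranked book is read with chance 2**-i."""
    means = sorted((posterior_mean(s, n) for s, n in books), reverse=True)
    return sum(m * 0.5**i for i, m in enumerate(means))


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def rating_probability(k, a, b):
    """Beta-binomial chance that one new rating awards k of 5 stars."""
    return math.comb(STARS, k) * math.exp(log_beta(a + k, b + STARS - k) - log_beta(a, b))


def value_of_information(books, index):
    """Expected gain in list value from collecting one more rating of books[index]."""
    s, n = books[index]
    a, b = posterior(s, n)
    expected_after = 0.0
    for k in range(STARS + 1):
        updated = list(books)
        updated[index] = (s + k, n + STARS)  # re-rank with the new rating included
        expected_after += rating_probability(k, a, b) * list_value(updated)
    return expected_after - list_value(books)


# Hypothetical (stars awarded, stars possible) totals for three books
books = [(450, 500), (40, 50), (4, 5)]
voi_last = value_of_information(books, 2)
```

In this toy example the book with a single rating has positive value of information, because one more five-star rating would move it up the list.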

Can I get the data please?

Cool idea! Sent you a message.

Just a readability suggestion: have the column for n (etc.) on the left and book on the right, so that we can read a left-aligned list of book names without the eyes having to jump around so much :)

I aligned it to the left, good point! :) Putting the n-column left would be even better, but only aligning the text left was already not trivial with the Pandas library.

It seems that dystopian novels are overrepresented relative to their share of the classics. I'm curious for others' thoughts why that is. I could imagine a case that they're more action-relevant than, e.g., Pride and Prejudice, but I also wonder if they might shape our expectations of the future in bad ways. (I say this as someone currently rereading 1984, which I adore...)

My quick reaction is that they're more ideas-focused. People interested in EA are selected for being interested in ideas.

This is really neat. I think in a better world analysis like this would be done by Goodreads and updated on a regular basis. Hopefully the new API changes won't make it more difficult to do this sort of work in the future.

I call shotgun on "On Certainty", one of the most-wanted books. (The author and I have butted heads before. He is much better at headbutting than me.)

Wow, I'm surprised I've read 28.5 of the 31 most read books list (0.5 because I still haven't finished The Precipice). I had no idea I've read that high of a fraction of the most popular books, especially since I haven't read that many books (170 marked read on Goodreads).

Small update: 

I added the reading lists of 18 people to the database, some of whom joined in the last week, some of whom for some reason didn't yield reading lists in the first runthrough. I think this didn't change much, except that one of those 18 people already read The Scout Mindset, and now there's another Eragon book in the Lowest Rated list... 

I also uploaded the code and csv file if anybody else wants to play around with it: https://github.com/MaxRae/EAGoodreads

Glad y'all found this interesting! :) 

ETA: If you want to look at the first version for some reason, it's archived here.