(Update: Not sure why I didn't just make two separate posts of this. The first part is about the Occlumency sorting algorithm, and the second part is about how to adjust for information cascades. They don't really relate to each other.)
32 of the 50 highest-karma posts on the forum are less than a year old. The top 3 posts are less than a month old. Forum activity seems to have grown a lot lately, and there could be reasons to expect it to climb even higher.
Make no mistake, this could be really good news! Maybe more people have become interested in figuring out how to do good. I'm going to add it to my bag of reasons to be optimistic about the future. But it does mean we should reflect on what it could mean for us and how we should adapt to it.
1) It means that posts from this year are much more likely to have more karma compared to older posts of similar quality.
2) It also means that older posts are more valuable to promote on the forum since we have more fresh eyes here who haven't already read them.
3) And finally it means that it's slightly harder to find some of the historically best posts on the forum, since sorting by karma will heavily bias the results towards this year.
For these reasons, maybe someone should introduce another sorting category where instead of sorting by Magic (New & Upvoted), we can sort by Occlumency (Old & Upvoted).
Slice-of-pie weighting by monthly forum activity to approximate the conversion rate of readers->karma
Note: This was edited to give a clearer view of how the system could look, and take into account karma power inflation as mentioned by Arepo in the comments.
If two posts have the same number of readers, but the latter post has double the karma, that is some evidence that the latter is more likely to be usefwl to you. Let's call the efficiency with which a post converts readers to karma its conversion rate.
Unfortunately, directly weighting a post's karma by the reader count is potentially gameable. So you may wish to weight by a hard-to-game proxy for how many potential voters read it.
Assuming the data is available, one way to do this would be to break down a post's karma by month and weight the karma for each month based on the absolute value of the total karma assigned that month. What this could mean in practice is that each month has a fixed amount of karma that's allocated towards posts based on the proportion of monthly karma those posts received.
This would mean that Occlumency sorts posts by enduring value. The top-sorted posts will be the ones that have received the most and biggest slices of pie over time. If a post has been extremely popular for one month but hasn't seen much relevancy since then, it's going to be buried by posts that have been relevant for a longer time.
This also controls for karma inflation due to the accumulation of karma power for users with more karma. On Occlumency, users with more karma control a larger slice of the pie but don't add to the total amount of karma going around that month.
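As a rough illustration of the slice-of-pie idea, here is a minimal Python sketch. Everything in it is hypothetical: the function name `occlumency_scores`, the fixed `pool_per_month` of 100, the input layout, and the example numbers are assumptions for illustration, not how the forum actually stores or computes karma. It also assumes that a month's total is the sum of the absolute karma each post received that month.

```python
from collections import defaultdict


def occlumency_scores(monthly_karma, pool_per_month=100.0):
    """Score posts by the share of each month's karma they received.

    monthly_karma maps post id -> {month label: net karma received that month}.
    Every month hands out the same fixed pool, split by share of that month's karma.
    """
    # Total karma assigned in each month (assumption: sum of absolute values).
    month_totals = defaultdict(float)
    for per_month in monthly_karma.values():
        for month, karma in per_month.items():
            month_totals[month] += abs(karma)

    scores = {}
    for post, per_month in monthly_karma.items():
        scores[post] = sum(
            pool_per_month * karma / month_totals[month]
            for month, karma in per_month.items()
            if month_totals[month] > 0
        )
    return scores


# Made-up data: two quiet months in 2019 and one busy month in 2022.
example = {
    "old_classic": {"2019-01": 30, "2019-02": 20, "2022-06": 50},
    "old_filler": {"2019-01": 30, "2019-02": 20},
    "recent_hit": {"2022-06": 450},
}
scores = occlumency_scores(example)
raw = {post: sum(months.values()) for post, months in example.items()}
by_raw = sorted(raw, key=raw.get, reverse=True)
by_occlumency = sorted(scores, key=scores.get, reverse=True)
```

In this toy data, `recent_hit` tops the raw karma ranking, while `old_classic` comes first under the slice-of-pie score because it kept earning a share of the pool across several months.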
That being said, some bias in favor of more recent posts could be good if information and wisdom increased over time. And you want to be wary of accidentally promoting outdated posts. But if voter demographics have proportionately shifted towards people who haven't spent as much time thinking about the many nuances of having a positive impact, some bias towards older votes could be good. You could balance these considerations by adjusting the monthly pool of karma to have a non-linear positive correlation to the total unadjusted karma, but some experimental fine-tuning might be in order.
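One way to picture such a non-linear pool is a power law, where an exponent `alpha` between 0 and 1 interpolates between a fixed monthly pool and plain unadjusted karma. The power-law form, the names `monthly_pool` and `weighted_karma`, and the default `alpha=0.5` are assumptions made for illustration only; the right curve would still need experimental fine-tuning.

```python
def monthly_pool(total_karma, alpha=0.5):
    """Size of a month's pool as a non-linear function of that month's total karma.

    alpha = 0 gives a fixed pool of 1 (pure slice-of-pie);
    alpha = 1 makes the pool equal the total (plain unadjusted karma).
    """
    return total_karma ** alpha


def weighted_karma(karma, total_karma, alpha=0.5):
    """A post's share of the month's karma, scaled by that month's pool."""
    if total_karma <= 0:
        return 0.0
    return karma / total_karma * monthly_pool(total_karma, alpha)


# A hypothetical post that received 50 of the 100 karma assigned in some month.
fixed_pool = weighted_karma(50, 100, alpha=0.0)  # share of the pie only
in_between = weighted_karma(50, 100, alpha=0.5)  # pool grows with the square root of activity
raw_karma = weighted_karma(50, 100, alpha=1.0)   # unadjusted karma
```

Values of `alpha` closer to 1 favor posts from busy months, and values closer to 0 favor posts from quiet months.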
The expected amount of marginal karma a post will get within a slice of time depends on a) the expected number of potential voters reading it, and b) the expected amount of karma they will give it given that they read it.
But a) further depends on the karma it already has, since 1) upvoted posts are more visible, and 2) we often read things because we want to stay "in the loop" and we use karma to infer what the loop actually is.
And b) could also depend on karma because we may subconsciously use popularity to partially infer quality and may insufficiently adjust downwards.
This has all the prerequisites for information cascades that depend on slight variations in initial conditions.
An information cascade occurs when people update on other people's beliefs, which may individually be a rational decision but may still result in a self-reinforcing community opinion that does not necessarily reflect reality.
Two posts published simultaneously may be of equal quality, but if the former starts out with 100 karma compared to the latter's 1, we are likely to see the former gain a lot more karma compared to the latter before the frontpage time window is over.
In summary, karma information cascades pose two problems:
- They amplify small variations in initial conditions. This means that small initial differences between posts may persist over time, and this increases the extent to which the top karma posts depend on luck to get there.
- They amplify the relative differences between posts. If all you know about two posts is that the former has five times the karma, then you probably shouldn't infer that its conversion rate (or quality) is five times higher.
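Both problems can be seen in a toy model. The sketch below is a deterministic expected-value simulation, not a description of how the forum works: it assumes readers split across posts in exact proportion to current karma, and the function name `simulate_cascade`, the conversion rates, and the reader counts are all made up for illustration.

```python
def simulate_cascade(initial_karma, conversion_rates, readers_per_step=100, steps=10):
    """Expected karma after `steps` time slices.

    In each slice, readers split across posts in proportion to current karma,
    and each reader of a post adds karma at that post's conversion rate.
    """
    karma = list(initial_karma)
    for _ in range(steps):
        total = sum(karma)
        karma = [
            k + readers_per_step * (k / total) * rate
            for k, rate in zip(karma, conversion_rates)
        ]
    return karma


# Problem 1: equal quality, unequal start. The head start never washes out.
lucky, unlucky = simulate_cascade([100, 1], [0.1, 0.1])
gap = lucky - unlucky

# Problem 2: equal start, but one post has twice the conversion rate.
better, worse = simulate_cascade([1, 1], [0.2, 0.1])
karma_ratio = better / worse
```

Under these assumptions, the first pair keeps its 100:1 karma ratio while the absolute gap widens, and the second pair ends up with a karma ratio above 2:1 even though the conversion rates differ by exactly a factor of two.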
Adjust for information cascades in real-time by hiding post authorship and karma on the first day of publication
You could imagine adjusting for information cascades by making karma follow a log curve, and doing this for all posts historically. This could ameliorate the second problem mentioned above, so maybe it's worth investigating and perhaps running simple computer simulations to decide on a fair curve. But I'm tentatively pessimistic about the value of that since it wouldn't change the ordinal ranking of posts, and that's what matters for visibility and sorting algorithms.
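A small check of that last point, using made-up karma values and `log1p` as one possible log curve (both are assumptions for illustration): the transform compresses relative differences but leaves the order untouched.

```python
import math

# Hypothetical raw karma for three posts.
karma = {"post_a": 500, "post_b": 100, "post_c": 20}

# Log-scaled karma; log1p(k) = log(1 + k) keeps zero-karma posts at zero.
log_karma = {post: math.log1p(k) for post, k in karma.items()}

# Rankings before and after the transform.
rank_raw = sorted(karma, key=karma.get, reverse=True)
rank_log = sorted(log_karma, key=log_karma.get, reverse=True)

# Relative difference between the top two posts, before and after.
ratio_raw = karma["post_a"] / karma["post_b"]
ratio_log = log_karma["post_a"] / log_karma["post_b"]
```

Here `rank_raw` and `rank_log` are identical because the logarithm is monotonic, while `ratio_log` is much smaller than `ratio_raw`.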
So to affect ordinal rankings, you need to deal with information cascades in real-time.
One way to do this could be to hide authorship and karma for 24 hours (or something) after the post has been published. Readers are now spread more evenly across day-one posts, which means that on day two, their relative karmas are more indicative of true conversion rates and less influenced by randomness and author popularity. So even if you let everything run as normal thenceforth, the information cascades would start on initial conditions that better reflect true conversion rates.
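To make the intuition concrete, here is a toy expected-value model of the hidden period. It assumes that readers spread evenly across posts while karma is hidden and in proportion to karma afterwards; the function name `simulate`, the parameter `hidden_steps`, and all numbers are hypothetical and chosen only for illustration.

```python
def simulate(conversion_rates, early_karma, hidden_steps, steps=10, readers_per_step=100):
    """Expected karma per post after `steps` time slices.

    For the first `hidden_steps` slices karma is hidden, so readers spread
    evenly across posts. After that, readers follow current karma.
    """
    karma = list(early_karma)
    for step in range(steps):
        if step < hidden_steps:
            weights = [1.0] * len(karma)  # karma hidden: even attention
        else:
            weights = karma               # karma visible: attention follows karma
        total = sum(weights)
        karma = [
            k + readers_per_step * (w / total) * rate
            for k, w, rate in zip(karma, weights, conversion_rates)
        ]
    return karma


# Post A has the lower conversion rate but a lucky head start of early votes.
rates = [0.1, 0.2]
early = [10, 1]
visible_a, visible_b = simulate(rates, early, hidden_steps=0)
hidden_a, hidden_b = simulate(rates, early, hidden_steps=2)
```

In this toy setup, post A stays ahead when karma is visible from the start, while post B overtakes it when karma is hidden for the first two slices.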
And importantly, this doesn't put new posts at a disadvantage compared to old posts. You're probably not changing the expected total karma over time by much. Instead, you're just changing how sensitive they are to randomness in initial conditions.
But, of course, there are trade-offs involved. To some of you, this might sound dumb but my intuition is that information cascades aren't that big of a problem in EA—for reasons. And delaying information about karma and authorship has a cost insofar as readers benefit from those as indicators of Value of Information.
To unequivocally recommend for or against it, I would want to do more investigation, but I think it's worth testing in order to get feedback on how it works in practice. If you wish to look into it, you might find relevant insights in the field known as "veritistic social epistemology".
This assumes that 1) people vote based on usefwlness-to-them, and 2) they're similar enough to you that usefwlness-to-them correlates strongly with usefwlness-to-you. Both are questionable assumptions, but you're already aware that karma only partially correlates with usefwlness-to-you.
FJehn had a somewhat related project last year, where they sorted based on a weighting of karma by the average karma for posts in the month of that post's publication.
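For comparison, one reading of that weighting is to divide each post's karma by the mean karma of posts published in the same month. The sketch below is only that reading, not FJehn's actual code; the function name `month_normalised_karma` and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def month_normalised_karma(posts):
    """posts: list of (post_id, publication_month, karma) tuples."""
    # Collect the karma of all posts published in each month.
    by_month = defaultdict(list)
    for _, month, karma in posts:
        by_month[month].append(karma)
    month_avg = {month: mean(values) for month, values in by_month.items()}

    # Express each post's karma relative to its publication month's average.
    return {
        post_id: karma / month_avg[month]
        for post_id, month, karma in posts
        if month_avg[month] > 0
    }


# Made-up data: a quiet month and a busy month.
posts = [
    ("a", "2019-01", 30),
    ("b", "2019-01", 10),
    ("c", "2022-06", 300),
    ("d", "2022-06", 100),
]
normalised = month_normalised_karma(posts)
```

Unlike the slice-of-pie approach, this version only looks at the month of publication, so karma a post earns in later months is still compared against its original month's average.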