A full transcript of this talk exists at https://forum.effectivealtruism.org/posts/LdZcit8zX89rofZf3/evidence-cluelessness-and-the-long-term-hilary-greaves
If anyone were able to find PDFs of all the papers and share the links here, that would be much appreciated.
I wasn’t aware it was first published in your blog. Thanks for nudging Prof Shelly Kagan to share their syllabus!
Is there a useful way to financially incentivise this sort of independent evaluation? It seems like a potentially good use of fund money.
Done! Thanks for working on this! Do the other links still work fine?
I've set up a system for buying books for people on request. If people are interested in using it you can read more and express interest here: eabooksdirect.super.site
I track my time using hourstack.com and try to be quite strict about only tracking 'sit-down work time'. I can normally do around 3.5-4h of work a day. I normally start at 10am and finish around 5pm.
This matches my experience at college, where I found I could normally do around 4 hours of studying before feeling tired out.
It's easier for me to 'clock more hours' when I have more meetings. But I try to avoid meetings.
I find that I can get most of my things done within this time and would consider myself quite a productive person.
Thanks for explaining your view! I don’t really have super strong views here, so I don’t want to labour the point, but I just thought I’d share the intuition for where I’m coming from. For me it makes sense to have thresholds at those places because they actually carve up the buckets of reactions better than the linear scale suggests.
For example, some people feel weird rating something really low, so they “express dislike” by rating it 6/10. So to me the lowest scorers and the 6/10ers probably have more similar experiences than their linear scores suggest. I claim this is driven by weird habits/something psychological about how people are used to rating things.
I think there’s a similar thing at the 7/8/9 distinction. When people think something is “okay” they just rate it 7/10. But when someone is actually impressed by something they rate it 9/10, which is only 2 points more but captures a quite different sentiment. From experience I’ve also noticed some people use 9/10 in place of 10/10 because they just never give anything 10/10 (e.g. they understand what it means for something to be 10/10 differently to others).
The short of it is that I claim people don’t seem to use the linear scale as an actual linear scale, so it makes sense to normalise things with thresholds, and I claim the thresholds are in the right places mostly just from my (very limited) experience.
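The threshold intuition above could be sketched as a simple bucketing function. The cut-points (0-6 = "dislike", 7-8 = "okay", 9-10 = "impressed") are my reading of the comment's examples, not an established scale:

```python
# Hypothetical thresholds: 0-6 reads as "dislike" (including people who
# express dislike with a 6/10), 7-8 as "okay", 9-10 as "impressed".
def bucket(rating):
    if rating <= 6:
        return "dislike"
    elif rating <= 8:
        return "okay"
    return "impressed"

print([bucket(r) for r in [3, 6, 7, 9]])
# ['dislike', 'dislike', 'okay', 'impressed']
```

Under this view, a 3/10 and a 6/10 land in the same bucket even though they are 3 points apart, while a 7/10 and a 9/10 land in different buckets despite being only 2 points apart.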
Thanks! I guess I think NPS is useful precisely because of those threshold effects, though I agree it’s not clear it handles the distinction between a 6 and a 1 well. Histograms seem great!
Would you be able to provide a Net Promoter Score analysis of your Likelihood to Recommend metrics? I find NPS yields different, interesting information from an averaged LTR, and it should be very straightforward to compute.
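For anyone wanting to run this themselves, a minimal sketch of the standard NPS calculation from 0-10 likelihood-to-recommend ratings (the sample ratings below are made up for illustration):

```python
# NPS = (% promoters) - (% detractors), where promoters rate 9-10 and
# detractors rate 0-6; passives (7-8) count toward the denominator only.
def net_promoter_score(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

ratings = [10, 9, 8, 7, 6, 9, 10, 3, 8, 9]
print(net_promoter_score(ratings))  # 5 promoters, 2 detractors -> 30.0
```

The result ranges from -100 (all detractors) to +100 (all promoters), which is why it can tell a different story from the mean of the same ratings.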