Submit ideas for “interesting evaluations” in the comments. The best one submitted by December 5th will get $50. All of them will be highly appreciated.
A few of us (myself, Nuño Sempere, and Ozzie Gooen) have been working recently to better understand how to implement meaningful evaluation systems for EA/rationalist research and projects. This is important both for short-term use (so we can better understand how valuable EA/rationalist research is) and for long-term use (as in, setting up scalable forecasting systems on qualitative parameters). To understand this problem, we've been investigating both evaluations specific to research and evaluations in a much broader sense.
We expect work in this area to be useful for a wide variety of purposes. For instance, even if Certificates of Impact eventually get used as the primary mode of project evaluation, purchasers of certificates will need strategies to actually do the estimation.
Existing writing on “evaluations” seems to be fairly domain-specific (only focused on Education or Nonprofits), one-sided (yay evaluations or boo evaluations), or both. This often isn’t particularly useful when trying to understand the potential gains and dangers of setting up new evaluation systems.
I’m now investigating a neutral history of evaluations, with the goal of identifying trends in what aids or hinders an evaluation system in achieving its goals. The ideal output of this stage would be an absolutely comprehensive list that will be posted to LessWrong. While that is probably impractical, we hope to make one that is comprehensive enough, especially with your help.
Suggest an interesting example (or examples) of an evaluation system. For these purposes, evaluation means "a systematic determination of a subject's merit, worth and significance, using criteria governed by a set of standards", but if you think of something that doesn't seem to fit, err on the side of inclusion.
The prize is $50 for the top submission.
To enter, submit a comment suggesting an interesting example below, before the 5th of December. This post is both on LessWrong and the EA Forum, so comments on either count.
To hold true to the spirit of the project, we have a rubric evaluation system to score this competition. Entries will be evaluated using the following criteria:
- Usefulness/uniqueness of lesson from the example
- Novelty or surprise of the entry itself, for Elizabeth
- Novelty of the lessons learned from the entry, for Elizabeth
Accepted Submission Types
I care about finding interesting things more than proper structure. Here are some types of entries that would be appreciated:
- A single example in one of the categories already mentioned
- Four paragraphs on an unusual exam and its interesting impacts
- A babbled list of 104 things that vaguely sound like evaluations
Examples of Interesting Evaluations
We have a full list here, but to avoid anchoring you too much, below is only a subset. Don't worry about submitting duplicates: I’d rather risk a duplicate than miss an example.
- Chinese Imperial Examination
- Westminster Dog Show
- Turing Test
- Consumer Reports Product Evaluations
- Restaurant Health Grades
- Art or Jewelry Appraisal
- ESGs/Socially Responsible Investing Company Scores
- “Is this porn?”
  - For purposes of posting on Facebook?
- Charity Cost-Effectiveness Evaluations
- Judged Sports (e.g. Gymnastics)
These are some of our previous related posts:
- Shallow Review of Consistency in Statement Evaluation
- Can we hold intellectuals to similar public standards as athletes?
- Prediction-Augmented Evaluation Systems
- Can We Place Trust in Post-AGI Forecasting Evaluations?
- ESC Process Notes: Claim Evaluation vs. Syntheses
- Predicting the Value of Small Altruistic Projects: A Proof of Concept Experiment
Huh! The thread I linked to and David Manheim's winning comment cite the same paper :)