Some background: I wrote this post to clarify my ideas on how I would run an EA job testing program, and this is what I've concluded after 3 months of thinking (doesn't mean it's good, but whatever). I'm currently working on a program like this for AI safety with some help from Training For Good, and if you have any feedback I would highly appreciate it.
Ladies and gentlemen, I have one question for you. Where is the bloody aptitude testing? For all of the talk of using the scientific method and Bayesian priors and everything, there's a considerable gap within career advising regarding actually testing what areas people show promise in. Therefore, a very natural extension to the current eight-week career advising course on 80k would be a job testing program. The job testing course wouldn't focus on skill testing for a particular job but on testing your fit for a specific area. This post is my research on what I think would be a good way to structure such a course and why we should do it in the first place.
Aptitude testing for specific skills required within work areas can be a great data point for people looking to find out what they want to do. According to this article from 80k, the r-value for work tests as a predictor of job performance is 0.54, which is the highest of all the predictors 80k lists. Completely disregarding this data point is a mistake when finding out if you're a good fit for a cause area.
If you're going to create a job testing program, you should make it iterative since there is a large solution space.
Experienced facilitators are expensive in terms of their time, but having experienced people lead cohorts is probably needed to create a good program. To make this worthwhile, it is worth having a filtering system that balances quality and quantity of applicants. The filtering system needs careful analysis, as the most promising people might not be the ones who benefit the most.
A more open structure with power in the hands of individual facilitators is preferred over a linearised system as it allows for more individualised feedback on specific participants' challenges and interests. It also ensures a higher rate of exploration in terms of tasks which is good since the solution space is large.
Previous courses and work
I have been in contact with Training For Good, who have expressed interest in this idea, and they have something quite similar in their 5-week course on policy careers in Europe.
Large solution space
This problem seems to have a large solution space: there are many ways to solve it, and if one thinks of the possible solutions as a landscape, there might be many hills and valleys.
Therefore, creating a job testing course lends itself to an iterative approach, where each week of the program is tried out and changed based on a fast feedback mechanism. The faster the feedback, the smaller and more frequent the updates we can make, and the faster we can optimise.
The problem is that the feedback most likely has to come from career professionals. The people doing the job testing won't know whether what they're doing accurately represents the underlying work, since they won't properly work in that field for a couple of years, and so they cannot give feedback on that aspect of the course. One can do several things to increase the iteration speed towards an optimal solution, and we will look at them in the following section.
Guidance or no guidance
One of the more resource-consuming options is to have a guided course with a facilitator working within the course's area. There are several pros and cons of this approach, and the ones I could think of are listed below.
Pros:
- It would ensure that the course is relevant to the current work in the area (to the extent the facilitator is aware of the average task in the cause area).
- It would enable guided specialisation if someone wanted to focus more on, for example, population ethics when looking to become a global priorities researcher.
- It would enable facilitators to pick up on talented people and help them progress faster.
- It would enable more social bonds to form between participants and also between participants and the facilitator.
- It would make feedback on whether the course is going well more easily accessible.
Cons:
- It would make the course less available to people, as it would have to run in batches like EA virtual programs.
- It would require time from professionals within the areas that could have been spent on research instead.
- It would make the behaviour in the course less authentic, as it would also be a test of how good someone is at specific skills.
- There is a higher startup cost to running the course, which might make it a lot harder to get off the ground.
- It might become part of a career ladder if "promising" people are recruited to work in organisations based on this course.
Guidance with a filter
In my opinion, the pros of having a guided course outweigh the cons, but it has a higher resource cost. To reduce this resource cost, that is, how many people are required to run a good course, one could add a filter and prioritise testing for certain people. However, creating this filter seems non-trivial, as standard predictors such as how excited someone is might not predict how much value the course will add: the person in question might be going into the cause area no matter what. It would make more sense to have something like an urgency or uncertainty filter that selects the people for whom the information gained from the course would be highest.
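As a rough illustration of what an uncertainty filter could look like, here is a minimal sketch. It assumes (and this is my assumption for the example, not a tested design) that each applicant reports a probability that they will enter the cause area anyway, and it ranks applicants by the entropy of that belief, so that people close to 50/50 come first. The applicant names, the numbers and the function names are all hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no belief held with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully certain belief carries no uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def rank_by_uncertainty(applicants: dict[str, float]) -> list[tuple[str, float]]:
    """Sort applicants from most to least uncertain about entering the field."""
    scored = [(name, binary_entropy(p)) for name, p in applicants.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical applicants: self-reported probability of entering the field.
applicants = {
    "already_committed": 0.95,
    "on_the_fence": 0.50,
    "mildly_curious": 0.20,
}

ranking = rank_by_uncertainty(applicants)
for name, bits in ranking:
    print(f"{name}: {bits:.2f} bits of uncertainty")
```

In this sketch the very excited applicant ends up last, which matches the point above: excitement alone does not say how much the course would change someone's decision. A real filter would probably combine this with other signals, such as urgency.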
Following this, we will now look at AI safety as an example and extrapolate some problems to the general case. I also want to preface this part by saying that I'm not an AI safety researcher myself, just a budding one, so it's most likely only kind of accurate, even though I've been talking to some ML researchers about this.
AI safety course as an example
Caveats for AI safety:
ML engineer vs research scientist
There are different roles within AI safety, meaning there might be special things to consider for different people. One example is the difference between ML engineers and research scientists. These roles require different skill sets to do well and have different day-to-day tasks. This is one reason why guidance might generally be required: different people have different aspirations and needs.
Options for a filter:
Forming a filter
Akin to how, for example, Charity Entrepreneurship uses a form to assess how close someone is to being an "optimal" candidate, one can use a similar, straightforward approach to assess how good a fit people are.
In the AI safety case, it would also make sense to ask what prior experience people have in mathematics and computer science, either by asking for a GitHub page or by asking what programming projects or research papers they have worked on.
Options for week 1:
Expectations versus Reality
The idea here is to ask the participants how they think working hours within the area are distributed across different types of work, and then ask them to rate how much they expect to enjoy each type. You then show them what an actual day looks like and ask them to update their views, with a discussion on what they think they would like and dislike within the field. One could then continue this into the next week, where they're given tasks within the area to complete. In AI safety, this could be one of the following:
- If they want to do ML research, ask them to write an ML program implementing a specific algorithm they haven't seen before and then do a follow-up.
- If they want to become a research scientist, ask them to read three papers, generate a new idea based on the topics, and then write an outline of how to carry out that idea.
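The expectations-versus-reality exercise above can be made a bit more concrete with a small sketch. It compares a participant's guessed split of weekly hours with a facilitator's reported split, and re-weights the participant's enjoyment ratings by each. All task names, hours and ratings below are made up for illustration, and the gap measure (total variation distance) is just one reasonable choice, not something the exercise depends on.

```python
def normalise(hours: dict[str, float]) -> dict[str, float]:
    """Turn raw hours per task into shares of total time."""
    total = sum(hours.values())
    return {task: h / total for task, h in hours.items()}


def expectation_gap(expected: dict[str, float], actual: dict[str, float]) -> float:
    """Total variation distance: 0 means identical splits, 1 means no overlap."""
    e, a = normalise(expected), normalise(actual)
    tasks = set(e) | set(a)
    return 0.5 * sum(abs(e.get(t, 0.0) - a.get(t, 0.0)) for t in tasks)


def average_enjoyment(hours: dict[str, float], enjoyment: dict[str, float]) -> float:
    """Enjoyment rating averaged over tasks, weighted by share of time."""
    shares = normalise(hours)
    return sum(share * enjoyment[task] for task, share in shares.items())


# Hypothetical numbers: hours per week and enjoyment ratings on a 1-5 scale.
expected = {"reading papers": 10, "coding": 20, "writing": 5, "meetings": 5}
actual = {"reading papers": 8, "coding": 12, "writing": 10, "meetings": 10}
enjoyment = {"reading papers": 4, "coding": 5, "writing": 2, "meetings": 2}

gap = expectation_gap(expected, actual)
enjoy_expected = average_enjoyment(expected, enjoyment)
enjoy_actual = average_enjoyment(actual, enjoyment)
print(f"gap: {gap:.2f}, expected enjoyment: {enjoy_expected:.2f}, "
      f"enjoyment given the actual split: {enjoy_actual:.2f}")
```

A large gap, or a noticeable drop in enjoyment once the actual split is used, is the kind of thing worth bringing up in the group discussion.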
Reading research papers and brainstorming new research ideas based on the papers
Generating new ideas is an important skill, as the AI safety field needs creativity if we are to solve the alignment problem. This exercise also captures people's proficiency in reading research papers and picking up on ideas within AI safety. The reading should be tailored to different interests and levels of ML knowledge, as people will have differing levels of technical background.
One could, for example, have three different areas of research such as:
- Interpretability research
- AI safety via debate
- Imitative amplification
Options for week 2:
Continuation of the trial runs
Essentially this boils down to repeating the same process, except you now review what people did last week and discuss where they would like to take their testing this week. This approach is also more flexible, meaning that new ideas on how to optimise this can spontaneously be generated, which might be necessary considering there's probably an ample solution space. I haven't talked to enough people to know what a good continuation of this would be yet, but it seems like one of the more promising iterative approaches.
Options for week 3:
Fill in the blanks
The third week might be a good time for testing all the things that the main tasks didn't get to. This might be communicating with a team and working in a team environment. It might also be adapting to the lifestyle changes that come with a different career, or adapting to a new social circle. Many things come with changing occupation, and all of them should be considered, tried and then iterated on to find which tests seem most helpful for people.
Options for week 4:
Having a Q and A about key concerns
When people have practically tried out all of these things, there might still be many questions to ask, and it might be a good idea to end the last week with reflections and then a significant focus on a question and answer session. In the end, it might also be good to look into what people should do to test themselves in the future. Having a Q and A would also allow more flexibility, and as I hope you know by now, that is generally good here. (It might also turn bad if it is too unstructured since it asks for a lot from the facilitators.)
Determining the goal of the course
An interesting question is what the goal of the course should be, as a career testing course can benefit both companies and individuals. Finding the right balance between the two is vital, as there can be value in scouting for talent during the course. On the other hand, scouting might generate some signalling and reduce the usefulness of the course for individuals. The solution is most likely somewhere in between, but my intuition tells me that it lies closer to a purely individual-based approach.
Testing for individuals
The job testing course could work as a 4-week fun-o-meter where you test whether you like the work you're likely to do. The benefit of this approach is that it eases the burden of taking part in the course. Participants' actions wouldn't be judged during the course, making it more appealing to participants and facilitators.
Testing for individuals and companies
Akin to the fun-o-meter, but also used to screen for who's a good fit for an area. The first problem with this approach is that job testing might not predict work performance well enough. Even though work tests are the best predictor 80k lists, an r-value of 0.54 still leaves most of the variation in performance unexplained. This limited accuracy might make the course an unreliable data point for companies.
However, the major problem might be Goodhart's Law: people might start taking the course to signal their worth instead of using it as a measuring device. One solution might be to do the screening in secret, yet it is unclear what this would lead to, and whether it is worth the risk is a difficult question.
On the other hand, it allows the course to be a lot better as a sieve for talent, which might make it the superior option.
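To get a feel for what the r-value of 0.54 quoted from 80k means for screening, here is a small back-of-the-envelope sketch. The share of variance explained is simply r squared. The second function additionally assumes that test scores and job performance are jointly normally distributed, which is a simplifying assumption of mine and not something the 80k article claims. Under that assumption, the chance that someone who scores above the median on the test also performs above the median on the job is 1/2 + arcsin(r)/π.

```python
import math


def variance_explained(r: float) -> float:
    """Share of variance in performance accounted for by the test (r squared)."""
    return r * r


def prob_same_half(r: float) -> float:
    """P(above-median performance | above-median test score).

    Assumes test score and performance are bivariate normal with
    correlation r (a simplifying assumption for this sketch).
    """
    return 0.5 + math.asin(r) / math.pi


r = 0.54  # r-value for work tests quoted from 80k
print(f"variance explained: {variance_explained(r):.0%}")
print(f"P(above median on the job | above median on the test): {prob_same_half(r):.0%}")
```

With r = 0 the second number is a coin flip and with r = 1 it is a certainty, so 0.54 sits somewhere in between: informative, but far from decisive for any single person.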
Testing for companies
The course would focus on screening for work and optimising the sessions to get as much data about a person as possible. Now I'm not much for this, and I won't be writing about it as a serious alternative as there are many other traditional sources like regular job hunting that one could do this through.
I haven't talked to 80k at all while writing this post, and quite honestly, I have primarily been working in isolation, with some talking to friends and Training For Good. Feedback is much appreciated, especially if you have any better ideas to use in a job testing course. If you have any, feel free to share them; it is, after all, a large solution space.