Help improve reasoning evaluation in intelligence organisations

Luke Thorburn

4 min read · May 11, 2020

Study Motivation

It is important that conclusions reached by analysts working in professional intelligence organisations are accurate so that resulting decisions made by governments and other decision-makers are grounded in reality. Historically, failures of intelligence have contributed to decisions or oversights that wasted resources and often caused significant harm. Prominent examples from US history include the attack on Pearl Harbor, the 1961 Bay of Pigs invasion, 9/11, and the Iraq War.

Such events are at least partly the result of institutional decisions made on the basis of poor reasoning. To reduce the risk of such events, it is important that the analysis informing those decisions is well reasoned. We use the phrase ‘well reasoned’ to mean that the arguments articulated establish the stated conclusion. (If the arguments fail to establish the stated conclusion, we say the analysis is ‘poorly reasoned’.)

The ‘industry standard’ method for evaluating quality of reasoning (QoR) amongst intelligence organisations in the US is the IC Rating Scale, a rubric based on a set of Analytic Standards issued by the US Office of the Director of National Intelligence (ODNI) in 2015. There are significant question marks over the extent to which the IC Rating Scale is (and can be) operationalised to improve the QoR in intelligence organisations. See here for a detailed summary, but in brief:

Reliability of the Rating Scale between individual raters is poor. (Reliability between aggregated ratings, constructed by averaging the ratings of multiple raters, is better.)

Information is lacking on whether or not the Rating Scale is valid (whether it in fact measures QoR, as intended).

Ambiguities in the specification of the Rating Scale can make it difficult for raters to apply.

The Rating Scale can be overly prescriptive and detailed, making it difficult to quickly distinguish well reasoned from poorly reasoned analytic products.
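The first point above, that averaged ratings agree better than individual ratings, can be illustrated with a small simulation. This is only a sketch under assumptions that are not taken from the study: each rating is modelled as a product's true quality plus independent random noise, and the number of products, the panel size, and the noise level are all hypothetical. The `spearman_brown` function is the standard Spearman-Brown prediction formula, which estimates the reliability of an average of `k` raters from the reliability of a single rater.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(single_rater_reliability, k):
    """Predicted reliability of the mean of k raters (Spearman-Brown formula)."""
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)


def simulate(n_products=200, panel_size=5, noise_sd=1.5, seed=0):
    """Compare agreement between two single raters and between two panel means.

    All parameters are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    true_quality = [rng.gauss(0, 1) for _ in range(n_products)]

    def rate(quality):
        # Assumed rating model: true quality plus independent rater noise.
        return quality + rng.gauss(0, noise_sd)

    # Two independent panels rate every product.
    panel_a = [[rate(q) for _ in range(panel_size)] for q in true_quality]
    panel_b = [[rate(q) for _ in range(panel_size)] for q in true_quality]

    # Agreement between one rater from each panel.
    single = pearson([r[0] for r in panel_a], [r[0] for r in panel_b])
    # Agreement between the two panels' averaged ratings.
    averaged = pearson([mean(r) for r in panel_a], [mean(r) for r in panel_b])
    return single, averaged


if __name__ == "__main__":
    single, averaged = simulate()
    print(f"single-rater agreement:  {single:.2f}")
    print(f"panel-average agreement: {averaged:.2f}")
    print(f"Spearman-Brown estimate: {spearman_brown(single, 5):.2f}")
```

Under this noise model, averaging cancels some of each rater's independent error, so the panel means track the underlying quality more closely than any one rater does. Real raters may share biases that averaging does not remove, so the improvement seen in practice could be smaller.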

Our research group has been developing an alternative method for evaluating QoR, provisionally called the Reasoning Stress Test (RST), which focuses on detecting particular types of flaws in written reasoning. The RST is designed to be easy to apply and efficient, but this comes at a cost: raters do not consider the degree to which the reasoning displays other reasoning virtues, nor do they work through a checklist of the necessary and sufficient conditions of good reasoning.

We are conducting a study to compare the ability of participants trained in each method to discriminate between well and poorly reasoned intelligence-style products (among other research questions).

We are offering training in both the current and novel methods for evaluating QoR in return for participation in the study. The training has been primarily designed for intelligence analysis, so will give you insight into how reasoning is evaluated in such institutions. However, the principles of reasoning quality taught are much more broadly applicable. They apply to all types of reasoning, and can be used to assess QoR in any institution with intelligence or analytical roles.

Methodological Note

We are aware that by publicly describing the potential limitations of the two methods—as we have done above—we risk prejudicing participants’ responses to either method in the study. The alternative, not to provide such information, would make it harder for you to decide whether the training is of interest. We decided to provide the information because:

we will be modelling the effect of existing familiarity with either method, rather than excluding participants on that basis;

in the context of our study design, it is difficult to articulate a plausible mechanism through which such prejudice could influence good faith participation in the study; and

at the current stage of research into methods for evaluating QoR, we believe that the value of additional data that may be gained by explaining the study motivation outweighs the potential limitations of that data as a result of this potential prejudicing effect.

Significant work has been done to develop polished, insightful training in both methods, and we are confident that learning the principles behind both methods, and how to apply them, will help you evaluate the reasoning of others.

What does participation involve?

Participating in the study involves:

Random allocation to one of the two reasoning evaluation methods

Training on how to use the method, including some simple review questions

A series of challenging fictional intelligence products (i.e. reports or assessments) to evaluate. In previous testing, we have found that many of these are very difficult to evaluate.

Expert responses to each question to compare to your own.

After you have completed all the training on the first method, you will be given access to the training material for the other method. Completing the training in the second method is optional.
