AI alignment and game theory researcher.
I would like to highlight an aspect you mention in the "other caveats": How much should you discount for Goodharting vs doing things for the right reasons? Or, relatedly, if you work on some relevant topic (say, Embedded Agency) without knowing that AI X-risk could be a thing, how much less useful will your work be? I am very uncertain about the size of this effect - maybe it is merely a 10% decrease in impact, but I wouldn't be too surprised if it decreased the amount of useful work by 98% either.
Personally, I view this as the main potential argument against the usefulness of academia. However, even if the effect is large, the implication is not that we should ignore academia. Rather, it would suggest that we can get huge gains by increasing the degree to which academics do the research for the right reasons.
(Standard disclaimers apply: This can be done in various ways. Victoria Krakovna's list of specification gaming examples is a good one. Screaming about how everybody is going to die tomorrow isn't :P.)
Of course, my views on this issue are by no means set in stone and still evolving. I’m happy to elaborate on my reasons for preferring this more modest usage if you are interested.
I think the more modest usage is a reasonable choice.
Maybe you had a different country in mind. [regarding top-secret security clearance]
I am Czech. We do have that institution, and we use it. But, as far as I know, our president doesn't have a clearance, and neither do a bunch of other people. (That is, people who need secret information on a daily basis have it. But you don't need it for many other positions from which you could put pressure on the people who do have the clearance.)
Some thoughts that occurred to me while reading:
1) Research suggestion: From afar, malevolence-detection techniques seem like a better version of an already-existing tool: top-secret security clearance (or tests similar to it). I am not confident about this, but it already seems that if top-secret security clearance were a requirement for holding important posts, a lot of grief would be avoided (at least where I am from). Yet we generally do not use this tool. Why is that? I suspect that whatever the answer is, it will apply to malevolence-detection techniques as well.
2) Potential bottleneck: Suppose you succeed and develop a 100% accurate malevolence-detection technique. I think that, by default, you would have trouble convincing people to use it. ("I mean, what if I score high on it? You know, I am keeping my dark side in check, and I don't plan to become too influential either, so my malevolence doesn't really hurt anybody. But other people don't know that! If I get branded as malevolent, nobody will ever talk to me, or hire me, or anything!") I conjecture that the impact of this agenda will be bottlenecked on figuring out how to leave malevolent people a line of retreat: making sure that if you score high on the test, the implications aren't that bad. I see three reasons for this:
a) non-malevolent people might not know they are non-malevolent, and hence be afraid of this,
b) malevolent-and-know-it people might have enough power to hinder this,
c) reasonable general concerns about any test like this getting out of hand.
3) Related to (2): would it make sense to consider some alternative branding that more accurately suggests what you intend to do with the concept (and doesn't suggest other things)? An unwieldy suggestion, to illustrate what I mean: Being publicly known as "potentially too risky to be in a position of great power" indicates that you shouldn't be a president, but you might still have friends, a spouse, and a prestigious job. Being publicly known as "malevolent", however, ... . (Also, it seems plausible that there are people who are malevolent but do not endorse being so, similarly to how, I think, there are paedophiles who wish they weren't so.)
(Also, it might not be obvious from my nitpicking, but I really like the post, thanks for it :-).)