Vojta Kovarik. AI alignment and game theory researcher.


What are examples of technologies which would be a big deal if they scaled but never ended up scaling?

In a somewhat similar vein, it would be great to have a centralized database of medical records, at least within each country. We know how to do this technically, but it "somehow doesn't happen" (at least not anywhere I know of).

A general pattern would be "things where somebody believes a problem is of a technical nature, works hard at it, and solves it, only to realize that the problem was of a social/political nature". (Relatedly, the solution might not catch on because the institution you are trying to improve serves a somewhat different purpose from what you believed, in the style of The Elephant in the Brain. E.g., education is not just for improving thinking and knowledge but also for domestication and signalling.)

The academic contribution to AI safety seems large

I would like to highlight an aspect you mention in the "other caveats": How much should you discount for Goodharting versus doing things for the right reasons? Or, relatedly, if you work on some relevant topic (say, Embedded Agency) without knowing that AI X-risk could be a thing, how much less useful will your work be? I am very uncertain about the size of this effect: maybe it is merely a 10% decrease in impact, but I also wouldn't be too surprised if it decreased the amount of useful work by 98%.

Personally, I view this as the main potential argument against the usefulness of academia. However, even if the effect is large, the implication is not that we should ignore academia. Rather, it would suggest that we can get huge gains by increasing the degree to which academics do their research for the right reasons.

(Standard disclaimers apply: this can be done in various ways. Victoria Krakovna's list of specification gaming examples is a good one. Screaming about how everybody is going to die tomorrow isn't :P.)


Reducing long-term risks from malevolent actors
Of course, my views on this issue are by no means set in stone and still evolving. I’m happy to elaborate on my reasons for preferring this more modest usage if you are interested.

I think the more modest usage is a reasonable choice.

Maybe you had a different country in mind. [regarding top-secret security clearance]

I am Czech. We do have the institution, and we use it. But, as far as I know, our president doesn't have the clearance, and neither do a number of other officials. (That is, it seems that people who need secret information on a daily basis have it. But you don't need it for many other positions from which you could put pressure on people who do have the clearance.)

Reducing long-term risks from malevolent actors

Some thoughts that occurred to me while reading:

1) Research suggestion: From afar, malevolence-detection techniques seem like a better version of an already-existing tool: top-secret security clearance (or tests similar to it). I am not confident about this, but it seems that if top-secret security clearance were a requirement for holding important posts, a lot of grief would already be avoided (at least where I am from). Yet we generally do not use this tool. Why is this? I suspect that whatever the answer is, it will apply to malevolence-detection techniques as well.

2) Potential bottleneck: Suppose you succeed and develop a 100% accurate malevolence-detection technique. I think that, by default, you would have trouble convincing people to use it. ("I mean, what if I score high on it? You know, I am keeping my dark side in check and I don't plan to become too influential either, so my malevolence doesn't really hurt anybody. But other people don't know that! If I get branded as malevolent, nobody will ever talk to me, or hire me, or anything!") I conjecture that the impact of this agenda will be bottlenecked on figuring out how to leave malevolent people a line of retreat: making sure that if you score high on this, the implications aren't that bad. I see three reasons for this:

a) non-malevolent people might not know they are non-malevolent, and hence be afraid of this,

b) malevolent-and-know-it people might have enough power to hinder this,

c) reasonable general concerns about any test like this getting out of hand.

3) Relatedly to (2), would it make sense to consider some alternative branding that more accurately suggests what you intend to do with the concept (and doesn't suggest other things)? An unwieldy suggestion, to illustrate what I mean: being publicly known as "potentially too risky to be in a position of great power" indicates that you shouldn't be president, but you might still have friends, a spouse, and a prestigious job. Being publicly known as "malevolent", however, ... . (Also, it seems plausible that there are people who are malevolent but do not endorse being so, similarly to how, I think, there are paedophiles who wish they weren't.)

(Also, it might not be obvious from my nitpicking, but I really like the post, thanks for it :-).)