Vojta Kovarik. AI alignment and game theory researcher.



Meditations on careers in AI Safety

Some thoughts:
1) Most importantly: In your planning, I would explicitly include the variable of how happy you are. In particular, if the AI Safety option would result in a break-up of a long-term & happy relationship, or cause you to be otherwise miserable, it is totally legitimate to not do the AI Safety option, even if it had higher "direct" impact. (If you need an impact-motivated excuse - which might even be true - then think about the indirect impact of avoiding signalling "we only want people who are so hardcore that they will be miserable just to do this job".)

2) My guess: Given that you think your QC work is unlikely to be relevant to AI Safety, I personally believe that (ignoring the effect on you), the AI Safety job is higher impact.

3) Why is it hard to hire world experts to work on this? (Some thoughts, possibly overlapping with what other people wrote.)

  • "World experts in AI/ML" are - kinda tautologically - experts in AI/ML, not in AI Safety. (EG, "even" you and I have more "AI Safety" expertise than most AI/ML experts.)
  • Most problems around AI Safety seem vague, and thus hard to delegate to people who don't have their own models of the topic. Such models take time to develop. So these people might not be productive for a year (or two? or more? I am not sure) even if they are genuine about AI Safety work.
  • Top people might be more motivated by prestige than money. (And being "bought off" seems bad from this point of view, I guess.)
  • Top people might be more motivated by personal beliefs than money. (So the bottleneck is convincing them, not money.)

4) I am tempted to say that all the people who could be effectively bought with money are already being bought with money, so you donating doesn't help here. But I think a more careful phrasing is "recruiting existing experts is bottlenecked on other things than money (including people coming up with good recruiting strategies)".

5) Phrased differently: In our quest for developing the AI Safety field, there is basically no tradeoff between "hiring 'more junior' people (like you)" and "recruiting senior people", even if those more junior people would go earning to give otherwise.

What are effective ways to help Ukrainians right now?

Two considerations seem very relevant here:
(1) Is your primary goal to help Ukrainians, or to make this more costly for Russia?
(2) Do you think the extra money is likely to change the outcome of the war, or merely the duration?

Is it crunch time yet? If so, who can help?

When considering self-sacrifice, it is also important to weigh the effects on other people. IE, every person that "sacrifices something for the cause" increases the perception that "if you want to work on this, you need to give up stuff". This might in turn discourage people from joining the cause in the first place. So even if the sacrifice increases the productivity of that one person, the total effect might still be negative.

Is it crunch time yet? If so, who can help?

My answer to the detailed version of the question is "unsure...probably no?": I would be extremely wary of reputation effects and perception of AI safety as a field. As a result, getting as many people as we can to work on this might prove to not be the right approach.

For one, getting AI to be safe is not only a technical problem: apart from figuring out how to make AI safe, we also need to get whoever builds it to adopt our solution. Second, classical academia might prove important for safety efforts. If we are being realistic, we need to admit that the prestige associated with a field has an impact on which people get involved with it. Thus, there will be a point where the costs of bringing more people in on the problem might outweigh the benefits.

Note that I am not saying anything like "anybody without an Ivy-league degree should just forget about AI safety". Just that there are both costs and benefits associated with working on this, and everybody should consider these before making major decisions (and in particular before doing outreach).

What are examples of technologies which would be a big deal if they scaled but never ended up scaling?

In a somewhat similar vein, it would be great to have a centralized database for medical records, at least within each country. And we know how to do this technically. But it "somehow doesn't happen" (at least anywhere I know of).

A general pattern would be "things where somebody believes a problem is of a technical nature, works hard at it, and solves it, only to realize that the problem was of a social/political nature". (Relatedly, the solution might not catch on because the institution you are trying to improve serves a somewhat different purpose from what you believed, Elephant in the Brain style. EG, education being not just for improving thinking and knowledge but also for domestication and signalling.)

The academic contribution to AI safety seems large

I would like to highlight an aspect you mention in the "other caveats": How much should you discount for Goodharting vs doing things for the right reasons? Or, relatedly, if you work on some relevant topic (say, Embedded Agency) without knowing that AI X-risk could be a thing, how much less useful will your work be? I am very uncertain about the size of this effect - maybe it is merely a 10% decrease in impact, but I wouldn't be too surprised if it decreased the amount of useful work by 98% either.

Personally, I view this as the main potential argument against the usefulness of academia. However, even if the effect is large, the implication is not that we should ignore academia. Rather, it would suggest that we can get huge gains by increasing the degree to which academics do the research for the right reasons.

(Standard disclaimers apply: This can be done in various ways. Victoria Krakovna's list of specification gaming examples is a good one. Screaming about how everybody is going to die tomorrow isn't :P.)


Reducing long-term risks from malevolent actors
Of course, my views on this issue are by no means set in stone and still evolving. I’m happy to elaborate on my reasons for preferring this more modest usage if you are interested.

I think the more modest usage is a reasonable choice.

Maybe you had a different country in mind. [regarding top-secret security clearance]

I am Czech. We do have this institution, and use it. But, as far as I know, our president doesn't have the clearance, and neither do a bunch of other people. (I.e., it seems that people who need secret information on a daily basis have it. But you don't need it for many other positions from which you could put pressure on people who have the clearance.)

Reducing long-term risks from malevolent actors

Some thoughts that occurred to me while reading:

1) Research suggestion: From afar, malevolence-detection techniques seem like a better version of the already-existing tool of top-secret security clearance (or tests similar to it). I am not confident about this, but it already seems that if top-secret security clearance were a requirement for holding important posts, a lot of grief would be avoided (at least where I am from). Yet we generally do not use this tool. Why is this? I suspect that whatever the answer is, it will apply to malevolence-detection techniques as well.

2) Potential bottleneck: Suppose you succeed and develop a 100% accurate malevolence-detection technique. I think that, by default, you would have trouble convincing people to use it. ("I mean, what if I score high on it? You know, I am keeping my dark side in check and I don't plan to become too influential either, so my malevolence doesn't really hurt anybody. But the other people don't know that! If I get branded as malevolent, nobody will talk to me ever, or hire me, or anything!") I conjecture that the impact of this agenda will be bottlenecked on figuring out how to leave the malevolent people a line of retreat; making sure that if you score high on this, the implications aren't that bad. I see three reasons for this:

a) non-malevolent people might not know they are non-malevolent, and hence be afraid of this,

b) malevolent-and-know-it people might have enough power to hinder this,

c) reasonable general concerns about any test like this getting out of hand.

3) Related to (2), would it make sense to consider some alternative branding that more accurately suggests what you intend to do with the concept (and doesn't suggest other things)? Unwieldy suggestion, to illustrate what I mean: Being publicly known as "potentially too risky to be in a position of great power" indicates that you shouldn't be a president, but you might still have friends, a spouse, and a prestigious job. Being publicly known as "malevolent", however, ... . (Also, it seems plausible that there are people who are malevolent, but do not endorse being so, similarly to how, I think, there are paedophiles who wish they weren't so.)

(Also, it might not be obvious from my nitpicking, but I really like the post, thanks for it :-).)