Onni Aarne

Associate Researcher, AI Governance & Strategy @ Rethink Priorities
57 karma · Joined May 2019 · Working (0-5 years) · Merihaka, 00530 Helsinki, Finland



Doing compute governance research at Rethink Priorities. Board member of EA Finland.


This was a great overview, thanks!

I was left a bit confused about what the goal and the threat/trust model are here. I guess the goal is to be able to run models on devices controlled by untrusted users without giving the user direct access to the weights, because a user with access to the weights could take them as a starting point for fine-tuning? (If so, this doesn't do anything to prevent misuse that can be achieved with the unmodified weights? You might want to make this clearer.)

As far as I can tell, the key problem with all of the methods you cover is that, at some point, you have to have the decrypted weights in the memory of an untrusted device. Additionally, you have to have the decryption keys there as well (even in the case of the data preprocessing solution, you have to "encrypt" new data on the device for inference, right?). By default, the user should be able to just read them out of there? (The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don't understand their scheduling solution or TPMs well enough to know if that's feasible, but I'm intuitively suspicious. Either way, it doesn't solve the issue that the decrypted weights still need to be in memory at some point.) Given this, I don't really understand how any of these papers improve over the solution of "just encrypt the weights when not in use"? I feel like there must be something I'm missing here.
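To make the "just encrypt the weights when not in use" baseline concrete, here is a minimal sketch. It is only an illustration: the cipher is a toy stand-in built from SHA-256 (not secure, and not what any of the papers use), and `key`, `weights`, and `infer` are hypothetical names. The point is just that the key and the plaintext weights both end up as ordinary values in process memory on the device.

```python
import hashlib
import struct


def keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. NOT secure; illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def pack_weights(weights):
    """Serialize a list of floats as little-endian doubles."""
    return struct.pack(f"<{len(weights)}d", *weights)


def unpack_weights(blob):
    """Inverse of pack_weights."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


def infer(weights, x):
    """Stand-in for a model forward pass: a single dot product."""
    return sum(w * xi for w, xi in zip(weights, x))


# Developer side: weights are encrypted before being shipped to the device.
key = b"hypothetical-device-key"  # assumption: some key the device must hold
weights = [0.5, -1.25, 2.0]
encrypted_at_rest = xor_crypt(key, pack_weights(weights))

# Device side: to run inference, the key and the plaintext weights must both
# exist in memory, where whoever controls the device can read them.
in_memory = unpack_weights(xor_crypt(key, encrypted_at_rest))
prediction = infer(in_memory, [1.0, 1.0, 1.0])
```

Anyone who can inspect the process at the point where `in_memory` exists has the weights, regardless of how strong the at-rest encryption is.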

On the other hand, I think you could reasonably solve all of this just by running the model inside a "trusted execution environment" (TEE). The model would only be decrypted in the TEE, where it can't be accessed, even by the OS. For example, the H100 supports "confidential computing", which is supposed to enable secure multiparty computation, and I think this problem can be thought of as a special case of that. The classic case of secure multiparty computation is data pooling: Multiple parties can collaborate to train a model on all of their data, without the different parties having access to each other's data. (See page 10 here) In this case, the model developer contributes the "data" of what the model weights are, and the user contributes the data on which inference is to be run, right?

But TEEs are only decently secure: If you're worried about genuinely sophisticated actors, e.g. nation states, you should probably not count on them.

Anyway, I'm looking forward to seeing your future work on this!

PS: Something related to consider may be model extraction attacks: If the user can just train an equivalent model by training against the "encrypted" model (and maybe leveraging some side channels), the encryption won't be very useful. I'm not sure how feasible this is in practice, but this is certainly a key consideration for whether this kind of "encryption" approach adds much value.
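As a minimal sketch of what extraction can look like in the easiest possible case, assume the protected model is a plain linear function that the user can query freely. The hidden weights and the names `make_black_box` and `extract_linear` are hypothetical. Real networks are nonlinear, so an attacker would instead have to train a surrogate on many query/response pairs, which is far more expensive; this only illustrates the principle.

```python
def make_black_box():
    """Hypothetical deployed model: callers can query it but not read its weights."""
    hidden_w = [2.0, -3.0, 0.5]
    hidden_b = 1.0

    def predict(x):
        return sum(w * xi for w, xi in zip(hidden_w, x)) + hidden_b

    return predict


def extract_linear(predict, dim):
    """Recover a linear model's parameters using dim + 1 queries."""
    zero = [0.0] * dim
    b = predict(zero)  # querying the zero vector reveals the bias
    w = []
    for i in range(dim):
        e = zero.copy()
        e[i] = 1.0  # i-th unit vector
        w.append(predict(e) - b)  # output minus bias isolates weight i
    return w, b


black_box = make_black_box()
stolen_w, stolen_b = extract_linear(black_box, 3)
```

Here four queries are enough to reproduce the model exactly, without ever touching the "encrypted" weights.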

Yes, that's the narrowly utilitarian perspective (on the current margin). My point was that if you mix in even a little bit of common sense moral reasoning and/or moral uncertainty, causing x harm and preventing x harm is obviously more wrong than staying uninvolved. (To make this very obvious, imagine if someone beat their spouse but then donated to an anti-domestic abuse charity to offset this.) I guess I should have made it clearer that I wasn't objecting to the utilitarian logic of it. But even from a purely utilitarian perspective, this matters because it can make a real difference to the optics of the behavior.

Epistemic status: Some kind of fuzzy but important-feeling arguments

If one steps away from a very narrowly utilitarian perspective, I think the two are importantly disanalogous in a few ways, such that paying more attention to individual consumption of (factory farmed) animal products is justified.

The two are disanalogous from an offsetting perspective: Eating (factory farmed) animal products relatively directly results in an increase in animal suffering, and there is nothing that you can do to "undo" that suffering, even if you can "offset" it by donating to animal advocacy orgs. By contrast, if you cause some emissions and then pay for that amount of CO2 to be directly captured from the atmosphere, you've not harmed a single being. (Sure, there might be problems with real-world offsetting, but it's at least possible in principle.)

I also think the two cases are disanalogous in terms of moral seriousness (h/t MHarris for bringing up the term in another comment.)

Related to the offsetting point, while factory farming is in and of itself morally wrong, there is nothing intrinsically wrong about emitting CO2. The harmfulness of those emissions is only a contingent fact that depends on others' emissions, the realities of the climate, and whether you later act to offset your emissions. Doing something that is much less obviously and necessarily wrong doesn't indicate moral unseriousness as much as doing something that is much more obviously and necessarily wrong.

Consuming factory farmed animal products also indicates moral unseriousness much more strongly because it is so extremely cheap to reduce animal suffering by making slightly different choices. Often the only cost is that you have to make a change to your habits, and the food might subjectively taste a tiny bit worse. Refusing to make that tiny sacrifice seems very difficult to justify.

By contrast, reducing emissions often means completely forgoing goods and services like flights, or paying significantly more for a more climate friendly version of something. Not wanting to suffer those inconveniences, especially when they negatively affect one's ability to do good in other ways, is much less obviously a sign of moral unseriousness.

As a bonus bit of meta feedback: While skimming, it was a bit hard for me to find the key cruxes / claims you were making, i.e. the post could have been structured a bit more clearly. Putting a good set of linked key takeaways at the top could have solved much of this problem (and still could!).

Interesting post, thanks for writing it!

I'm not very familiar with the inner workings of think tanks, but I think you may be understating one aspect of the bad research consideration: If the incentives are sufficiently screwed up such that these organizations mostly aren't trying to produce good, objective research, then they're probably not doing a good job of teaching their staff to do that, nor selecting for staff that want to do that or are good at that. So you might not be able to get good research out of these institutions by just locally fixing the incentives.

But this depends a lot on how bad the situation is, and how important researcher judgment is for the project. It seems likely that folks at these institutions genuinely know a lot of facts about their topics of expertise, and for some projects that would be more important than e.g. overall judgment about what is important or likely, which seems more heavily affected by bad incentives. But at least the first three of your examples seem like the kind of projects where overall judgment is really important.

Maybe having these people in an advising role, or on a team with EAs or good forecasters, might also help offset this?

Would be curious to hear thoughts on this from more people who've worked at these places. How bad are these problems in practice?

"Effective altruism" sounds more like a social movement and less like a research/policy project. The community has changed a lot over the past decade, from "a few nerds discussing philosophy on the internet" with a focus on individual action to larger and respected institutions focusing on large-scale policy change, but the name still feels reminiscent of the former.

It's not just that it has developed in that direction; it has developed in many directions. Could the solution then be to use different brands in different contexts? "Global priorities community" might work better than "Effective Altruism community" when doing research and policy advocacy, but as an organizer of a university group, I feel like "Effective Altruism" is quite good when trying to help (particularly smart and ambitious) individuals do good effectively. For example, I don't think a "Global priorities fellowship" sounds like something that is supposed to be directly useful for making more altruistic life choices.

Outreach efforts focused on donations and aimed at a wider audience could use yet another brand. In practice it seems like Giving What We Can and One for the World already play this role.