All of Onni Aarne's Comments + Replies

This was a great overview, thanks!

I was left a bit confused as to what the goal and threat/trust model is here. I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine-tuning? (I guess this doesn't do anything to prevent misuse that can be achieved with the unmodified weights? You might want to make this clearer.)

As far as I can tell, the key problem with all of the m... (read more)

1
Madhav Malhotra
1y
Thank you for your thoughtful questions!

RE: "I guess the goal is to be able to run models on devices controlled by untrusted users, without allowing the user direct access to the weights?"

You're correct that these techniques are useful for preventing models from being used in unintended ways when they are running on untrusted devices! However, I think of the goal a bit more broadly: the goal is to add another layer of defence behind a cybersecure API (or another trusted execution environment) to prevent a model from being stolen and used in unintended ways. These methods can be applied when model parameters are distributed to different devices (ex: a self-driving car that downloads model parameters for low-latency inference). But they can also be applied when a model is deployed behind an API hosted on a trusted server (ex: to reduce the damage caused by a breach).

RE: "without allowing the user direct access to the weights? Because if the user had access to the weights, they could take them as a starting point for fine-tuning?"

The four papers I presented don't focus on allowing authorised parties to use AI models without accessing their weights; that is what (Shevlane, 2022) recommends, via implementing secure APIs instead of directly distributing model parameters whenever possible. Instead, the papers I presented focus on preventing unauthorised parties from using AI models that they illegitimately acquired. The content about fine-tuning referred to tests of whether unauthorised parties could fine-tune stolen models back to their original performance if they also stole some of the original training data.

RE: "As far as I can tell, the key problem with all of the methods you cover is that, at some point you have have to have the decrypted weights in the memory of an untrusted device." and "The DeepLock paper gestures at the possibility of putting the keys in a TPM. I don't understand their schedu
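To make the setup under discussion concrete, here is a minimal, hypothetical sketch of the general pattern being described: weights are stored and distributed in encrypted form and decrypted only when they are loaded for inference. This is not the scheme from DeepLock or any of the other papers; the function names, the use of Fernet symmetric encryption, and the stubbed-out key retrieval (standing in for something like a TPM-sealed key) are all illustrative assumptions.

```python
# Hypothetical illustration of "weights encrypted at rest, decrypted only at load time".
# Not the scheme from DeepLock or the other papers; key retrieval is stubbed out where a
# real deployment might use a key sealed in a TPM or other secure element.
import io

import numpy as np
from cryptography.fernet import Fernet


def fetch_key_from_secure_element() -> bytes:
    """Stand-in for retrieving a device-bound symmetric key."""
    return Fernet.generate_key()


def encrypt_weights(weights: np.ndarray, key: bytes) -> bytes:
    """Serialise a parameter tensor and encrypt it for distribution."""
    buf = io.BytesIO()
    np.save(buf, weights)
    return Fernet(key).encrypt(buf.getvalue())


def load_weights_for_inference(blob: bytes, key: bytes) -> np.ndarray:
    """Decrypt just before use; the plaintext copy now lives in this device's memory."""
    plaintext = Fernet(key).decrypt(blob)
    return np.load(io.BytesIO(plaintext))


if __name__ == "__main__":
    key = fetch_key_from_secure_element()
    blob = encrypt_weights(np.random.rand(4, 4).astype(np.float32), key)
    weights = load_weights_for_inference(blob, key)
    print(weights.shape)  # (4, 4) -- decrypted weights are back in plain memory
```

Note that, as the parent comment points out, the decrypted parameters still end up in the memory of whatever device runs load_weights_for_inference, which is exactly the limitation being discussed here.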

Yes, that's the narrowly utilitarian perspective (on the current margin). My point was that if you mix in even a little bit of common-sense moral reasoning and/or moral uncertainty, causing x harm and then preventing x harm is obviously more wrong than staying uninvolved. (To make this very obvious, imagine if someone beat their spouse but then donated to an anti-domestic-abuse charity to offset it.) I guess I should have made it clearer that I wasn't objecting to the utilitarian logic of it. But even from a purely utilitarian perspective, this matters because it can make a real difference to the optics of the behavior.

Epistemic status: Some kind of fuzzy but important-feeling arguments

If one steps away from a very narrowly utilitarian perspective, I think the two are importantly disanalogous in a few ways, such that paying more attention to individual consumption of (factory-farmed) animal products is justified.

The two are disanalogous from an offsetting perspective: eating (factory-farmed) animal products relatively directly results in an increase in animal suffering, and there is nothing you can do to "undo" that suffering, even if you can "offset" it by donating... (read more)

2
Robi Rahman
2y
This is only reasonable if you believe that causing x units of suffering and then preventing x units of suffering is worse than causing 0 suffering and allowing the other x units of suffering to continue. Actually, it's probably wrong even with that premise. Suppose Alice spends a day doing vegan advocacy, so that ten people who would have each eaten one hamburger don't eat them, but then she goes home and secretly eats ten hamburgers while no one is watching. Meanwhile, Brian leaves his air conditioner running while he's away from home, emitting 1 ton of CO2, then realizes his mistake, feels guilty, and buys 1 ton of carbon offsets. In either case, there's no more net harm than if both of them had done none of these actions, but according to your argument, Alice's behavior is worse than Brian's? Personally I consider these harms fungible and therefore Alice was net zero even if the hamburgers she ate came from a different cow than the ones the other people would've eaten.

Interesting post, thanks for writing it!

I'm not very familiar with the inner workings of think tanks, but I think you may be understating one aspect of the bad research consideration: if the incentives are sufficiently screwed up that these organizations mostly aren't trying to produce good, objective research, then they're probably not doing a good job of teaching their staff to do that, nor selecting for staff who want to do that or are good at it. So you might not be able to get good research out of these institutions by just locally fixing the ... (read more)

6
Davidmanheim
2y
"How bad are these problems in practice?" At good think tanks, not very.

"Effective altruism" sounds more like a social movement and less like a research/policy project. The community has changed a lot over the past decade, from "a few nerds discussing philosophy on the internet" with a focus on individual action to larger and respected institutions focusing on large-scale policy change, but the name still feels reminiscent of the former.

It's not just that it has developed in that direction, it has developed in many directions. Could the solution then be to use different brands in different contexts? "Global priorities comm... (read more)

1
Jonas V
3y
I think it might actually be pretty good if EA groups called themselves Global Priorities groups, as this shifts the implicit focus from questions like "how do we best collect donations for charity?" to questions like "how can I contribute to [whichever cause you care about] in a systematic way over the course of a lifetime?", and I think the latter question is >10x more impactful to think about. (I generally agree with having different brands for different groups, and I think it's great that e.g. Giving What We Can has such an altruism-oriented name. I'm unconvinced that we should have multiple labels for the community itself.)