Ben Garfinkel: The future of surveillance

EA Global

This is a linkpost for https://www.youtube.com/watch?v=w4g5mCy5yr8&list=PLwp9xeoX5p8P3cDQwlyN7qsFhC9Ms4L5W&index=10

Too much surveillance can lead to privacy violations and creeping authoritarianism, but too little can lead to catastrophe. In general, debates about surveillance involve taking a position on the tradeoff between privacy and security. But what if we could have it both ways? In this talk from Effective Altruism Global 2018: San Francisco, Ben Garfinkel explores ways we may be able to increase the benefits of surveillance in the future, without also increasing the drawbacks.

A transcript of Ben's talk is below, which we have lightly edited for clarity. You can also watch it on YouTube and read it on effectivealtruism.org.

The Talk

Introduction

I am going to be making the case that surveillance is an area that effective altruists don't tend to think that much about, but it's potentially an area that we should be thinking a lot more about, especially if we care about the long-run future. The way I'm going to make this case is that first, I'm going to describe two ways in which the future of surveillance can be quite bad. Then I'm going to describe a more positive future, and argue that this isn't something that appears very frequently in discussions of surveillance, but something that plausibly more people in effective altruism ought to be thinking about.

So, first, here are a couple of outcomes we don't want for the future of surveillance. On the left, we have a depiction of what seems to be perhaps a bit of a violation of privacy or not as much accountability as you'd want for your system of surveillance. On the right we seem to have, perhaps some sort of oversight. It seems like a significant security issue is unfolding that perhaps we wanted a little bit more intense surveillance to prevent.

Two failure modes for surveillance

To describe the first scenario in less of a caricature, the concern here is that currently, governments are collecting a very large amount of data about individuals all over the world. This seems to be increasing over time. Most of what we do online ends up collected, and people are walking around with things in their pockets which have microphones and cameras and GPS locators. Surveillance cameras are becoming more prevalent, too. It just seems like in general, we should expect the amount of data collected on individuals to keep going up over time. Probably more importantly, the ability to use this data is going up as well. Partly this is a matter of better analysis, better data mining to identify individuals from mass collected data, things of that nature.

Partly, also, it's a matter of being able to more efficiently use the information which is gleaned from data mining. Just one quick example, which is fairly benign but somewhat suggestive: in recent years it's become somewhat common, in some provinces in China, to use facial recognition cameras to do things like automatically recognize people who are jaywalking and automatically fine them. A slightly longer-term thing is the idea of a social credit score. This is the idea of using large amounts of data collected about people, including, for example, their social media postings or the crimes they commit, and using this to assign people a score which will affect their employment prospects, their ability to travel or get into certain schools. And although none of this stuff is yet very significant, there's some suggestion that in the long-run future, or if you let this go for just a few more decades, we may see much stronger versions of social incentive shaping and much more invasive forms of surveillance. In the long run you might be more concerned about countries being more authoritarian or just political institutions we care about working less well.

The other category of risk that you might be concerned about is the ineffectual surveillance scenario. The argument for this concern is that, in the future, it may be the case that methods of causing large amounts of harm become a lot cheaper or easier to use. Right now, if you want to hurt a lot of people, it's not that easy to do that; it's difficult to hurt more than a few hundred people. But this is, to some extent, probably a matter of what technology is available. Some people have suggested that, for instance, synthetic biology, given perhaps a few decades, may make it easy for relatively small groups of people to design pathogens that can harm very large numbers.

Other technologies which are sometimes discussed with this sort of narrative are cheap drone swarms, in the longer-term future nano-weapons or especially disruptive cyber weapons. It's not necessarily clear that any of these individual technologies is extremely likely to have this property of making it very cheap to cause large amounts of harm. But we can draw an analogy to suggest what the significance of these technologies would be. So suppose that it turned out to be the case that nuclear weapons were much cheaper to make than they in fact are, say that rather than requiring massive state programs and years and years of work, that anyone could fairly easily construct nuclear weapons from household materials. It seems like in a world of that sort, the odds that they wouldn't be used would be very, very low, and you'd likely need some very pervasive form of surveillance to actually catch people who were planning to cause this large amount of harm. We don't know that any future technology will have these properties, but it seems not entirely impossible that one might. And if that's the case then we'll want, probably, much more effective forms of surveillance.

The trade-off narrative

Something which is typically a part of discussions of surveillance is a trade off narrative. So on the one hand, there's this idea that the more you protect people's privacy, the less you allow governments to actually protect people's security. And on the other hand, there's this idea that, the more you make the government accountable, and sort of let people know what they're up to, the less effectively governments will be able to operate. To explain or justify the privacy/security trade off, let's take the case of someone who's carrying a bag that may or may not have a bomb in it, and consider a police officer who'd like to know if it has a bomb, who doesn't have any tools available to them.

It seems like the officer has two options. First of all, they can open up the bag, look at what's inside, and see if there's a bomb. But in the process they'll figure out everything else that's in the bag, and some of this might be quite personally revealing. On the other hand, they can choose not to open the bag, and therefore not violate the person's privacy, but in choosing not to open the bag, they also don't learn whether or not there's a bomb inside.

For the accountability/security trade off, take the case of a protocol which is used to select people for search or special scrutiny. We may want to know that the protocol is actually being followed, that an individual isn't deviating from it. We may also want to know what exactly the details of the protocol are. Is it something which is discriminatory? Is it something which is fair? Does it have a sufficiently high accuracy rate? But a case which is often made by governments for keeping their protocols secret is that, if you make the details of a protocol public, then people can figure out how to get around it and it becomes much less effective.

And so this trade off narrative seems to suggest that steering away from one risk means steering towards the other. Even if you have the mindset that only one of these two risks is actually credible or of significant importance other than the fact that other people care about the other risks, it means that there will be political constraints on pursuing solutions to one risk or the other. As a sort of an extreme caricature, let's say that you're someone who doesn't care about privacy at all, you think that the risk from authoritarianism isn't in any way important, and you think would be really great if the government put cameras in every single person's home and watched them all the time. The fact that other people are definitely not cool with that means that your proposed solution would be a nonstarter. So in general, it seems like the more severe the trade off is between these values, the more concerned we should be about either risk or both risks together.

This all seems to suggest that a useful thing to do would be to look for opportunities to reduce these two trade offs. This means looking for ways to make surveillance more accountable and privacy preserving. While this sounds a bit idealistic, we can get some intuition that it is possible by looking at different forms of surveillance which are applied today, and that definitely vary quite significantly in how much they protect people's privacy. So if we return to the case of a bag that may or may not have a bomb in it, suppose that instead of just opening the bag, a police officer has access to a bomb sniffing dog. In this case, they can have the dog come up to the bag and sniff it. If the dog barks, the officer opens the bag and searches it. If the dog doesn't bark, the officer doesn't.

In the idealized case with a dog that has a perfect accuracy rate, they only learn exactly what's relevant for security: Does a person have a bomb? But they don't violate people's privacy in any other way. And the more accurate it is, the less it violates people's privacy. At the same time, this is also a fairly accountable form of surveillance. If you have your bag on you, you can tell that a dog was used rather than just a person rifling through it. You can also tell whether the dog barked. It's difficult to lie that a dog has barked when it hasn't, because you can hear it.

So, that's a sort of specific case. A more abstract case for optimism is that in the future, it's likely that surveillance and law enforcement will become more heavily automated. While this has a number of scary components to it, there are also some reasons to think that this trend may actually make it easier to ensure privacy and accountability. So here are some basic advantages of automation: first of all, if you automate an analysis task that would ordinarily be performed by a human, then you can use software as a screen between data and the humans who see it. So the analogy is to a sniffing dog. Again, if you have some piece of software which looks at data and makes some initial judgment of whether to search further, then potentially a human doesn't need to look at data that they would otherwise look at. You can also potentially redact sensitive information automatically so that no human ever needs to see it. One concrete example would be automatic blurring of faces that appear in police body camera footage.

In certain regards, algorithms are also more predictable and less opaque than humans. AI is often a black box, but it's less of a black box than the human brain is. You can't really look at a human police officer's brain to see what's going on there. But you can often look at the source code for software. It's also often easier to associate things that are done with software with reliable audit logs, as opposed to, let's say, trusting human analysts to record what they're up to. Software is also less likely to engage in certain abuses that a human might. So one slightly disturbing example of this is, there's this concept, at least in the past, hopefully not in the present, within the NSA of LoveInt, short for love intelligence. The idea is looking at information on a significant other or an ex, and this was apparently common enough that they had jargon for it within the NSA. It seems like something weird has happened if software that you've designed is doing that.
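To make the audit log point a little more concrete, here is a minimal sketch of one common way to make a software log tamper-evident: each entry stores a hash of the entry before it, so quietly editing an old record breaks the chain. This is an illustration only, not something described in the talk; the function names and the record fields (`actor`, `action`, `target`) are hypothetical.

```python
import hashlib
import json

# Placeholder "previous hash" for the very first entry (an assumption of this sketch).
GENESIS = "0" * 64


def _digest(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a record, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify_log(log):
    """Return True only if no entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, {"actor": "screening-software", "action": "query", "target": "record-17"})
append_entry(log, {"actor": "analyst-3", "action": "view", "target": "record-17"})
print(verify_log(log))

# Rewriting history after the fact is detectable.
log[0]["record"]["target"] = "record-99"
print(verify_log(log))
```

A chain like this only shows that the log was not edited after the fact; it says nothing about whether the software logged everything it did in the first place.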

At the same time, if you're using software in place of humans, it also potentially becomes easier and more efficient to audit a single piece of software, as opposed to auditing lots of different humans who might be replicating this behavior. So if you're using a single piece of software in lots of different cases, then potentially you just look at the single piece as opposed to lots of different human analysts and officers and officials, who might be deviating from protocols. It's also easier to associate a piece of software which is applied in lots of different cases with summary statistics, for example, an accuracy rate, compared to using statistics for humans. I think this is actually a large issue with surveillance in law enforcement at the moment: it seems like intuitively, if you're establishing probable cause or reasonable suspicion, there's sort of a probability threshold for that. Exactly how accurate is, for example, an officer's judgment that someone meets these thresholds? What portion of the time are they right?

For an individual officer, it isn't very feasible to collect these statistics, but for a facial recognition system, for example, you actually have fairly good data on exactly how accurate it is. You can actually have a fairly informed discussion about what sort of false positive rate is too high. That's a bit hard to have for humans.
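As a rough sketch of what those summary statistics look like, the snippet below computes a false positive rate, along with precision and recall, from counts of how an automated screen performed. The function name and every number in the example are made up for illustration; none of them come from the talk or from a real system.

```python
def screening_stats(true_pos, false_pos, true_neg, false_neg):
    """Summary statistics for a screen that flags people for further search."""
    flagged = true_pos + false_pos          # everyone the system flagged
    non_targets = false_pos + true_neg      # everyone who should not have been flagged
    targets = true_pos + false_neg          # everyone who should have been flagged
    return {
        # Share of non-targets who were wrongly flagged.
        "false_positive_rate": false_pos / non_targets if non_targets else 0.0,
        # Share of flags that pointed at a real target.
        "precision": true_pos / flagged if flagged else 0.0,
        # Share of real targets that the system caught.
        "recall": true_pos / targets if targets else 0.0,
    }


# Hypothetical numbers: 100,000 people scanned, 10 of them real targets.
stats = screening_stats(true_pos=9, false_pos=1000, true_neg=98990, false_neg=1)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

With these invented counts, a false positive rate of about 1% still means that fewer than 1 in 100 flags points at a real target, because real targets are so rare. That is the kind of number a public discussion about acceptable error rates can be anchored on.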

A couple of less obvious advantages are that increasing automation can actually decrease the need for data collection, and that increasing automation can decrease the disruptiveness of engaging in auditing. For surveillance without data collection, the basic idea is that certain cryptographic technologies make it possible to analyze data or extract certain pieces of information from it without collecting the data in unencrypted form.

Probably the most notable technology for this is called secure multiparty computation, which is extremely general. And just in the past decade or so, it became much more practical to use. A couple of examples of how this technology can be used: so the first is the idea of a set intersection search. This comes up fairly commonly in law enforcement contexts, where you want to identify someone that's suspicious on the basis of the fact that they show up in a few different databases.

A concrete example is, say someone robs a few different banks. Police know it's the same person, but they don't know who it is. They might want to search the cell records for the cell towers that were near all three of the banks to see if anyone made calls near all three of them. The way you would traditionally do this, in a way that actually historically has been done, is to collect all of the records from these three cell towers, get tens of thousands of people's records, and then comb through them and see if any name pops up three times.

But it's actually possible to do this without mass collecting records at all. There's a paper from 2014 that shows how to conduct a search of this sort, where you get out a list of names that appear in all three of these databases, but you don't get any other information besides just those names. So rather than collecting tens of thousands of people's information, you collect maybe one or two. Another example is fraud detection. So for the case of value-added tax fraud detection, one way you sometimes do this is by finding discrepancies between different companies' private financial records. One company, let's say, reports a purchase that doesn't match another company's report of a sale. There's a paper from 2015 that describes a protocol that I believe has actually now been used by the Estonian government to find cases of tax fraud of this sort, without collecting companies' private financial records. They get an output that says, here are the discrepancies, but they don't actually get any unencrypted records, so they can't learn anything other than who is showing discrepancies.
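To give a feel for how a set intersection search can work without anyone handing over plaintext records, here is a toy sketch. It is not the protocol from the 2014 paper. It swaps in a simpler, well-known idea, commutative blinding in the style of Diffie-Hellman private set intersection: every key holder raises each hashed name to a secret exponent, and because the order of exponentiation doesn't matter, a name that appears in all databases ends up as the same blinded value in each. Everything here is an assumption made for illustration: the function names, the single-process simulation of what would really be separate parties, and the modulus, which is far too small and poorly structured for real security.

```python
import hashlib
import math
import secrets

# Toy modulus (the Mersenne prime 2**127 - 1). NOT a secure parameter choice.
P = 2**127 - 1


def hash_to_group(name):
    """Map a name to an integer in [1, P - 1]."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def random_key():
    """Secret exponent coprime to P - 1, so blinding never merges distinct values."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def blind(values, key):
    """One key holder's step: raise every value to its secret exponent."""
    return [pow(value, key, P) for value in values]


def names_in_all(databases):
    """Simulate the parties in one process; return names present in every database."""
    keys = [random_key() for _ in databases]  # one secret key per database holder
    blinded_tables = []
    for names in databases:
        ordered = sorted(names)
        values = [hash_to_group(name) for name in ordered]
        for key in keys:  # in a real run, each holder would apply only its own key
            values = blind(values, key)
        # The database's own holder remembers which blinded value is which name.
        blinded_tables.append(dict(zip(values, ordered)))
    common = set(blinded_tables[0])
    for table in blinded_tables[1:]:
        common &= set(table)
    return {blinded_tables[0][value] for value in common}


tower_a = {"alice", "bob", "carol", "dave"}
tower_b = {"bob", "erin", "dave", "frank"}
tower_c = {"dave", "grace", "heidi"}
print(names_in_all([tower_a, tower_b, tower_c]))
```

In an actual deployment the blinded lists would be shuffled and passed between separate parties, so that nobody holding all the keys ever sees all the names; the sketch only shows why matching on blinded values recovers the overlap and nothing else.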

Another example, which I'm also not going to get into the technical details of, is this idea, traceable to a paper by Joshua Kroll in 2016, to use a cryptographic technology called zero knowledge proofs to produce accountable algorithms. The basic upshot is that he shows that it's possible in many cases to prove to the public that a protocol that's received their approval is still being applied. You can prove that people aren't straying from the protocol, and also that the protocol has certain desirable formal properties, for example, fairness properties, without actually making the protocol's details public. So that's desirable if the reason for not making the details of a protocol public is that you can say, oh, if we made them public, people could get around it, or potentially you're a law enforcement agency and you can't make the details public because some private company has developed it for you and it's quite profitable. So this is a way of making things more accountable while dodging those objections that making yourself more accountable would make you less effective.
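Zero knowledge proofs themselves are too involved to sketch briefly, but one simpler ingredient that often sits alongside them, a cryptographic commitment, can be shown in a few lines. The idea is that an agency publishes a commitment to its protocol up front; the commitment reveals nothing about the protocol, but the agency cannot later claim that a different protocol was the one in force. This is a plain hash commitment and not Kroll's construction: unlike a zero knowledge proof, it only helps once the protocol is opened to someone, for instance an auditor or a court. The function names and the example protocol text are hypothetical.

```python
import hashlib
import hmac
import secrets


def commit(protocol_text):
    """Return (commitment, nonce). Publish the commitment; keep the nonce secret."""
    nonce = secrets.token_bytes(32)  # random salt so the protocol can't be guessed from the hash
    digest = hashlib.sha256(nonce + protocol_text.encode("utf-8")).hexdigest()
    return digest, nonce


def opens_to(commitment, nonce, protocol_text):
    """Check that a revealed protocol and nonce match the published commitment."""
    digest = hashlib.sha256(nonce + protocol_text.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, commitment)


published, secret_nonce = commit("search a bag only if the dog barks")

# Later, an auditor is shown the protocol and the nonce.
print(opens_to(published, secret_nonce, "search a bag only if the dog barks"))
print(opens_to(published, secret_nonce, "search every bag"))
```

The step that zero knowledge proofs add on top of this, proving that individual decisions followed the committed protocol without opening it, is the part this sketch leaves out.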

Conclusions

So in my fairly non-expert view, I see a few opportunities for things effective altruists could add to the conversation around surveillance. One is that in my opinion, the conversation is typically too focused on managing trade offs. For example, debating exactly how much security you can get by trading away this amount of privacy, or sometimes denying that these trade offs exist to any extent, which seems implausible. Rather, I think it'd be more useful to look ahead to technical solutions for actually reducing these trade offs. Another concern I have is that a lot of the conversation concerns current programs, or programs which are just getting off the ground. I think it would also be productive to have a conversation about what forms of surveillance we might want to be moving toward, let's say over a 10 or 20 year period, as new risks emerge, and also as new technologies make different forms of surveillance feasible.

And then the last one is that I think discussions of surveillance are often reliant on assumptions about technology which aren't actually true or are going to become less true in the future. So a classic one is that analyzing data actually requires collecting data, which isn't actually technically true.

One last comment is that this presentation has been all about mass surveillance, but a lot of what I've said also applies to the case of agreement verification in an international relations context. So, the idea here is that frequently you want to verify compliance with an agreement, let's say an arms agreement, but often the process of monitoring the country or verifying compliance involves collecting lots of sensitive or private information.

So for example, arms agreement auditing might give away details of weapons systems or allow countries access to private actors' labs that might contain valuable intellectual property. And if you can find ways to make more privacy preserving forms of monitoring, in the same way you can make more privacy preserving forms of surveillance, this could potentially reduce the bottleneck on the ability to actually reach international agreements. This is also something that seems to intersect a lot with existing effective altruist concerns around global catastrophic risks and governance of emerging technologies.

To close, in the future, surveillance might threaten the institutions that we care about, or it might fail to protect us from new threats. It seems like there's some sort of trade off between addressing these risks, but these trade offs also don't seem to be immutable. There is some hope that in the future, technological progress can help reduce these for us. So therefore it seems like the project of pursuing accountable, privacy-preserving surveillance, while not something that many people are engaging in at the moment, might be something that more effective altruists want to look into or signal boost in conversations around surveillance.
