
Enlightenment Values in a Vulnerable World


The Vulnerable World Hypothesis: If technological development continues then a set of capabilities will at some point be attained that make the devastation of civilization extremely likely, unless civilization sufficiently exits the semianarchic default condition.

The Vulnerable World Hypothesis (VWH) is an influential 2019 paper by philosopher Nick Bostrom. It begins with a metaphor for technological progress: an urn full of balls representing technologies of varying degrees of danger and reward. A white ball is a technology which powerfully increases human welfare, while a black ball is one which “by default destroys the civilization that invents it.” Bostrom stipulates that “the term ‘civilizational devastation’ in the VWH refers to any destructive event that is at least as bad as the death of 15 percent of the world population or a reduction of global GDP by > 50 percent lasting for more than a decade.” Given the dire consequences of such a technology, Bostrom argues for enlarged state capacity, especially in terms of global reach and surveillance, to prevent the devastating technology from being invented.

The VWH is a wet blanket thrown over Enlightenment values, values which are popular with many EAs and among thinkers associated with progress studies such as David Deutsch, Steven Pinker, and Tyler Cowen. These Enlightenment values can be summarized as: political liberty, technological progress, and political liberty ⇒ technological progress. Even if technology has a highly positive expected effect on human welfare, this can easily be outweighed by a small chance of catastrophic or existential risk. The value of political liberty is often tied to its promotion of technological progress, so large risks from technological progress would confer large risks on political liberty. Bostrom highlights this connection but goes further. Not only is political liberty dangerous because it facilitates catastrophic technological risk, but strict political control is good (or at least better than you thought it was before) because it is necessary to prevent these risks. In response to a black ball technology Bostrom says that “It would be unacceptable if even a single state fails to put in place the machinery necessary for continuous surveillance and control of its citizens.” If Bostrom is right that even a small credence in the VWH requires continuously controlling and surveilling everyone on earth, then Enlightenment values should be rejected in the face of existential risk.

We do not know whether the VWH is true, and it is undecidable via statistical analysis until we draw a black ball or empty the urn. Thus, I consider the implications of the VWH for Enlightenment values both when it is false and when it is true. If it is false, then traditional arguments for Enlightenment values become even stronger. If it is true, I find that one can still reasonably believe that unconstrained technological progress and political liberty are important moral goods, as both ends and means, as long as some properties of the urn are satisfied. Even if these properties are not satisfied, I show that Bostrom’s proposed solution of empowering a global government likely increases existential risk overall.


Part 1: Outcomes Conditional on VWH Truth Value


VWH Is False

First, we can quickly consider what we should do if we knew that the VWH was false. In this case, the arguments made by progress studies in support of the set of Enlightenment values (political liberty, technological progress, and political liberty ⇒ technological progress) grow even stronger. Since we know that there are no black balls in the urn, we can be confident that the (highly positive) sample mean of the effect of technology on human welfare is close to its true effect, and there is no future risk of ruin that will greatly upset this mean. There may still be other objections, like effects on an inherently valuable environment, inequality, or doubts about the connection between political liberty and technological progress, but most people reading this are likely very positive about the effects of technological progress and political liberty except for their facilitation of catastrophic risks. If anthropogenic x-risk concerns are ameliorated, then Enlightenment values look better than ever as tools for advancing human welfare.


VWH Is True

If the VWH is true, then its implications depend on how we interpret the urn model. Bostrom suggests several different interpretations throughout the paper. There is the standard urn model where one ball is drawn at a time and the colors of the balls are independent and random. But this model is obviously an unrealistic description of technological progress. Independence means there is no room for technology to ameliorate future existential risks, since previous draws do not affect future ones, but this contradicts Bostrom’s escape hatch of exiting the semianarchic default condition. Complete randomness assumes that we have no knowledge about what the risk of a technology might be before it is actually invented.

An important clarification Bostrom makes is that “We can speak of vulnerabilities opening and closing. In the ‘easy nukes’ scenario, the period of vulnerability begins when the easy way of producing nuclear explosions is discovered. It ends when some level of technology is attained that makes it reasonably affordable to stop nuclear explosions from causing unacceptable damage.” This implies that the color of balls in the urn, i.e. the risk from technologies, is not constant or independent of the balls which come before it. If the technology which abates nuclear risks came before easy nukes, then that ball would have changed color and no vulnerability would have opened. Additionally, “the metaphor would also become more realistic if we imagine that there is not just one hand daintily exploring the urn: instead, picture a throng of scuffling prospectors reaching in their arms in hopes of gold and glory, and citations.”

Wide Progress: Technological Antidotes

For the next two subsections, we’ll model the color of pulls from the urn as random, but not independent. That is, some technologies can change the risks of others, but we don’t know what the risk of a technology will be before we invent it. Another basic assumption is that ‘technological maturity,’ i.e. the inevitable topping out of our exponential growth into an S-curve, is desirable and stable, but that the path there may be dangerous.


Another way to encode this assumption is: If we discovered all possible technologies at once (which in Bostrom’s wide definition of technology in the VWH paper includes ideas about coordination and insight), we would be in the safe region. It is only that certain orderings of tech progress are dangerous, not that some technologies are incompatible with civilization in all contexts.

Allowing some technologies to change the risks posed by others makes the model more realistic. Bostrom claims that his global surveillance solution to anthropogenic risks is a one-size-fits-all antidote, but in fact dangerous technologies admit a range of antidotes. For example, bio-terrorism may be solved with strict state surveillance over labs and inputs, but it would also be solved by sufficiently cheap and effective vaccines, improved PPE, or genetically engineered improvements to our immune system. Bostrom suggests avoiding collapse from ‘easy nukes’ by having the state requisition all batteries, but we could also use advanced materials to build explosion-resistant buildings or use easy nukes to power vehicles which allow us to live very spread out, lessening the impact of nuclear explosions. Even technologies which do not obviously disarm black balls can be antidotes by increasing our wealth enough to make safety investments affordable.

In general, we’d like there to be at least one injective function from black balls to white ones. That is, there exists some pairing of technologies such that every dangerous invention has an antidote, and there are at least as many antidotes as black balls. Given the immense power and general purpose of technology and a reasonable upper bound on the ratio of black to white balls at ~1 in 500 million, it seems almost certain that each black ball could find at least one antidote without any repeats. If you believe that technological maturity is stable, i.e. there is a safe region as in Bostrom’s graph above, then it must be that all risky technologies are disarmed by some future technology. If we have at least one injective pairing, then the implications of the VWH shift towards the unconstrained progress promoted by Enlightenment values.

The limiting case of this danger-antidote relationship is an urn with two balls: one black, one white, representing a choice between extinction and technological ascendancy. The white ball is an antidote to the black one. Drawing nothing means stagnation until the earth is destroyed by natural processes. Let’s normalize the human value per century in this scenario to 1. Then this world has a value equal to the number of centuries humanity manages to survive on earth without any extra technology. The world where both balls are drawn is either empty or full of an astronomical number of human lives. Even in the worst case scenario where technology contributes nothing to human welfare except avoiding extinction (i.e. the value of the technologically mature world is also 1 per century), the expected value from drawing both balls (½ + ½ + …, one term for each century of survival) still exceeds the finite stagnation world after a finite number of centuries.
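This comparison can be written out explicitly. A minimal sketch, assuming a 50/50 draw order and two hypothetical horizons: S centuries of survival under stagnation, and T centuries of survival for a technologically mature civilization:

```latex
% Stagnation: survive S centuries at value 1 per century.
V_{\text{stagnate}} = S

% Draw both balls, one at a time: with probability 1/2 the black ball
% comes first (value 0); with probability 1/2 the antidote comes first
% and civilization persists for T centuries at value >= 1 per century.
V_{\text{draw}} \;\ge\; \tfrac{1}{2}\cdot 0 \;+\; \tfrac{1}{2}\cdot T \;=\; \tfrac{T}{2}

% Since T is astronomically larger than S, the drawing world wins
% as soon as T > 2S.
```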

In a world where every risk has at least one antidote, maximizing the number of “scuffling prospectors” pulling balls at once is desirable. This is intuitive in the two-ball limiting case: if you can pull both balls at once, the black ball can never arrive without its antidote. In general, pulling multiple balls at once decreases risk. To avoid black balls while pulling just one ball at a time, you need the antidote to show up before the black ball every time. But when you pull two balls at once, you have all the same chances for the antidote to show up before the black ball, plus the probability that the antidote and the black ball are pulled at the same time. This additional probability is increasing in the number of balls per pull as long as each black ball has at least one antidote.
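This argument is easy to check by simulation. Below is a minimal sketch (the urn size, ball indices, and trial count are all illustrative assumptions): civilization survives a given black ball if its antidote is drawn in the same round or an earlier one.

```python
import random

def survives(n_balls, balls_per_pull, black=0, antidote=1):
    """One trial: shuffle the urn and pull `balls_per_pull` balls per round.
    Civilization survives if the antidote arrives in the same round as
    the black ball, or an earlier one."""
    order = list(range(n_balls))
    random.shuffle(order)
    round_of = {ball: i // balls_per_pull for i, ball in enumerate(order)}
    return round_of[antidote] <= round_of[black]

def survival_rate(n_balls, balls_per_pull, trials=100_000):
    return sum(survives(n_balls, balls_per_pull) for _ in range(trials)) / trials

random.seed(0)
# One ball per pull: the antidote must strictly precede the black ball,
# so survival probability is 1/2. Wider pulls add the chance that both
# land in the same round, so survival rises with pull width.
print(survival_rate(10, 1))   # ≈ 0.50
print(survival_rate(10, 2))   # ≈ 0.56
print(survival_rate(10, 10))  # = 1.0 (everything arrives at once)
```

Pulling the whole urn in one round makes survival certain, which is the simulated version of the two-ball intuition above.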


Fast Progress: Windows of Vulnerability

Bostrom models ‘windows of vulnerability’ from black ball technologies as opening when the ball is first pulled, and closing when some future ball makes us resistant to the black ball either by directly countering it with protective technology or by increasing wealth enough to make palliative safety investment affordable. We saw above that widening progress by increasing the number of people pulling balls from the urn decreases the probability that these windows ever open. If pulling a black ball before an antidote means certain destruction then this is the best we can do. But if we have some window after discovering a black ball to still get a technological solution in time, then decreasing the time between pulls from the urn can decrease risk.

If we keep increasing the pace of tech progress, then these windows of vulnerability will keep shrinking. Accelerating development is a form of differential development: acceleration moves up the arrival of distant technologies by more than near ones. This decreases risk because any antidotes that come before or at the same time as black balls stay that way, and any antidotes that come after are pulled forward by more than the black balls which precede them, making it more likely that we survive long enough for their arrival.
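In absolute terms, a uniform speed-up divides every arrival date by the same factor, so the gap between a black ball and its later antidote shrinks proportionally. A minimal sketch with purely hypothetical dates:

```python
# A uniform k-fold acceleration divides every arrival date by k, so the
# gap between a black ball and its later antidote shrinks by the same
# factor. Dates below are purely illustrative.
def window(black_year, antidote_year, speedup=1.0):
    """Years of vulnerability between a black ball and its antidote."""
    return max(0.0, (antidote_year - black_year) / speedup)

print(window(10, 30))             # 20-year window at the current pace
print(window(10, 30, speedup=2))  # 10-year window if progress doubles
print(window(30, 10))             # antidote arrives first: no window
```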

If we want to have a chance at the long-lasting and space-faring future civilization which makes existential risk such an important consideration, we’ll need to greatly increase our technological ability. Doing this slowly, one ball at a time, just means less chance at pulling antidote technologies in time to disable black ball risks. For example, terraforming technology which allows small groups of humans to make changes to a planet’s atmosphere and geography may increase existential risk until space-settling technology puts people on many planets. If terraforming technology typically precedes space-settling, then accelerating the pace of progress reduces risk. Enlightenment values enable wide and fast progress. Wide and fast progress can decrease risk from random draws from an urn with at least as many antidotes as risks. So Enlightenment values can decrease risk.


Differential Development

So far we’ve been assuming that the color of the ball we pull is completely random, but the best reason to slow technological progress is if decreasing the width or speed increases our ability to choose whether the next pull from the urn will be black or white. To accommodate differential technological development, we have to have some sense of what color a ball might be before we draw it. “We could stipulate, for example, that the balls have different textures and that there is a correlation between texture and color, so that we get clues about the color of a ball before we extract it.”  This seems plausible at least for the most proximate impacts of technologies. As Bostrom puts it “don’t work on laser isotope separation, don’t work on bioweapons, and don’t develop forms of geoengineering that would empower random individuals to unilaterally make drastic alterations to the Earth’s climate.”

But the impacts of a technology further in the future quickly become radically uncertain. If nuclear war or AGI kill billions then quantum mechanics or Von Neumann architecture will have turned out to be a black ball, but no one would have predicted that at the time. It’s not clear how this should affect our treatment of current explorations in math or physics either. Additionally, even technologies that we are confident are dangerous can have net positive effects on existential risk by mitigating other risks. For example, climate control has potential dangers but also advantages in ameliorating damage from climate change, supervolcanoes, and nuclear winter. Another well-known phenomenon is when research towards a technological goal produces unexpected and impactful spinoff discoveries. For example, research on cheaper ways to manufacture vaccines may also be used to make pathogens easier to produce. 

There is still room for differential development after a ball is pulled, however. We’d be better off under random draws if we could set aside black balls, or at least slow their roll until their antidotes were also discovered or developed. We could also try to speed up the development of the antidote technology. These strategies rely on the regulatory mechanism having consistently high accuracy and precision. If the mechanism is bad at picking black balls from white ones, then it will often end up slowing antidotes and speeding risks, washing out its overall effect. Even if the mechanism is an unbiased estimator of whether a technology will be disastrous, imprecise enforcement would similarly dilute the effect. For example, if nuclear or bio weapons regulation also slows down nuclear power or biosafety research, then we may be losing as much progress towards antidotes as we are gaining time until black balls. The idea here is like the wide progress model in reverse. Imagine differential regulation as choosing a group of balls to set aside rather than letting them develop. Imprecise regulation sets aside a large handful of balls rather than just one. Wider handfuls make it more likely that one or more antidote technologies are also set aside.
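The cost of imprecision can be quantified with a small counting argument. A minimal sketch (the urn size, antidote count, and handful widths are illustrative assumptions): the wider the handful set aside around one black ball, the higher the chance that at least one antidote is swept up with it.

```python
from math import comb

def p_antidote_caught(n_balls, n_antidotes, handful):
    """Probability that a handful set aside around one black ball also
    sweeps up at least one antidote, when the handful's other members
    are drawn uniformly from the rest of the urn."""
    rest = n_balls - 1      # the urn minus the targeted black ball
    extra = handful - 1     # bystander balls caught in the handful
    if extra <= 0:
        return 0.0
    # P(no antidote among the bystanders), by counting combinations
    p_none = comb(rest - n_antidotes, extra) / comb(rest, extra)
    return 1 - p_none

# Hypothetical urn: 100 balls, 5 of which are antidotes.
print(p_antidote_caught(100, 5, 1))    # precise ban: 0.0
print(p_antidote_caught(100, 5, 10))   # wide ban: ≈ 0.39
print(p_antidote_caught(100, 5, 20))   # wider ban: ≈ 0.66
```

Even a moderately wide ban has a substantial chance of delaying an antidote along with the risk it targets, which is the "wide progress model in reverse" described above.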

Further on this point, regulatory self-selection can cause enforcement aimed at black balls to have even greater negative effects on surrounding white ball technologies than on the black ball itself. For example, laser isotope enrichment, a way to more cheaply enrich uranium, can be used for both nuclear power and nuclear weapons. If countries like the US or even international organizations like the UN ban this method on proliferation concerns, the people most likely to follow these rules are those using the method for peaceful nuclear power, like GE or Hitachi. The most dangerous uses of the technology will be much less affected because they take place in recalcitrant nations like North Korea or Iran. This can create a sort of Simpson’s paradox where differential development of nuclear technology relative to less dangerous fields is internally composed of differential rules that promote the most dangerous uses of a technology relative to their constructive ones. Even if you can devise a mechanism which accurately determines and precisely enforces differential development, protecting it from regulatory capture is a serious challenge.

The above reasoning applies to decisions about differential development on a societal level enforced by a government. The point of Enlightenment values, however, is that researchers, entrepreneurs, and philanthropic organizations get to make decisions about what to pursue themselves. Specialization ensures that you can’t contribute a little bit to all fields equally, so everyone has to choose something to differentially develop even with this uncertainty. The ethical intuitionism of not working on things which seem obviously dangerous combined with considerations for comparative advantage is unobjectionable. The research and advocacy done by many EA organizations which pushes people to work on highly impactful good technologies rather than potentially dangerous ones is consistent with Enlightenment values even though it is not the random technological progress we modeled above. That model only represents progress at the most aggregate and abstract level. Concrete individual decisions about which frontiers to push forwards are not random. Trying to pick good frontiers from bad is almost certainly good. But one need not feel paralyzed by uncertainty since even random choices decrease risk in the aggregate. 

Differential development enforced by an error-prone and inflexible government is dangerous. Correctly predicting the impacts of R&D is difficult, and getting it wrong at the global level is much higher stakes than at the individual level. State-enforced development mistakes are likely to be locked in for long periods of time, and since the state is guiding technological progress, there aren't other organizations that can mitigate its mistakes by researching antidotes. Differential development on an individual level is unavoidable. Your views on the tractability of predicting the impact of your work determine the effort you should invest into picking the right field, but even random progress pushes humanity forward.


Part 2: Is Bostrom’s Plan Even An Antidote?

The above discussion of wide and fast progress maximizing the probability of discovering technological antidotes is moot if Bostrom’s plan is the one-size-fits-all antidote that he says it is. If his global surveillance state truly offers robust protection against any future black balls, then even if every black ball has several possible antidotes, it’s probably not worth rolling the dice on discovering them in time with unconstrained technological progress. It would be worth sacrificing Enlightenment values for sure-thing protection against future catastrophic risk. But Bostrom’s plan is not likely to provide robust protection against future black balls, and it would plausibly cause a net increase in existential risk.

Bostrom does not want to rely on luck to develop technological antidotes in time to prevent catastrophic damage from black balls. He reasons that since it is possible that a technology will exist such that even a few anonymous actors are enough to cause massive destruction, the state’s capacity for surveillance and policing has to be near absolute. And since inter-state competition is the source of many existential risks and catastrophic events, the surveillance state must be global in reach. His most detailed proposal, “The High Tech Panopticon,” consists of everyone on earth being fitted with a “freedom tag” that constantly records video and audio of everything you do. This data is automatically scanned for any criminal or suspicious activity by an AI. “Other extreme measures that could be attempted in the absence of a fully universal monitoring system might include adopting a policy of preemptive incarceration, say whenever some set of unreliable indicators suggest a greater than 1% probability that some individual will attempt a city-destroying act or worse.” Any indication of dangerous research or nefarious plans would dispatch an armed police unit to imprison or kill the perpetrator.

Bostrom acknowledges that this proposal is extreme, and he does not claim that his plan would be desirable all-things-considered. Rather, he says that his model “provides a pro tanto reason to support strengthening surveillance capabilities and preventive policing systems and for favoring a global governance regime that is capable of decisive action.” I will not object to Bostrom’s plan in an all-things-considered sense because the drawbacks of a global surveillance state to human welfare outside of existential risk considerations are already clear. Instead, I will show that Bostrom’s plan for global governance would not decrease existential risk as long as we have a reasonable model for the incentive structures within such a state.


Global totalitarianism is its own existential risk

The only anthropogenic events that have come close to fulfilling Bostrom’s definition of ‘civilizational devastation’ have been perpetrated by states. The conquests of the Mongol Empire may have killed more than 10% of the world’s population. The Thirty Years War killed nearly half of the Holy Roman Empire’s population. The Khmer Rouge executed nearly 20% of their own population. Communist China and the Soviet Union probably killed upwards of 100 million people combined.

None of these quite satisfy Bostrom’s 15% of global population threshold, but they are the closest we’ve come to it so far. Some of them arose out of interstate conflict which would, at least nominally, be avoided with a global state. But many of the deadliest events in human history have been states murdering people, quashing economic development, or causing famines, within their own borders. Establishing a global state with the police powers that Bostrom recommends would facilitate these catastrophic state-led massacres and incentivize state-enforced stagnation.

Dealing with a powerful global surveillance state is not unlike dealing with a powerful AI. Problems of alignment and instrumental convergence come immediately to the fore. 

Instrumental convergence

To fulfill Bostrom’s mandate of extremely effective preventative policing and global authority, a government must first maintain power. Bostrom bit the bullet in recommending that this surveillance state quash any potentially dangerous technological development, but to ensure that it will protect humanity from itself, the state must also seek out and destroy dissent. Given the draconian control that this state imposes on everyone, it seems likely that at least 15% of the population would strongly resent and resist this government. Executing dissenters could easily be a catastrophic risk in itself. Beyond directly killing or imprisoning anyone who might try to disobey the state, the government will likely find that scapegoating certain groups is a good way to justify its power, shift blame for stagnant or deteriorating economic conditions, and maintain stability. This is an established strategy not only of totalitarian states, but also of democracies, and human groups in general. Crushing dissent and providing enemies to rally around are principal occupations of any state which wishes to maintain the level of control that Bostrom recommends. These are violent and costly processes which represent a serious catastrophic risk to humanity. And unlike many of Bostrom’s other examples of possible catastrophic risks, states have demonstrated their capacity for murder, imprisonment, and oppression on massive scales multiple times throughout history.

In addition to the direct catastrophic risk of recurring genocides, a global surveillance state would over-enforce technological stagnation relative to its mandate of preventing technological x-risk, because it is easier to stay in control of a society in stasis than one that is rapidly growing. Feudal dynasties in Europe, for example, stayed stable for centuries because their powerful subordinates, large landholders, were bound by cultural, religious, and family norms. When a new group of merchants and industrialists began to gain economic power, they demanded political influence. Influence is zero-sum, so it had to come at the expense of the old guard. If it is to fulfill its mission of preventing anthropogenic risk long into the future, the global surveillance state cannot afford to risk usurpation. Any variance in the state’s control over technology is bad because it might mean that a future existential risk slips through and destroys billions of potential future lives. Therefore, it will use its mandate of regulating technology not only to shut down potentially dangerous projects, but also to prevent anything which might shift the balance of power within its social pyramid. Again, states have demonstrated their desire and ability to enforce technological stagnation or regress several times throughout history. These actions correspond to Bostrom’s ‘permanent stagnation’ class of existential risk. In this scenario, humanity may avoid total extinction at our own hands, but we remain at a low level of utility compared to a technologically mature world and are eventually snuffed out by natural phenomena.



The above risks arise from a global state which is loyally following its mandate of protecting humanity’s future from dangerous inventions. A state which is not so loyal to this mandate would still find these tools for staying in power instrumental, but would use them in pursuit of much less useful goals. Bostrom provides no mechanism for making sure that this global government stays aligned with the goal of reducing existential risk, and he conflates a government with the ability to enact risk-reducing policies with one that will actually enact them. But the ruling class of this global government could easily preside over a catastrophic risk to their citizens and still enrich themselves. Even with strong-minded leaders and robust institutions, a global government with this much power is a single point of failure for human civilization. Power within this state will be sought after by every enterprising group, whether they care about existential risk or not. All states today are to some extent captured by special interests which lead them to do net social harm for the good of some group. If the global state falls into the control of a group with less than global interests, the alignment of the state towards global catastrophic risks will not hold.

A state which is aligned with the interests of some specific religion, race, or an even smaller oligarchic group can preside over and perpetrate the killing of billions of people and still come out ahead with respect to its narrow interests. The history of government gives no evidence that alignment with decreasing global catastrophic risk is stable. By contrast, there is evidence that alignment with the interests of some powerful subset of constituents is essentially the default condition of government. 

If Bostrom is right that minimizing existential risk requires a stable and powerful global government, then politicide, propaganda, genocide, scapegoating, and stagnation are all instrumental in pursuing the strategy of minimizing anthropogenic risk. A global state with this goal is therefore itself a catastrophic risk. If it disarmed other, more dangerous risks, such a state could be an antidote, but whether it would do so isn’t obvious. In the next section we consider whether the panopticon government is likely to disarm many existential risks.


Global surveillance states have strong incentives to develop dangerous technologies

To guarantee authority over humanity’s dangerous technological development, the global surveillance state will try to keep their technology level as high as possible relative to their constituents. We saw above one part of their strategy: enforcing technological stagnation. This alone may not be sufficient, however. The state may benefit from using technology to increase its capacity for longevity and control. These incentives would lead a global state to develop and deploy dangerous technologies.

To conquer and retain authority over all existing nation-states, the global surveillance state will need exclusive access to powerful military technology. To carry out near-perfect surveillance and enforcement of technology standards around the world, they will need artificial intelligence. Bostrom describes both of these in his paper: “Encrypted video and audio is continuously uploaded from the device to the cloud and machine-interpreted in real time. AI algorithms classify the activities of the wearer, his hand movements, nearby objects, and other situational cues.” And “the global governance institution itself could retain an arsenal of nuclear weapons as a buffer against any breakout attempt.”

So the surveillance state at a minimum has good reason to develop and deploy the two most dangerous technologies of our time, nuclear weapons and artificial intelligence. The presumed lack of interstate conflicts might make nuclear weapons less dangerous, but some of the deadliest conflicts of all time (Thirty Years War, Sengoku Period Japan, American Civil War) were intrastate ones. If the global surveillance state has to use nuclear weapons to quell “breakout attempts” then it doesn’t really make a difference what the borders look like on a map. More obviously, an artificial intelligence algorithm which is constantly monitoring video, audio, and location data in real time from literally every human being on earth is such a massive AI x-risk that I don’t understand why Bostrom even mentions it in this paper, let alone recommends it as a strategy to reduce existential risk. One wonders if Bostrom in this context should be read in Straussian terms! AI safety researchers argue over the feasibility of ‘boxing’ AIs in virtual environments, or restricting them to act as oracles only, but they all agree that training an AI with access to 80+% of all human sense-data and connecting it with the infrastructure to call out armed soldiers to kill or imprison anyone perceived as dangerous would be a disaster.

Beyond these two examples, a global surveillance state would be searching the urn specifically for black balls. This state would have little use for technologies which would improve the lives of the median person, and it would actively suppress those which would change the most important and high status factors of production. What it wants are technologies which enhance its ability to maintain control over the globe: technologies which add to its destructive, and therefore deterrent, power. Bio-weapons, nuclear weapons, AI, killer drones, and geo-engineering all fit the bill.

A global state will always see maintaining power as essential. A nuclear arsenal and an AI powered panopticon are basic requirements for the global surveillance state that Bostrom imagines. It is likely that such a state will find it valuable to expand its technological lead over all other organizations by actively seeking out black ball technologies. So in addition to posing an existential risk in and of itself, a global surveillance state would increase the risk from black ball technologies by actively seeking destructive power and preventing anyone else from developing antidotes.


Even global states are bad at solving coordination problems

Coordination problems are the central challenge of human society. The price system is currently the most powerful global coordination mechanism we know of. It leads participants to make socially beneficial tradeoffs even when everyone acts in their self-interest and no one knows anything beyond their own preferences and immediate circumstances. However, the price system has well-documented inefficiencies when dealing with things that can’t be priced: access to public goods, and the effects of a transaction on bystanders, including future people. Thus, technological progress and differential development will be underproduced relative to the ideal, because much of the benefit from these pursuits accrues to future people, who have no way of incentivizing the present-day researchers and entrepreneurs who incur the costs. Nation-states can theoretically solve these externality problems within their borders, but they still face challenges from externalities with other nations, future peoples, and global public goods. Bostrom is therefore correct that some addition or change to our current system of global coordination is needed to optimally address global catastrophic risks. This does not mean that any change towards a global state is an improvement. 

In his paper Bostrom acknowledges the failure of existing states to solve even trivial coordination problems: “the problem confronting us here presents special challenges; yet states have frequently failed to solve easier collective action problems.” Few states have national carbon taxes or congestion pricing. In fact, most states spend billions on automobile and fossil fuel subsidies despite the obvious negative externalities. The same holds for even larger and more common agricultural subsidies, despite negative externalities from carbon emissions, aquifer depletion, and fertilizer runoff. States also commonly subsidize suburban and rural living with land use regulations, single-family zoning, transportation subsidies, and height restrictions, despite the positive economic externalities of city life and the negative environmental externalities of suburban and rural living. Government bureaucracies prefer to protect themselves from blame rather than do what produces the highest expected value for society. So not only are states failing to solve internal collective action problems, they are often actively making them worse! Some of the benefits from solving these problems may be captured by other nations, but that explains why countries might not work optimally hard on producing positive externalities or avoiding negative ones, not why they would actively subsidize the negative and restrict the positive. This behavior is explained by standard public choice critiques of (especially democratic) governments. All of these modern states have the capacity to improve on the voluntary allocation of resources, but they have incentives to do the opposite.

Since these coordination failures do not primarily come from a lack of state capacity or even externalities between states, they will not be solved simply by creating a global state with the same internal incentives that current states face. Despite this, Bostrom continues to assume that the power to take a socially beneficial action is sufficient to guarantee that the state will actually do it. “States have frequently failed to solve easier collective action problems … With effective global governance, however, the solution becomes trivial: simply prohibit all states from wielding the black-ball technology destructively.” If the institutional design of this global state looks similar to any modern state, this global state will be just as susceptible to concentrated-benefit-diffuse-cost attacks, rational ignorance and irrationality among voters, and internal bureaucracies optimizing socially inefficient sub-games. These will push the global state not only to ignore possible optimizations, but actively promote negative externalities when they benefit powerful stakeholders.

We should not commit the Nirvana fallacy of comparing an imperfect market solution to a perfect but unattainable government solution. Similarly, we should not reject a global state because it is imperfect, but instead compare it to realistic options. Even when compared to our highly imperfect form of decentralized authority and inter-state competition, however, a realistic world state does not look like a clear winner in terms of its likelihood of solving externality problems. Collective action problems tend to get bigger as the collective does. The world state can amortize costs over billions more people than any existing state, which means it can get away with costlier subsidies to more concentrated interest groups than any existing state. Internal bureaucracies in a world state will have to have many more layers and less oversight, allowing more perverse optimization and corruption to take place. Divisions between factions within the world state would be much larger than divisions within current nation-states. The costs of staying informed of a global state’s policies are likely higher than for smaller ones, and the chance that any one vote will change the outcome of a global election is certainly much lower, so voter ignorance and irrationality will abound. 

The capacity to solve coordination problems is not sufficient for actually solving them, as the track record of existing states shows. Coordination technologies will be important antidotes for many types of risks from externalities, but a global surveillance state is not one of these antidotes.



The steelman for global governance and preventative policing is probably something like: “on the margin, it would be good to increase government oversight of specifically dangerous technologies like nuclear weapons, AI, and bioweapons.” There are some specific objections to this policy change. One can reasonably doubt the ability of governments to predict which technologies are dangerous and which are beneficial, and to devise appropriately precise regulations. There are also problems when governance institutions are captured and so work towards anti-social interests, which may include enforcing their exclusive ownership of powerful technologies, or even sponsoring their development, so that they can dominate their rivals. Depending on the enforcement power and authority of a supranational government, there might also be undesirable selection effects: the states most likely to use powerful technologies for good are also the ones most likely to follow restrictions on technological development, while more conflictual countries (e.g., North Korea, Iran, Pakistan) refuse to follow the rules, resulting in inadvertent differential development in favor of violence. 

Some global governance, however, can come with big advantages. Supranational government can bring down trade and immigration barriers. Even without explicit treaties, increasing the economic and cultural interconnectedness of the world is likely a good way to avoid interstate conflict. Global government is neither necessary nor sufficient for these openness benefits, but it may help. International agreements such as the Paris Climate Accords can help with global externalities like climate change. A government organization which practices differential development by trying to accelerate certain beneficial technologies, rather than banning dangerous ones, could have a positive net impact even if its funding choices were random, because of the positive externalities from most technologies. 

A more moderate increase in the policing of risky technologies on the global scale could grab some low-hanging fruit. Nuclear weapons technology has been restrained by agreement and monitoring. International and non-governmental policing may be able to restrain the most outstanding tech risks without risking the dangerous overreach of a global state. This approach is much more likely to improve the world’s risk profile than the global panopticon, but it is very different from what Bostrom proposes. Although Bostrom would likely see this plan as an improvement on the status quo, he says that “while pursuing such limited objectives, one should bear in mind that the protection they would offer covers only special subsets of scenarios, and might be temporary. If one finds oneself in a position to influence the macroparameters of preventive policing capacity or global governance capacity, one should consider that fundamental changes in those domains may be the only way to achieve a general ability to stabilize our civilization against emerging technological vulnerabilities.”


Part 3: Synthesis and Conclusion

Bringing all of this together, what does it mean for how EA and progress studies should think about existential risk?

Humanity’s existential risk profile is dominated by risks coming from current and potential technologies. But existential risk is also reduced by technology. Technology reduces non-anthropogenic existential risks like asteroid strikes, supervolcanoes, and natural pandemics, but technologies can also be antidotes to anthropogenic risks, including risks from technology itself. Given natural existential risk, and the current levels of unsolved anthropogenic risks, stagnation clearly has risks of its own. The question is how to proceed with technological progress without creating unacceptable risks along the way.

A hypothetical filter which bans the invention, development, or use of net-harmful technologies until future antidotes tip the scales, and bans nothing else, would be ideal. Realistic filters, however, present challenges and risks of their own. Bostrom’s panopticon government does not look especially promising. An organization or agent that is empowered and motivated to pursue existential risk reduction on this level will find it necessary to sustain the filter’s authority over humanity for centuries to come. Totalitarian strategies of crushing dissent, genocide of scapegoats, enforced stagnation, and the development of world-destroying weapons will be useful for securing the power that is a necessary instrument for this filter. 

The power which comes with the ability to construct and enforce this filter will be the ultimate prize for anyone with interests narrower than the future of all humanity. Small groups could easily coordinate and use the filter mechanism to spread costly externalities or risks among the rest of the world while greatly enriching themselves. Even worse, some groups will seek to use the filter mechanism to outright destroy others. There is currently no known institutional design with this kind of power that has demonstrated even temporary immunity to this misalignment. All current states sacrifice the interests of present groups, and especially future people, to benefit concentrated interests within their borders. Whether or not the filter is aligned with the interests of humanity as a whole, it will hold onto its power ruthlessly, so it represents a significant existential risk in itself. The history of government includes several genocides that approach Bostrom’s definition of civilizational devastation, so bigger and more powerful versions of government seem like an unpromising strategy for reducing net risk. Given the difficulty of correctly setting up this filter and the dangers of getting it wrong, we ought to look for safer and easier ways to decrease our existential risk. 

Upholding Enlightenment values is a good place to start the search for this optimal risk-reduction strategy. Rapid technological progress is at least as likely to produce antidotes to natural and anthropogenic technological risks as it is to create more of them. The open society and rapid progress that Enlightenment values foster also facilitate fast adaptation when we encounter new problems. Adherence to these values is only heuristic, but our lack of information on the dangers and benefits of most future technologies and the poor quality of alternative coordination methods make a strong case for upholding Enlightenment values in the face of existential risk.


Action Relevance

Technological existential risk is an important consideration for human welfare, but what follows from that recognition isn’t obvious. This essay makes two arguments. First, wide and fast progress can decrease overall risk. Second, high risk from technological progress does not itself justify state intervention in technological progress, because states do not automatically, or in fact usually, internalize the relevant externalities that would lead them to actually decrease technological risk. Neither of these arguments means that addressing technological existential risk is any less important! They just imply different strategies for addressing it. 

What are the actionable steps that organizations and individuals would take given these arguments?

  • Differential development at the individual level is beneficial and essentially required by specialization. EAs and philanthropists who are convinced that AI safety research (or any other cause) is the best thing to devote their time and money to should remain so, and they should continue trying to convince others of the same.
  • Much more consideration of state failure and state-led existential risk is needed before the heuristic of Enlightenment Values can be reasonably overridden. Governments are not likely to improve AI governance.
  • R&D on coordination mechanisms to improve governance has high leverage and more work should be done in this space.
  • Supporting policies and politicians to get differential development is probably not an effective use of your time. State enforced differential development is unlikely to be accurate, precise, and resistant to regulatory capture. I think this applies to political efforts for artificial intelligence risks.
  • Differential development via speeding up beneficial technologies is better than banning dangerous ones, because getting it wrong has fewer downsides and it corrects for the positive externalities from technology anyway.
  • More moderate global governance which promotes interconnectedness between nations, funds beneficial technologies, and uses soft power to sanction violent nations could secure some low-hanging risk reductions without high costs.


AGI risk specifics

Many view AGI as the largest source of existential risk for the next few centuries. Assuming that they are correct, let's consider the argument for state-led differential development. The voluntary allocation of R&D effort plausibly overproduces AI capability due to the external costs of AI risk. So a group of benevolent planners could, in theory, make everyone better off by slowing down AI capability growth relative to AI safety knowledge. 

Several things have to happen before this theoretical possibility is realized, however. First, these planners and their constituents have to actually care about regulating AI capability research. The temporal and global externalities of AI risk make this difficult for politicians who need immediate results and for rationally irrational voters. The increasing importance of AI in the economy, and the work being done by AI risk researchers and advocates, has already helped to overcome this apathy. Several governments around the world have AI strategies.

Once governments are interested in regulating the AI sector, they have to be guided towards regulating it in the interests of all of humanity, present and future, rather than the interests of some smaller group. This will also be difficult, since political coordination among all of humanity’s present and future members is much harder than coordination within some smaller group, such as the military or an industry interested in favorable AI regulation. It seems more likely that government intervention in the AI sector will look like advancing military uses of AI (perhaps the most dangerous use) or protecting the interests of Big Tech by raising entry barriers, extending intellectual property, and providing government contracts, rather than a principled slowdown of AI progress that gives us all time to consider the potential consequences of developing AI too soon. 

Even if we manage to convince governments to prioritize humanity’s future over the rewards offered by special interest groups, well-intentioned attempts to accelerate AI safety relative to AI capabilities could easily make things worse. To improve our risk profile, differential development needs to be accurate and precise. Governments need to understand where the risk of AI comes from and they need to be able to target that source without much spillover. This is particularly difficult with AI because the precise source and form of AI risk is a subject of intense debate and there is considerable overlap between AI capability research and AI safety research. The optimal regulation strategy is very different if AI risk comes primarily from something like modern corporate language models or if it looks more like a rogue computer virus or a killer robot or something no one has thought of yet. Regulating the wrong type of AI could easily increase overall existential risk since the technology has so much potential to solve other risks. It may also make it more difficult to retarget the regulation once we have more information in the future.

Additionally, AI capability research and AI safety research are sometimes hard to tell apart, so highly precise targeting is necessary. FTX Future Fund’s ML Safety Scholars program has safety in the name, but it’s mostly about teaching young people how machine learning works. EU regulations on AI are so imprecise that they cover most scientific research in general. Imprecise regulation may slow AI safety research as much as or more than it slows the growth in AI capability. Even with high precision, AI safety and capabilities research are difficult to separate. AI safety researchers need good models of what AI capabilities will be; their search for these models may inspire the creation of advanced AI, create info-hazards, or inadvertently produce advanced AI themselves. AI capabilities researchers need good ways to understand and control their products; they may be the first to develop effective interpretability tools, kill switches, and simulation boxes. Promoting AI safety research increases the risk of black balls emerging from that field, and curtailing AI capabilities research decreases the chance of that field producing antidotes. There is still room for this to be a beneficial tradeoff, of course, but it decreases the expected value even assuming that the government has completely altruistic intentions. 

For these reasons, AI’s importance as a large existential risk is insufficient to justify government intervention without new institutional design. The huge gains from decreasing AI risk are balanced by huge losses from increasing it. If governments regulate AI like they regulate most other industries, they are far more likely to increase risk than decrease it as they pursue the interests of concentrated interest groups and their own short-term incentives. Even with spotless intentions, correctly regulating AI is very difficult given the uncertainty over the sources of AI risk. The most consequential regulation may be to prevent military AI research, but even broaching that subject raises the question: which countries will do this? The expected value from state regulation of AI is not sufficient to override the heuristic of upholding Enlightenment values. 


What do we know about the truth of the VWH?

The short answer is: not much. We’re pulling balls from an urn, but we don’t know the total number of balls and we’ve only drawn white balls so far. This is exactly Nassim Taleb’s Black Swan problem. No matter how many white swans we’ve observed, we will never learn anything about how many black swans there are unless we observe every swan or see a black one. Naively, one could use some inductive procedure, updating confidence that there are no black balls in the urn each time we observe a white ball. However, no matter how high our certainty gets, it will always take only one observation for it to collapse to zero, making a sliding scale of certainty meaningless. Taleb gives another illustrative example: Imagine you’re a Bayesian turkey on a farm. At first, you may be unsure whether the farmer has your best interests at heart. But "every single feeding will firm up the bird's belief that it is the general rule of life to be fed every day by friendly members of the human race 'looking out for its best interests,' as a politician would say. On the afternoon of the Wednesday before Thanksgiving, something unexpected will happen to the turkey. It will incur a revision of belief."

No amount of analysis on the previous 1,000 days of your life as a Bayesian turkey could have informed you about the impending doom. In fact, the Bayesian turkey’s confidence in its safety reached its peak when it was in the most danger. An eerily similar graph could be drawn for humanity’s well-being over time: take a graph of nearly any welfare metric and end it abruptly at some future date to confirm the doomsday argument.
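The naive inductive procedure above can be sketched with Laplace's rule of succession; the choice of this particular prior is my own illustrative assumption, since neither Taleb nor Bostrom specifies one:

```python
# Laplace's rule of succession: after observing `white` white draws and
# `black` black draws, estimate the probability the next draw is black.
def p_black_next(white: int, black: int = 0) -> float:
    return (black + 1) / (white + black + 2)

# Confidence in safety climbs smoothly with every white ball drawn...
print(p_black_next(10))         # ~0.083
print(p_black_next(1_000))      # ~0.001
print(p_black_next(1_000_000))  # ~0.000001

# ...but the estimate says nothing about whether any black ball exists at
# all, and a single black draw arrives too late to be useful: like the
# turkey, the model is most confident right before it fails.
```

The point of the sketch is the mismatch between the smoothly shrinking estimate and the all-or-nothing reality it is supposed to track.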

So despite centuries of human existence and hundreds of millions of inventions passing by without extinction, it would be irresponsible to claim certainty in the VWH’s falsehood. There are only three ways to resolve this uncertainty. One is to have a hard-to-vary explanatory theory which can give us knowledge outside the box of probability, e.g., “The turkey is at risk because it is on a human farm and humans like to eat turkeys on Thanksgiving,” or “human technological progress is not an existential risk because the law of conservation of energy implies that any technology powerful and cheap enough to destroy civilization is also powerful and cheap enough to build a much more resilient civilization.” An explanatory theory like this might exist for our relationship with existential risk and technology, but few have been posited and none confirmed. The second way to resolve the uncertainty is to discover all technologies, i.e., empty the urn without finding a black ball. The third is to pull a black ball and destroy civilization. 

Bounds on the ratio of black to white balls

Although we cannot resolve uncertainty around the VWH, we can use past data to put plausible upper bounds on the amount of risk that comes from the average invention. Since the invention which spells our end will certainly not be an average one, this is only useful for contextualizing the problem, not for predicting the future. 

How many balls have we pulled from the urn of technology so far? Bostrom “uses the word ‘technology’ in its broadest sense … we count not only machines and physical devices but also other kinds of instrumentally efficacious templates and procedures – including scientific ideas, institutional designs, organizational techniques, ideologies, concepts, and memes.” This definition is so broad that it is difficult to quantify. To put a lower bound on it, here is some data on the number of patents and scientific papers published each year since around 1800. 

There have probably been at least 100 million patents worldwide, which is itself a lower bound on the number of inventions, and there are 120 million papers in the Microsoft academic database. We can be confident that these numbers severely undercount the number of inventions and scientific ideas, and they do not even attempt to capture “institutional designs, organizational techniques, ideologies, concepts, and memes.” A reasonable estimate of all the acts of invention and scientific discovery not tracked by these data, plus all the other more amorphous concepts also in the urn, easily exceeds 500 million. Following Toby Ord’s estimations of natural existential risk, we can use this historical data to put plausible upper bounds on the per-draw risk of pulling a black ball. Let’s normalize to groups of 100 thousand balls to save space on digits. So we’ve probably pulled between 2,200 and 10,000 groups of 100k balls from the urn of knowledge. If we had only a 99% chance of avoiding extinction or catastrophe with each group, there would be at most a 2.5 × 10^-10 chance of surviving as long as we have. For our history to be more likely than a 1-in-1000 chance, we’d need at the very least a 99.6% chance of surviving each group of 100k draws without incident. If you think we’ve drawn more than 220 million balls so far, this minimum probability of safety increases further. For our history of no black-ball incidents to be more likely than not, we’d need a probability of safety for each group of 100k draws between .9997 and .99993, depending on how many draws you think we’ve had so far. 
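These bounds are a quick back-of-the-envelope computation. A sketch using the essay's own numbers (2,200 to 10,000 groups of 100k draws) reproduces them:

```python
# Survival-probability bounds from humanity's track record.
# Assumption (from the text): we have drawn between 220 million and
# 1 billion balls, i.e. between 2,200 and 10,000 groups of 100,000.
groups_low, groups_high = 2_200, 10_000

# If each group of 100k draws had only a 99% chance of containing no
# black ball, surviving even the low estimate would be a near-miracle:
p_survive = 0.99 ** groups_low
print(p_survive)  # ~2.5e-10

# Minimum per-group safety rate for our history to be more likely than
# not: solve p**N > 0.5 for p, i.e. p > 0.5**(1/N).
p_min_low = 0.5 ** (1 / groups_low)    # ~0.9997
p_min_high = 0.5 ** (1 / groups_high)  # ~0.99993
print(p_min_low, p_min_high)
```

The same one-liner with any other group count shows how the bound tightens as the assumed number of historical draws grows.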


These numbers are credible bounds on the chance of catastrophe from a given invention only if the existential risk per invention is not increasing over time, which may be a suspect assumption. However, even accepting these bounds does not necessarily relieve worries about technological x-risk, despite the microscopic probabilities they place on it. Even if the chance of catastrophe per draw from the urn is not increasing, the number of draws we are taking is increasing. If our invention rate keeps growing as it is, in 200 years we might be inventing 3 billion things a year. That’s 30,000 groups of 100k inventions. Even if each group has a 99.993% chance of not killing us, surviving 30,000 of these in a row is only around a 12% chance. And that’s just one typical year in 2200! (However, if we make it to 2200, we’ll have observed many billions more safe inventions, so we’d have a much tighter upper bound on the rate of black balls in the urn. I’m not sure how this time inconsistency works.)
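The forward projection works the same way; a sketch, assuming the essay's hypothetical rate of 3 billion inventions per year by 2200:

```python
# Even a per-group safety rate consistent with our past becomes risky at
# a much higher draw rate. Assumption: 3 billion inventions per year,
# i.e. 30,000 groups of 100k draws annually.
groups_per_year = 30_000
p_safe_per_group = 0.99993  # upper-bound safety rate inferred from history

p_safe_year = p_safe_per_group ** groups_per_year
print(p_safe_year)      # ~0.12: only a ~12% chance of a black-ball-free year
print(1 - p_safe_year)  # ~0.88: chance of at least one black ball that year
```

The takeaway is that a per-draw risk small enough to fit our history can still compound into near-certain catastrophe once the draw rate grows by orders of magnitude.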

Again, this analysis doesn’t get us any closer to answering the question central to the VWH: is there at least one black ball in the urn? It does inform us about the most likely ratio of black balls to white balls in the urn. If it were not millions of times more likely to draw a white ball than a black ball from the urn, then there would be almost no chance of making it this far.


Is Technological Maturity stable?

The specific characteristics of a technologically mature humanity certainly depend on technologies, cultures, and even biologies which are unimaginable to us today. This makes it impossible to say with confidence why it would or would not be stable. Perhaps the Fermi paradox gives us reason to doubt that it is, but perhaps not. The important thing is that if technological maturity is not stable, the argument for longtermism and for caring a lot about existential risk becomes much weaker. The possibility of a long, populous, and rich future is what makes existential risk important. 

If existential risk is constant or decreasing in the number of technologies we discover, but never low enough to be called ‘stable,’ then rapid technological progress and political liberty are still good for all of the benefits they bring along the way. If technological maturity is unstable because risk increases with the number of technologies we discover, then the implications depend sensitively on unknown parameters. If technology’s value is low and its risk high, a return to pre-industrial agrarian life might maximize human value. If the reverse holds, the extra risk might be worth the large gains in wealth and population that technology brings along the way.


Bostrom [claims] that even a small credence in the VWH requires continuously controlling and surveilling everyone on earth.

Bostrom does not claim this. Period. (Your reading is good-faith, but Bostrom is frequently misread on this topic by people criticizing him in bad faith, so it's worth emphasizing-- and it's just an important point.) He narrowly claims here that mass surveillance would be necessary given "a biotechnological black ball that is powerful enough that a single malicious use could cause a pandemic that would kill billions of people."

Another relevant Bostrom quote:

Comprehensive surveillance and global governance would thus offer protection against a wide spectrum of civilizational vulnerabilities. This is a considerable reason in favor of bringing about those conditions. The strength of this reason is roughly proportional to the probability that the vulnerable world hypothesis is true.

He goes on to discuss the downsides of surveillance and global governance. So your quotes like "Bostrom’s plan [purports to be a] one-size-fits-all antidote" are not correct, and Bostrom would agree with you that totalitarianism and surveillance present astronomical risks.

This is fair. I got a little sloppy with my language there, but elsewhere I note that Bostrom is arguing for this state "pro tanto," not "all things considered." My reading is that the panopticon proposal is mostly a rhetorical strategy to give a concrete image of what the massive expected values of existential risk might justify. 

I still think that his narrow claim is wrong, though: that mass surveillance would be necessary given "a biotechnological black ball that is powerful enough that a single malicious use could cause a pandemic that would kill billions of people."

Even in the conditional world where the deadly pandemic exists, mass surveillance is only good if it is used to actually stop the pandemic and does not cause more harm afterwards. I don't think either of these things are very likely if they're attached to any form of government we're familiar with. Mass surveillance isn't even necessary, it's just one possible technological solution to a bio-tech black ball. Really good vaccines or PPE or genetic improvements to the immune system would also suffice. 

Thanks for your reply.

I disagree that you just "got a little sloppy"; you exaggerate Bostrom's policy recommendations elsewhere too, and generally frame the relevant parts of your piece as arguing against Bostrom rather than as arguing against someone who advocates positions that Bostrom merely analyzes. Most readers would get the sense that "Bostrom claims that his global surveillance solution to anthropogenic risks is a one-size-fits-all antidote"; this is false.

And of course I agree--and Bostrom would agree--that there are many possible solutions and countermeasures to dangerous biotechnology. But if we're assuming that a particular technology is a black ball presenting a type-1 vulnerability, as Bostrom does for the sake of illustration in one paragraph, we are necessarily assuming that (1) it devastates civilization by default, so we are necessarily assuming that eg PPE won't save us, and (2) it is available to a large number of actors by default, so we need something like mass surveillance to preempt use. So I think you're saying something reasonable, but not really disagreeing with Bostrom.

Bostrom says in the policy recommendations: 

"Some areas, such as synthetic biology, could produce a discovery that suddenly democratizes mass destruction, e.g. by empowering individuals to kill hundreds of millions of people using readily available materials. In order for civilization to have a general capacity to deal with “black ball” inventions of this type, it would need a system of ubiquitous real-time worldwide surveillance. In some scenarios, such a system would need to be in place before the technology is invented."

So if we assume that some black balls like this are in the urn which I do in the essay, this is a position that Bostrom explicitly advocates, not just one which he analyzes. But even assuming that the VWH is true and a technology like this does exist, I don't think this policy recommendation is helpful.

State-enforced "ubiquitous real-time worldwide surveillance" is neither a necessary nor a sufficient technology for addressing a type-1 vulnerability like this, unless the definition of type-1 vulnerability trivially assumes that it is. Advanced technology that democratizes protection, like vaccines, PPE, or drugs, can alleviate a risk like this, so a panopticon is not necessary. And a state with ubiquitous surveillance need not stop pandemics to stay rich and powerful, and indeed may create them to keep its position, so a panopticon is not sufficient either. 

Even if we knew a black ball was coming, setting up a panopticon would probably do more harm than good, and it certainly would if we didn't come up with any new ways of aligning and constraining state power. I don't think Bostrom would agree with that statement but that is what I defend in the essay. Do you think Bostrom would agree with that on your reading of the VWH?

I think this post contains some major and some minor errors and is overall fairly "one-sided", and that the post will therefore tend to worsen & confuse (rather than improve & clarify) debates and readers' beliefs. Below I discuss what I see as some of the errors or markers of one-sidedness in this post. I then close with some other points, e.g. emphasising that I do think good critiques and red-teaming are valuable, noting positives of this post, and acknowledging that this comment probably feels kind-of rude :)

Here are some of the things I see as issues in this post. Some are in themselves important, and others are in themselves minor but seem to me like indications of the post generally seeming quite inclined to support a given conclusion rather than more neutrally surveying a topic and seeing where it lands.* I've bolded key points to help people skim this.

  • As Zach mentioned, I think you at least somewhat overstate the extent to which Bostrom is recommending as opposed to analyzing these interventions.
    • Though I do think Bostrom probably could and should have been clearer about this, given that many people have gotten this impression from the paper.
  • You seem to argue (or at least give the vibe) that there's so little value in trying to steer technological development for the better that we should mostly not bother and instead just charge ahead as fast as possible. It seems to me that this conclusion is probably incorrect (though I do feel unsure), that the arguments you've presented for it are somewhat weak, and that you haven't adequately discussed arguments against it.
    • Your arguments for this conclusion include that it's hard to predict the potential benefits and harms of various technologies, that some dangerous and powerful techs like AI can also protect us from other things, and that actors who would steer technological development have motives other than just making the world better.
      • I think these are in fact all true and important points, and it's good for people to consider them.
      • But I think there are still many cases where we can be pretty confident that our best bet is that some tech will reduce risk or will increase it and that some way of steering tech will have net positive effects.
        • I don't mean that we can be confident that this will indeed happen this way, but that we can be confident that even after another 10,000 hours of thinking and research we'd still conclude these actions are net positive in expectation (or net negative, in the cases where that's our guess). And we should take action on that basis.
        • (I won't try to justify this here due to time constraints, and it would be fair to not be convinced. But hopefully readers can try to think of examples and realise for themselves that my stance seems right.)
        • And if we can't be confident of that right now, then it seems to me that we should try to (a) gain greater clarity on what tech steering would be good and greater ability to learn that or implement our learnings effectively, and (b) avoid actively accelerating tech dev in the meantime. (As opposed to treating our inability to usefully steer things as so unchangeable that we should just charge ahead and hope for the best.)
      • It seems odd to me to act as though we should be so close to agnostic about the net benefits or harms of all techs and so close to untrusting of any actors who could steer tech development that we should instead just race ahead as fast as we can in all directions.
      • In some cases, I think making simple models, Fermi estimates, or forecasts could help make "each side"'s claims more clear and help us figure out which should get more weight. An example of what this could look like is here: https://blog.givewell.org/2015/09/30/differential-technological-development-some-early-thinking/ (This actually overall highlights the plausibility of the "maybe accelerating AI is good" stance. And I agree that that's plausible. I'm not saying this post supports my conclusion, just that it seems like an example of a productive way to advance this discussion.)
    • I haven't re-read your post closely to check what your precise claims are and whether you somewhere provide appropriate caveats. But I at least think that the impression people would walk away with is something like "we should just race ahead as fast as possible".
  • A core premise/argument in your post appears to be that pulling a black ball and an antidote (i.e., discovering a very dangerous technology and a technology that can protect us from it) at the same time means we're safe. This seems false, and I think that substantially undermines the case for trying to rush forward and grab balls from the urn as fast as possible.
    • I think the key reason this is false is that "discovering" a technology or "pulling a ball from the urn" does not mean it has reached maturity and been deployed globally. So even if we've discovered both the dangerous and protective technology, it's still possible for the dangerous technology to be deployed in a sufficiently bad way before the protective technology has been deployed in a sufficiently good way.
      • I think there are also reasons why that might be likely, e.g. in some ways it seems easier to destroy than to create, and some dangerous technologies would just need to be deployed once somewhere whereas some protective technologies would need to be deployed continuously and everywhere. (That might be the same point stated in two separate ways - not sure.)
      • OTOH, there are also reasons why that might be unlikely, e.g. far more people want to avoid existential catastrophe than to enact it.
      • Overall I'm not sure which is more likely, but it definitely seems at least plausible that we could end up with disaster if we discover both a very dangerous tech and a paired protective tech at the same time.
    • I'll illustrate with one of your own examples: "[Increasing our technological ability] slowly, one ball at a time, just means less chance at pulling antidote technologies in time to disable black ball risks. For example, terraforming technology which allows small groups of humans to make changes to a planet’s atmosphere and geography may increase existential risk until space-settling technology puts people on many planets. If terraforming technology typically precedes space-settling then accelerating the pace of progress reduces risk." But I think if we develop such terraforming technology and such space-settling technology at the same time, or even develop space-settling technology somewhat earlier, that does not guarantee we will in fact have built self-sustaining settlements in many places before an individual uses the terraforming technology in a bad way.
      • It's still totally possible for us to all die due to the terraforming technology before those self-sustaining settlements are set up.
    • Another way to illustrate this: You write "If we discovered all possible technologies at once (which in Bostrom’s wide definition of technology in the VWH paper includes ideas about coordination and insight), we would be in the safe region." I encourage readers to genuinely try to imagine that literally tomorrow literally the ~8 billion people who exist collectively discover literally all possible technologies at once, and then consider whether they're confident humanity will exist and be on track to thrive in 2023. Do you (the reader) feel confident that everything will go well in that world where all possible techs and insights are dumped on us at once?
  • I also don't agree, and don't think Bostrom would claim, that technological maturity means having discovered all possible technologies, or that we would necessarily be safe if we'd discovered & deployed all possible technologies (even if we survive the initial transition to that world). 
    • Bostrom writes "By ‘technological maturity’ we mean the attainment of capabilities affording a level of economic productivity and control over nature close to the maximum that could feasibly be achieved (in the fullness of time) (Bostrom, 2013)." That phrasing is a bit vague, but I think that attaining that level of capabilities doesn't mean that we've actually got all possible technologies or that every given individual has the maximum possible capabilities.
    • It seems plausible/likely that some technologies are sufficiently dangerous that we'll only be safe if we're in a world where them ever being discovered or ever being deployed is prevented - i.e., that no protective measure would be adequate except prevention.
      • iirc, Bostrom's discussion of "Type-0 vulnerabilities" is relevant here.
  • I think the following bolded claim is false, and I think it's very weird to make this empirical claim without providing any actual evidence for it: "AI safety researchers argue over the feasibility of ‘boxing’ AIs in virtual environments, or restricting them to act as oracles only, but they all agree that training an AI with access to 80+% of all human sense-data and connecting it with the infrastructure to call out armed soldiers to kill or imprison anyone perceived as dangerous would be a disaster."
    • I am 100% confident that not all AI safety researchers have even considered that question, let alone formed the stance you suggest they all agree on.
    • Perhaps you meant they "would all agree"? Still though, it would seem odd to be confident of that without providing any justification.
    • And I think in fact many would disagree if asked. In fact, I expect that many of them would believe that what the future should look like would technically or basically involve this happening; we have a properly aligned superintelligent AI that either already has access to those things or could gain access to those things if it simply chose to do so.
  • I think "If it is to fulfill its mission of preventing anthropogenic risk long into the future, the global surveillance state cannot afford to risk usurpation" and related claims are basically false or misleading.
    • It appears to me that we're fairly likely to be in or soon be in a "time of perils", where existential risk is unusually high. There are various reasons to expect this to subside in future besides a global surveillance state. So it seems pretty plausible that it would be best to temporarily have unusually strong/pervasive surveillance, enforcement, etc. for particular types of activities.
    • And if we've set this actor up properly, then it should be focused on what's net positive overall and should not conflate "ensuring this actor has an extremely high chance of maintaining power helps reduce some risks" with "ensuring this actor has an extremely high chance of maintaining power is overall net beneficial".
    • To be clear, I'm not saying that we should do things like this or that it'd work if we tried; I'm just saying that thinking that increased surveillance, enforcement, moves towards global governance, etc. would be good doesn't require thinking that permanent extreme levels (centralised in a single state-like entity) would be good.
  • The following seems like a misrepresentation of Bostrom, and one which is in line with what I perceive as a general one-sidedness or uncharitability in this post: "Bostrom continues to assume that the power to take a socially beneficial action is sufficient to guarantee that the state will actually do it. “States have frequently failed to solve easier collective action problems … With effective global governance, however, the solution becomes trivial: simply prohibit all states from wielding the black-ball technology destructively.”"
    • That quote does not state that the power to take a socially beneficial action is sufficient to guarantee that a state will actually take it. A solution can be trivial but not taken.
      • Also, the "effective" in "effective global governance" might be adding something beyond "power" along the lines of "this governance is pointed in the right direction"?
    • I haven't read the VWH paper in a while, so maybe he does make this claim elsewhere, or maybe he repeatedly implies it without stating it. But that quote does not seem to demonstrate this.

Some other things I want to make sure I say (not issues with the post):

  • To be clear, I do think it's valuable to critically discuss & red-team the VWH paper in particular and also other ideas and writings that are prominent within longtermism. And I personally wish Bostrom had written the VWH paper somewhat differently, and I don't feel confident that the interventions it discusses are net positive. So this comment is not meant to discourage other critical discussions or to strongly defend the interventions discussed in VWH.
  • But I do think it's important to counter mistaken and misleading posts in general, even if the posts are good-faith and are attempting to play a valuable role of criticizing prominent ideas.
  • I wrote this comment pretty quickly, so I don't fully justify things and my tone is sometimes a bit sharp or uncharitable - apologies in advance for that.
    • (I expect that if the original poster and I instead had a call we would get on the same page faster and feel more positively toward each other, and that I would come across as a bit less rude than this comment might.)
  • I do think there are some good elements of this post (e.g., the writing is generally clear, you include a decent summary at the start, you keep things organized nicely with headings, and some of your points seem true and important). I focus on the negatives since they seem more important and due to time constraints.
  • As a heads up, I'm unlikely to reply to replies to this, since I'm trying to focus on my main work atm.

*To be clear, I'm a fan of red-teaming, which is not neutral surveying but rather deliberately critical. But that should then be framed explicitly as red-teaming. 

Thank you for reading and for your detailed comment. In general I would agree that my post is not a neutral survey of the VWH but a critical response, and I think I made that clear in the introduction even if I did not call it red-teaming explicitly. 

I'd like to respond to some of the points you make.

  1. "As Zach mentioned, I think you at least somewhat overstate the extent to which Bostrom is recommending as opposed to analyzing these interventions."

    I think this is overall unclear in Bostrom's paper, but he does have a section called Policy Implications right at the top of the paper where he says "In order for civilization to have a general capacity to deal with “black ball” inventions of this type, it would need a system of ubiquitous real-time worldwide surveillance. In some scenarios, such a system would need to be in place before the technology is invented." I think it is confusing because he starts out analyzing the urn of technology, then conditioned on there being black balls in the urn he recommends ubiquitous real-time worldwide surveillance, and then the 'high-tech panopticon' example is just one possible incarnation of that surveillance that he is analyzing. I think it is hard to deny that he is recommending the panopticon if existential risk prevention is the only value we're measuring. He doesn't claim all-things-considered support, but my response isn't about other considerations of a panopticon. I don't think a panopticon is any good even if existential risk is all we care about.
  2. "You seem to argue (or at least give the vibe) that there's so little value in trying to steer technological development for the better that we should mostly not bother and instead just charge ahead as fast as possible."

    I think this is true insofar as it goes, but you miss what is in my opinion the more important second part of the argument. Predicting the benefits of future tech is very difficult, but even if we knew all of that, getting the government to actually steer in the right direction is harder. For example, economists have known for centuries that domestic farming subsidies are inefficient. They are wasteful and they produce big negative externalities. But almost every country on earth has big domestic farming subsidies because they benefit a small, politically active group in most countries. I admit that we have some foreknowledge of which technologies look dangerous and which do not. That is far from sufficient for using the government to decrease risk. 

    The point of Enlightenment Values is not that no one should think about the risks of technology and we should all charge blindly forward. Rather, it is that decisions about how best to steer technology for the better can and should be made on the individual level where they are more voluntary, constrained by competition, and mistakes are hedged by lots of other people making different decisions. 
  3. "A core premise/argument in your post appears to be that pulling a black ball and an antidote (i.e., discovering a very dangerous technology and a technology that can protect us from it) at the same time means we're safe. This seems false, and I think that substantially undermines the case for trying to rush forward and grab balls from the urn as fast as possible."

    There are technologies like engineered viruses and vaccines, but how they interact depends much more on their relative costs. An antidote to $5-per-infection viruses might need to be $1-per-dose vaccines or $0.50-per-mask PPE. If you just define an antidote to be "a technology which is powerful and cheap enough to counter the black ball should they be pulled simultaneously" then the premise stands.
  4. "Do you (the reader) feel confident that everything will go well in that world where all possible techs and insights on dumped on us at once?"

    Until meta-understanding of technology greatly improves, this is ultimately a matter of opinion. If you think there exists some technology that is incompatible with civilization in all contexts then I can't really prove you wrong, but it doesn't seem right to me.

    Type-0 vulnerabilities were 'surprising strangelets.' Not techs that are incompatible with civilization in all contexts, but risks that come from unexpected phenomena, like the Large Hadron Collider opening a black hole or something like that.
  5. "I think the following bolded claim is false, and I think it's very weird to make this empirical claim without providing any actual evidence for it: "AI safety researchers argue over the feasibility of ‘boxing’ AIs in virtual environments, or restricting them to act as oracles only, but they all agree that training an AI with access to 80+% of all human sense-data and connecting it with the infrastructure to call out armed soldiers to kill or imprison anyone perceived as dangerous would be a disaster."

    You're right that I didn't get any survey of AI researchers for this question. The near-tautological nature of "properly aligned superintelligence" guarantees that if we had it, everything would go well. So yeah, probably lots of AI researchers would agree that a properly aligned superintelligence would use surveillance to improve the world. This is a pretty empty statement imo. The question is about what we should do next. This hypothetical aligned intelligence tells us nothing about what increasing state AI surveillance capacity does on the margin. Note that Bostrom is not recommending that an aligned superintelligent being do the surveillance. His recommendations are about increasing global governance and surveillance on the margin. The AI he mentions is just a machine learning classifier that can help a human government blur out the private details in the footage the cameras collect.
  6. "I'm just saying that thinking that increased surveillance, enforcement, moves towards global governance, etc. would be good doesn't require thinking that permanent extreme levels (centralised in a single state-like entity) would be good."

    This is only true if you have a reliable way of taking back increased surveillance, enforcement, and moves towards global governance. The alignment and instrumental convergence problems I outlined in those sections give strong reasons why these capabilities are extremely difficult to take back. Bostrom scarcely mentions the issue of getting governments to enact his risk-reducing policies once they have the power to enforce them, let alone gives a mechanism design which would judiciously use its power to guide us through the time of perils and then reliably step down. Without such a plan, the issues of power-seeking and misalignment are not ones you can ignore.

Thank you for writing this! I am strongly against authoritarianism, and I think liberalism is important for human welfare outside of its implications for existential risk. I appreciate that this post articulates in detail why we shouldn't give up on liberalism even in the face of existential threats from emerging technologies. The point about global police states (even ostensibly benevolent ones) having instrumentally convergent goals is also insightful and I think it would resonate with many people here.

Thank you for reading! I definitely agree that liberalism has tons of other important qualities. I wanted to make an argument solely with the language of existential risk though for two reasons:

  1. Thinking on existential risk tends to be totalizing, perhaps fairly. The humanistic values of liberalism are really hard to weigh up against all potential future value. Most people who think about this stuff would just dismiss this out of hand for that reason.
  2. Much else has been written on the other values of liberalism and I think that most EAs at least intuitively agree that liberalism is very valuable.

Related, John von Neumann on x-risk:

Finally and, I believe, most importantly, prohibition of technology (invention and development, which are hardly separable from underlying scientific inquiry), is contrary to the whole ethos of the industrial age. It is irreconcilable with a major mode of intellectuality as our age understands it. It is hard to imagine such a restraint successfully imposed in our civilization. Only if those disasters that we fear had already occurred, only if humanity were already completely disillusioned about technological civilization, could such a step be taken. But not even the disasters of recent wars have produced that degree of disillusionment, as is proved by the phenomenal resiliency with which the industrial way of life recovered even—or particularly—in the worst-hit areas. The technological system retains enormous vitality, probably more than ever before, and the counsel of restraint is unlikely to be heeded.

What safeguard remains? Apparently only day-to-day — or perhaps year-to-year — opportunistic measures, a long sequence of small, correct decisions. [...] Under present conditions it is unreasonable to expect a novel cure-all. For progress there is no cure. Any attempt to find automatically safe channels for the present explosive variety of progress must lead to frustration. The only safety possible is relative, and it lies in an intelligent exercise of day-to-day judgment.

Yes, this paper is great and it was an inspiration for my piece. I found his answer here pretty unsatisfying though so hopefully I was able to expand on it well.

Given the draconian control that this state imposes on everyone, it seems likely that at least 15% of the population would strongly resent and resist this government. Executing dissenters could easily be a catastrophic risk in itself. Beyond directly killing or imprisoning anyone who might try to disobey the state, the government will likely find that scapegoating certain groups is a good way to justify their power, shift blame for stagnant or deteriorating economic conditions, and maintain stability.

My assumption - although I haven't read the paper recently - was that this would only be implemented if the state had access to immensely powerful AI technologies, in which case there would be no sense in executing dissenters because they wouldn't pose any threat at all. Similarly, at this point the state wouldn't have any need to scapegoat any particular group, as their power would be absolute and essentially unchallengeable and economic conditions would essentially be irrelevant given the abundance that would result.

I feel slightly bad saying this, but I downvoted your post because I felt that it was likely to cause people to misunderstand Bostrom's position and I don't think we want to encourage criticism that confuses people on the position that someone holds. At the same time, I appreciated all the hard work you put into this, so I felt it was a shame that I couldn't upvote it. (I also feel slightly nervous as I haven't read the paper recently, so maybe I'm actually the one misunderstanding him, which would be rather embarrassing).

Bostrom may have talked about this elsewhere since I've heard other people say this, but he doesn't make this point in the paper. He only mentions AI briefly as a tool the panopticon government could use to analyze the video and audio coming in from their surveillance. He also says:

"Being even further removed from individuals and culturally cohesive ‘peoples’ than are typical state governments, such an institution might by some be perceived as less legitimate, and it may be more susceptible to agency problems such as bureaucratic sclerosis or political drift away from the public interest."

He also considers what might be required for a global state to bring other world governments to heel. So I don't think he is assuming that the state can completely ignore all dissent or resistance because it FOOMs into an all-powerful AI.

Either way I think that is a really bad argument. It's basically just saying "if we had aligned superintelligence running the world everything would be fine" which is almost tautologically true. But what are we supposed to conclude from that? I don't think that tells us anything about increasing state power on the margin. Also, aligning the interests of powerful AI with a powerful global state is not sufficient for alignment of AI with humanity more generally. Powerful global states are not very well aligned with the interests of their constituents. 

My reading is that Bostrom is making arguments about how human governance would need to change to address risks from some types of technology. The arguments aren't explicitly contingent on any AI technology that isn't available today. 

Great post, Maxwell! Strongly upvoted. What would be your top recommendations for someone interested in learning more about your worldview / progress studies?

political liberty, technological progress, and political liberty ⇒ technological progress

Are the repetitions of political liberty and technological progress intentional?

Hey Max, great post! 

Let me pitch an idea to you:
If Bostrom's VWH were correct and eventually our picking of new balls from the urn will lead to our destruction, perhaps a solution is to split humanity into many sets of pickers, isolated from one another's knowledge, in order to maximize the total amount of time humanity can experience the picking process for. One way to do this might be to colonize other planets in complete secrecy, provide as little technology as possible to the new inhabitants of those planets, and then destroy the knowledge that allowed us to do so, severing the connection between these worlds. They would get to experience their own set of discoveries, independent of our set, with both groups drawing white and black balls separately from each other. If we could also imbue them with the idea that this is the right course of action, perhaps they would also create more isolated sets of pickers, and know not to seek out those who had created them. 
What do you think?

This might be the best strategy if we're all eventually doomed. Although it might turn out that the tech required to colonize planets comes only after a bunch of black balls; nuclear rockets and some biotech, at least, seem like likely prerequisites.

Even Bostrom doesn't think we're inevitably doomed though. He just thinks that global government is the only escape hatch. 

Max and Sharmake, note that Bostrom does not claim in this piece (or anywhere, as far as I know) that the vulnerable world hypothesis is true. So "global government is the only escape hatch" isn't really his position. (Also note that we could have strong domain-specific global governance without a global government.)

Which essentially means that, in his perspective, we are utterly doomed barring technological stagnation.

We can't win: Either we get a bang x-risk from individuals, or we get a stable x-risk from states.

Not only would surveillance and policing need to be near absolute; you'd need to be able to reliably control minds to the point where an opposing perspective can be totally squashed.

I of course disagree with Bostrom wildly on this.
