Informational hazards and the cost-effectiveness of open discussion of catastrophic risks

turchin

TL;DR: In order to prevent x-risks, our strategic vision should outpace the technical capabilities of potential malevolent agents, which means that strategic discussion should be public and open, but the publication of technically dangerous knowledge should be prevented.

Risks and benefits of the open discussion

Bostrom has created a typology of info-hazards, but any information could also have an “x-risk prevention positive impact”, or info-benefits. Obviously, the info-benefits of open public discussion of x-risks must outweigh its info-hazards, or research on x-risks is useless. In other words, the “cost-effectiveness” of the open discussion of a risk A should be estimated, and the potential increase in the probability of a catastrophe should be weighed against its possible decrease.

The benefits of public discussion are rather obvious: if we publicly discuss a catastrophic risk, we can raise awareness about it, prepare for the risk, and increase funding for its prevention. Publicly discussed risks are also more likely to be examined by a larger group of scientists than those discussed only within a closed group. Interdisciplinary research and comparison of different risks are impossible if the risks are secrets of different groups; as a result, for example, asteroid risks are overestimated (more publicly discussed) and biorisks are underestimated (less discussed).

A blanket "information hazard counterargument" is too general, as any significant new research on x-risks changes the information landscape. For example, even good news about new prevention methods may be used by bad actors to overcome these methods.

The problem of informational hazards has already been explored in the field of computer security, which has developed best practices in this area. The protocol calls for the discoverer of an informational hazard to first try to contact the firm which owns the vulnerable software, and later, if there is no reply, to publish it openly (or at least hint at it), to give users an advantage over bad actors who may use it secretly.

The relative power of info-hazards depends on the information that has already been published and on other circumstances

Consideration 1: If something is already public knowledge, then discussing it is not an informational hazard. Example: AI risk. The same is true for the “attention hazard”: if something is already extensively present in the public field, it is less dangerous to discuss it publicly.

Consideration 2: If information X is public knowledge, then similar information X2 is a lower informational hazard. Example: if the genome of a flu virus N1 has been published, publishing a similar flu genome N2 adds only a marginal informational hazard.

Consideration 3: If many info-hazards have already been openly published, the world may be considered saturated with info-hazards, as a malevolent agent already has access to so much dangerous information. In our world, where genomes of the pandemic flus have been openly published, it is difficult to make the situation worse.

Consideration 4: If I have an idea about x-risks in a field in which I don’t have technical expertise and it only took me one day to develop this idea, such an idea is probably obvious to those with technical expertise, and most likely regarded by them as trivial or non-threatening.

Consideration 5: A layman with access only to information available on Wikipedia is not able to generate ideas about really powerful informational hazards that could not already be created by a dedicated malevolent agent, like the secret service of a rogue country. However, if one has access to unique information, which is typically not available to laymen, this could be an informational hazard.

Consideration 6: Some ideas will soon be suggested anyway, but by speakers who are less interested in risk prevention. For example, ideas similar to Roko’s Basilisk have been suggested independently twice by my friends.

Consideration 7: Suppressing some kinds of information may signal its importance to malevolent agents, producing a “Streisand effect”.

Consideration 8: If there is a secret network for discussing such risks, some people will be excluded from it, which may create undesired social dynamics.

Info-hazard classification

There are several types of catastrophic informational hazard (a more detailed classification can be seen in Bostrom’s article, which covers not only catastrophic info-hazards, but all possible ones):

* Technically dangerous information (e.g. the genome of a virus).

* Ideas about possible future risks (like deflection of asteroids to Earth).

* Value-related informational hazards (e.g. the idea of a voluntary human extinction movement, or of fighting overpopulation by creating small catastrophes).

* Attention-related info-hazards: these are closely related to value hazards, as the more attention an idea gets, the more value humans typically give to it. A potentially dangerous idea should be discussed in ways that are more likely to attract the attention of specialists than of the public or potentially dangerous agents. This includes special forums, jargon, non-sensational titles, and scientific fora for discussion.

The most dangerous are value-related and technical information. Value-related information could work as a self-replicating meme, and technical information could be used to actually create dangerous weapons, while ideas about possible risks could help us to prepare for such risks or start additional research.

Value of public discussion

Following Bostrom’s maxipok rule, we could use the change in human extinction probability as the only important measure of the effectiveness of any action. In that case, the utility of any public statement A is:

V = ∆I (increase of survival probability via better preparedness) – ∆IH (increase of the probability of the x-risk because bad actors will learn of it).
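As a toy illustration of the formula above, here is a minimal Python sketch. The function name `disclosure_value` and all the numbers are hypothetical, chosen only to show how the sign of V can differ between a strategic statement and a technical one; they are not estimates of any real risk.

```python
def disclosure_value(delta_i: float, delta_ih: float) -> float:
    """Net value V of a public statement, as V = dI - dIH.

    delta_i:  assumed increase in survival probability via better preparedness
    delta_ih: assumed increase in x-risk probability because bad actors learn of it
    """
    return delta_i - delta_ih


# Hypothetical inputs: a strategic overview helps preparedness a lot and bad actors little,
# while a technical recipe does the opposite.
strategic = disclosure_value(delta_i=1e-4, delta_ih=1e-6)
technical = disclosure_value(delta_i=1e-6, delta_ih=1e-4)

print(f"strategic discussion: V = {strategic:+.6f}")
print(f"technical details:    V = {technical:+.6f}")
```

Under these made-up inputs the strategic statement has positive V and the technical one has negative V, which is the pattern the post argues for; the hard part in practice is estimating the two terms at all.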

Emerging technologies increase the complexity of the future, which at some point could become chaotic. The more chaotic the future, the shorter the planning horizon, and the less time we have to act preventively. We need a full picture of future risks for strategic planning. To have the full picture, we have to discuss the risks openly without going into technical details.

The reason is that we can’t prevent risks that we don’t know about, and a prevention strategy needs a full list of risks, while a malevolent agent may need only technical knowledge of one risk (and such knowledge is already available in the field of biotech, so malevolent agents can’t gain much from our lists).

Conclusion

Society could benefit from the open discussion of ideas about possible risks, as such discussion could help in the development of general prevention measures and increase awareness, funding and cooperation. This could also help us to choose priorities in fighting different global risks.

For example, biorisks are less-discussed and thus could be perceived as being less of a threat than the risks of AI. However, biorisks could exterminate humanity before the emergence of superintelligent AI (to prove this argument I would have to present general information which may be regarded as having informational hazard). But the amount of technical hazardous information openly published is much larger in the field of biorisks – exactly because the risk of the field as a whole is underestimated!

If you have a new idea which may appear to be a potential info-hazard, you may need to search the internet to find out if it has already been published – most likely, it has. Then you may privately discuss it with a respected scientist in the field who also has knowledge of catastrophic risks, and ask whether the scientist thinks that the idea is really dangerous. The attention hazard should be overcome by non-sensationalist and non-media-attracting methods of analysis.

It is a best practice to add to the description of any info-hazard the ways in which the risk could be overcome, or why the discussion could be used to find approaches for its mitigation.

Literature:

Bostrom, “Information Hazards: A Typology of Potential Harms from Knowledge”, 2011. https://nickbostrom.com/information-hazards.pdf

Yampolskiy, “Beyond MAD?: The Race for Artificial General Intelligence”. https://www.itu.int/en/journal/001/Documents/itu2018-9.pdf

LessWrong Wiki, “Information hazard”. https://wiki.lesswrong.com/wiki/Information_hazard

Comments

Gregory Lewis🔸 · Jul 3 2018 · 14

Thanks for writing this. How best to manage hazardous information is fraught, and although I have some work in draft and under review, much remains unclear - as you say, almost anything could have some downside risk, and never discussing anything seems a poor approach.

Yet I strongly disagree with the conclusion that the default should be to discuss potentially hazardous (but non-technical) information publicly, and I think your proposals of how to manage these dangers (e.g. talk to one scientist first) generally err too lax. I provide the substance of this disagreement in a child comment.

I’d strongly endorse a heuristic along the lines of, “Try to avoid coming up with (and don’t publish) things which are novel and potentially dangerous”, with the standard of novelty being a relatively uninformed bad actor rather than an expert (e.g. highlighting/elaborating something dangerous which can be found buried in the scientific literature should be avoided).

This expressly includes more general information as well as particular technical points (e.g. “No one seems to be talking about technology X, but here’s why it has really dangerous misuse potential” would ‘count’, even if a particular ‘worked example’ wasn’t included).

I agree it would be good to have direct channels of communication for people considering things like this to get advice on whether projects they have in mind are wise to pursue, and to communicate concerns they have without feeling they need to resort to internet broadcast (cf. Jan Kulveit’s remark).

To these ends, people with concerns/questions of this nature are warmly welcomed and encouraged to contact me to arrange further discussion.

Gregory Lewis🔸 · Jul 3 2018 · 7

0: We agree potentially hazardous information should only be disclosed (or potentially discovered) when the benefits of disclosure (or discovery) outweigh the downsides. Heuristics can make principles concrete, and a rule of thumb I try to follow is to have a clear objective in mind for gathering or disclosing such information (and being wary of vague justifications like ‘improving background knowledge’ or ‘better epistemic commons’) and incur the least possible information hazard in achieving this.

A further heuristic which seems right to me is that one should disclose information in the way that maximally disadvantages bad actors versus good ones. There is a wide spectrum of approaches that could be taken that lie between ‘try to forget about it’ and ‘broadcast publicly’, and I think one of the intermediate options is often best.

1: I disagree with many of the considerations which push towards more open disclosure and discussion.

1.1: I don’t think we should be confident there is little downside in disclosing dangers a sophisticated bad actor would likely rediscover themselves. Not all plausible bad actors are sophisticated: a typical criminal or terrorist is no mastermind, and so may not make (to us) relatively straightforward insights, but could still ‘pick them up’ from elsewhere.

1.2: Although I am a big fan of epistemic modesty (and generally a detractor of ‘EA exceptionalism’), EAs do have an impressive track record in coming up with novel and important ideas. So there is some chance of coming up with something novel and dangerous even without exceptional effort.

1.3: I emphatically disagree we are at ‘infohazard saturation’ where the situation re. infohazards ‘can’t get any worse’. I also find it unfathomable ever being confident enough in this claim to base strategy upon its assumption (cf. eukaryote’s comment).

1.4: There are some benefits to getting out ‘in front’ of more reckless disclosure by someone else. Yet in cases where one wouldn’t want to disclose it oneself, delaying the downsides of wide disclosure as long as possible seems usually more important, and so rules against bringing this to an end by disclosing yourself save in (rare) cases one knows disclosure is imminent rather than merely possible.

2: I don’t think there’s a neat distinction between ‘technical dangerous information’ and ‘broader ideas about possible risks’, with the latter being generally safe to publicise and discuss.

2.1: It seems easy to imagine cases where the general idea comprises most of the danger. The conceptual step to a ‘key insight’ of how something could be dangerously misused ‘in principle’ might be much harder to make than subsequent steps from this insight to realising this danger ‘in practice’. In such cases the insight is the key bottleneck for bad actors traversing the risk pipeline, and so comprises a major information hazard.

2.2: For similar reasons, highlighting a neglected-by-public-discussion part of the risk landscape where one suspects information hazards lie has a considerable downside, as increased attention could prompt investigation which brings these currently dormant hazards to light.

3: Even if I take the downside risks as weightier than you, one still needs to weigh these against the benefits. I take ‘general (or public) disclosure’ to have little marginal benefit above more limited disclosure targeted to key stakeholders. As the latter approach greatly reduces the downside risks, this is usually the better strategy by the lights of cost/benefit. At least trying targeted disclosure first seems a robustly better strategy than skipping straight to public discussion (cf.).

3.1: In bio (and I think elsewhere) the set of people who are relevant to setting strategy and otherwise contributing to reducing a given risk is usually small and known (e.g. particular academics, parts of the government, civil society, and so on). A particular scientist unwittingly performing research with misuse potential might need to know the risks of their work (likewise some relevant policy and security stakeholders), but the added upside to illustrating these risks in the scientific literature is limited (and the added downsides much greater). The upside of discussing them in the popular/generalist literature (including EA literature not narrowly targeted at those working on biorisk) is limited still further.

3.2: Information also informs decisions around how to weigh causes relative to one another. Yet less-hazardous information (e.g. the basic motivation given here or here, and you could throw in social epistemic steers from the prevailing views of EA ‘cognoscenti’) is sufficient for most decisions and decision-makers. The cases where this nonetheless might be ‘worth it’ (e.g. you are a decision maker allocating a large pool of human or monetary capital between cause areas) are few and so targeted disclosure (similar to 3.1 above) looks better.

3.3: Beyond the direct cost of potentially giving bad actors good ideas, the benefits of more public discussion may not be very high. There are many ways public discussion could be counter-productive (e.g. alarmism, ill-advised remarks poisoning our relationship with scientific groups, etc.). I’d suggest the examples of cryonics, AI safety, GMOs and other lowlights of public communication of policy and science are relevant cautionary examples.

4: I also want to supply other more general considerations which point towards a very high degree of caution:

4.1: In addition to the considerations around the unilateralist’s curse offered by Brian Wang (I have written a bit about this in the context of biotechnology here) there is also an asymmetry in the sense that it is much easier to disclose previously-secret information than make previously-disclosed information secret. The irreversibility of disclosure warrants further caution in cases of uncertainty like this.

4.2: I take the examples of analogous fields to also support great caution. As you note, there is a norm in computer security of ‘don’t publicise a vulnerability until there’s a fix in place’, and of initially informing a responsible party to give them the opportunity to do this pre-publication. Applied to bio, this suggests targeted disclosure to those best placed to mitigate the information hazard, rather than public discussion in the hopes of prompting a fix to be produced. (Not to mention a ‘fix’ in this area might prove much more challenging than pushing a software update.)

4.3: More distantly, adversarial work (e.g. red-teaming exercises) is usually done by professionals, with a concrete decision-relevant objective in mind, with exceptional care paid to operational security, and their results are seldom made publicly available. This is for exercises which generate information hazards for a particular group or organisation - similar or greater caution should apply to exercises that one anticipates could generate information hazardous for everyone.

4.4: Even more distantly, norms of intellectual openness are used more in some areas, and much less in others (compare the research performed in academia to security services). In areas like bio, the fact that a significant proportion of the risk arises from deliberate misuse by malicious actors means security services seem to provide the closer analogy, and ‘public/open discussion’ is seldom found desirable in these contexts.

5: In my work, I try to approach potentially hazardous areas as obliquely as possible, more along the lines of general considerations of the risk landscape or from the perspective of safety-enhancing technologies and countermeasures. I do basically no ‘red-teamy’ types of research (e.g. brainstorm the nastiest things I can think of, figure out the ‘best’ ways of defeating existing protections, etc.)

(Concretely, this would comprise asking questions like, “How are disease surveillance systems forecast to improve over the medium term, and are there any robustly beneficial characteristics for preventing high-consequence events that can be pushed for?” or “Are there relevant limits which give insight to whether surveillance will be a key plank of the ‘next-gen biosecurity’ portfolio?”, and not things like, “What are the most effective approaches to make pathogen X maximally damaging yet minimally detectable?”)

I expect a non-professional doing more red-teamy work would generate less upside (e.g. less well networked to people who may be in a position to mitigate vulnerabilities they discover, more likely to unwittingly duplicate work) and more downside (e.g. less experience with trying to manage info-hazards well) than I. Given I think this work is usually a bad idea for me to do, I think it’s definitely a bad idea for non-professionals to try.

I therefore hope people working independently on this topic approach ‘object level’ work here with similar aversion to more ‘red-teamy’ stuff, or instead focus on improving their capital by gaining credentials/experience/etc. (this has other benefits: a lot of the best levers in biorisk are working with/alongside existing stakeholders rather than striking out on one’s own, and it’s hard to get a role without (e.g.) graduate training in a relevant field). I hope to produce a list of self-contained projects to help direct laudable ‘EA energy’ to the best ends.

WillPearson · Jul 4 2018 · 0

Hi Gregory,

A couple of musings generated by your comment.

> 2: I don’t think there’s a neat distinction between ‘technical dangerous information’ and ‘broader ideas about possible risks’, with the latter being generally safe to publicise and discuss.

I have this idea of independent infrastructure: trying to make infrastructure (electricity/water/food/computing) that is on a smaller scale than current infrastructure. This is for a number of reasons, one of which is mitigating risks. How should I build broad-scale support for my ideas without talking about the risks I am mitigating?

> 4.1: In addition to the considerations around the unilateralist’s curse offered by Brian Wang (I have written a bit about this in the context of biotechnology here) there is also an asymmetry in the sense that it is much easier to disclose previously-secret information than make previously-disclosed information secret. The irreversibility of disclosure warrants further caution in cases of uncertainty like this.

Although in some scenarios non-disclosure is irreversible as well, as conditions change. Consider if someone had the idea of hacking a computer and had managed to convince the designers of C to create a more secure list indexing and also everyone not to use other insecure languages. Now we would not be fighting the network effect of all the bad C code when trying to get people to code computers securely.

This irreversibility of non-disclosure seems to only occur if something is not a huge threat right now, but may become more so as technology develops and gets more widely used and locked in. It is not really relevant to the biotech arena, that I can think of immediately at least, but it is an interesting scenario nonetheless.

Brian Wang · Jun 23 2018 · 13

The relevance of unilateralist's curse dynamics to info hazards is important and worth mentioning here. Even if you independently do a thorough analysis and decide that the info-benefits outweigh the info-hazards of publishing a particular piece of information, that shouldn't be considered sufficient to justify publication. At the very least, you should privately discuss with several others and see if you can reach a consensus.

WillPearson · Jun 23 2018 · 0

The unilateralist's curse only applies if you expect other people to have the same information as you, right?

You can figure out if they have the same information as you by seeing whether they are concerned about the same things you are, that is, by looking at the mitigations people are attempting. Altruists should be attempting mitigations in a unilateralist's curse position, because they should expect someone less cautious than them to unleash the information. Or they want to unleash the information themselves and are mitigating the downsides until they think it is safe.

> At the very least, you should privately discuss with several others and see if you can reach a consensus.

I've not had the best luck reaching out to talk to people about my ideas. I expect that the majority of new ideas will come from people not heavily inside the group and thus less influenced by groupthink. So you might want to think of solutions that take that into consideration.

Brian Wang · Jun 24 2018 · 3

> The unilateralist's curse only applies if you expect other people to have the same information as you, right?

My understanding is that it applies regardless of whether or not you expect others to have the same information. All it requires is a number of actors making independent decisions, with randomly distributed error, with a unilaterally made decision having potentially negative consequences for all.

> You can figure out if they have the same information as you by seeing whether they are concerned about the same things you are, that is, by looking at the mitigations people are attempting. Altruists should be attempting mitigations in a unilateralist's curse position, because they should expect someone less cautious than them to unleash the information. Or they want to unleash the information themselves and are mitigating the downsides until they think it is safe.

I agree that having dangerous information released by those who are in a position to mitigate the risks is better than having a careless actor release that same information –– but I disagree that this is sufficient reason to preemptively release dangerous information. I think a world where everyone follows the logic of "other people are going to release this information anyway but less carefully, so I might as well release it first" is suboptimal compared to a world where everyone follows a norm of reaching consensus before releasing potentially dangerous information. And there are reasons to believe that this latter world isn't a pipe dream; after all, when we're thinking about info hazards, those who have access to the potentially dangerous information generally aren't malicious actors, but rather a finite number of, e.g., biology researchers (for biorisks) who could be receptive to establishing norms of consensus.

I'm also not sure how the strategy of "preemptively release, but mitigate" would work in practice. Does this mean release potentially dangerous information, but with the most dangerous parts redacted? Release with lots of safety caveats inserted? How does this preclude the further release of the unmitigated info?

> I've not had the best luck reaching out to talk to people about my ideas. I expect that the majority of new ideas will come from people not heavily inside the group and thus less influenced by groupthink. So you might want to think of solutions that take that into consideration.

I'm not sure I'm fully understanding you here. If you're saying that the majority of potentially dangerous ideas will originate in those who don't know what the unilateralist's curse is, then I agree –– but I think this is just all the more reason to try to spread norms of consensus.

WillPearson · Jun 24 2018 · 2

> My understanding is that it applies regardless of whether or not you expect others to have the same information. All it requires is a number of actors making independent decisions, with randomly distributed error, with a unilaterally made decision having potentially negative consequences for all.

Information determines the decisions that can be made. For example you can't spread the knowledge of how to create effective nuclear fusion without the information on how to make it.

If there is a single person with the knowledge of how to create safe efficient nuclear fusion they cannot expect other people to release it on their behalf. They may expect it to be net positive but they also expect some downsides and are unsure of whether it will be net good or not. To give a potential downside of nuclear fusion, let us say they are worried about creating excess heat over what the earth can dissipate due to widescale deployment in the world (even if it fixes global warming due to trapping solar energy, it might cause another heat related problem). I forget the technical term for this unfortunately.

The fusion expert(s) cannot expect other people to release this information for them, for as far as they know they are the only people making that exact decision.

> I'm also not sure how the strategy of "preemptively release, but mitigate" would work in practice. Does this mean release potentially dangerous information, but with the most dangerous parts redacted? Release with lots of safety caveats inserted? How does this preclude the further release of the unmitigated info?

What the researcher can do is try to build consensus/lobby for a collective decision-making body on the internal climate heating (ICH) problem, planning to release the information once they are satisfied that there is going to be a solution in time to fix the problem when it occurs.

If they find a greater than expected number of people lobbying for solutions to the ICH problem, then they can expect they are in a unilateralist's curse scenario. And they may want to hold off on releasing information even when they are satisfied with the way things are going (in case there is some other issue they have not thought of).

They can look to see what the other people who have been helping with ICH are doing, and see if there are other initiatives they are starting that may or may not be to do with the advent of nuclear fusion.

I think I am also objecting to the expected payoff being thought of as a fixed quantity. You can either learn more about the world to alter your knowledge of the payoff or try to introduce things/institutions into the world to alter the expected payoff. Building useful institutions may rely on releasing some knowledge; that is where things become more hairy.

> I've not had the best luck reaching out to talk to people about my ideas. I expect that the majority of new ideas will come from people not heavily inside the group and thus less influenced by groupthink. So you might want to think of solutions that take that into consideration.

> I'm not sure I'm fully understanding you here. If you're saying that the majority of potentially dangerous ideas will originate in those who don't know what the unilateralist's curse is, then I agree –– but I think this is just all the more reason to try to spread norms of consensus.

I was suggesting that more norm spreading should be done outwards, keeping it simple and avoiding too much jargon. Is there a presentation of the unilateralist's curse aimed at microbiologists, for example?

Also, as the unilateralist's curse suggests, discussing with other people such that they can undertake the information release sometimes increases the expectation of a bad outcome. How should consensus be reached in those situations?

> Increasing the number of agents capable of undertaking the initiative also exacerbates the problem: as N grows, the likelihood of someone proceeding incorrectly increases monotonically towards 1. The magnitude of this effect can be quite large even for relatively small number of agents. For example, with the same error assumptions as above, if the true value of the initiative V* = -1 (the initiative is undesirable), then the probability of erroneously undertaking the initiative grows rapidly with N, passing 50% for just 4 agents.
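The effect described in the quoted passage can be checked with a short calculation. The passage does not restate its error assumptions, so this Python sketch assumes each agent independently estimates the value as V* plus standard normal noise (mean 0, standard deviation 1) and proceeds whenever their own estimate is positive. The function names and the choice of `sigma = 1.0` are assumptions made for illustration.

```python
import math


def p_single_error(true_value: float, sigma: float = 1.0) -> float:
    """Probability that one agent's noisy estimate of a negative-value initiative is positive.

    Assumes estimate = true_value + Normal(0, sigma^2), so
    P(estimate > 0) = P(Z > -true_value / sigma) for a standard normal Z.
    """
    z = -true_value / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_someone_proceeds(n_agents: int, true_value: float = -1.0, sigma: float = 1.0) -> float:
    """Probability that at least one of n independent agents proceeds unilaterally."""
    p = p_single_error(true_value, sigma)
    return 1.0 - (1.0 - p) ** n_agents


for n in (1, 2, 4, 5, 10):
    print(f"N = {n:2d}: P(someone proceeds) = {p_someone_proceeds(n):.3f}")
```

Under these assumed unit-variance errors a single agent errs about 16% of the time, and the chance that someone proceeds is roughly one half at N = 4 and above one half at N = 5. The exact crossing point depends on the assumed noise level, but the monotonic growth towards 1 does not.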

Brian Wang · Jun 25 2018 · 0

> If there is a single person with the knowledge of how to create safe efficient nuclear fusion they cannot expect other people to release it on their behalf.

Ah right. I suppose the unilateralist's curse is only a problem insofar as there are a number of other actors also capable of releasing the information; if you are a single actor then the curse doesn't really apply. Although one wrinkle might be considering the unilateralist's curse with regards to different actors through time (i.e., erring on the side of caution with the expectation that other actors in the future will gain access to and might release the information), but coordination in this case might be more challenging.

> What the researcher can do is try to build consensus/lobby for a collective decision-making body on the internal climate heating (ICH) problem, planning to release the information once they are satisfied that there is going to be a solution in time to fix the problem when it occurs.

Thanks, this concrete example definitely helps.

> I think I am also objecting to the expected payoff being thought of as a fixed quantity. You can either learn more about the world to alter your knowledge of the payoff or try to introduce things/institutions into the world to alter the expected payoff. Building useful institutions may rely on releasing some knowledge; that is where things become more hairy.

This makes sense. "Release because the expected benefit is above the expected risk" or "don't release because the reverse is true" is a bit of a false dichotomy, and you're right that we should be thinking more about options that could maximize the benefit while minimizing the risk when faced with info hazards.

> Also, as the unilateralist's curse suggests, discussing with other people such that they can undertake the information release sometimes increases the expectation of a bad outcome. How should consensus be reached in those situations?

This can certainly be a problem, and is a reason not to go too public when discussing it. Probably it's best to discuss privately with a number of other trusted individuals first, who also understand the unilateralist's curse, and ideally who don't have the means/authority of releasing the information themselves (e.g., if you have a written up blog post you're thinking of posting that might contain info hazards, then maybe you could discuss in vague terms with other individuals first, without sharing the entire post with them?).

WillPearson · Jun 25 2018 · 2

Ah right. I suppose the unilateralist's curse is only a problem insofar as there are a number of other actors also capable of releasing the information; if you are a single actor then the curse doesn't really apply. Although one wrinkle might be considering the unilateralist's curse with regards to different actors through time (i.e., erring on the side of caution with the expectation that other actors in the future will gain access to and might release the information), but coordination in this case might be more challenging.

Interesting idea. This may be worth trying to develop more fully?

Probably it's best to discuss privately with a number of other trusted individuals first, who also understand the unilateralist's curse,

I'm still coming at this from a lens of "actionable advice for people not in EA". It might be that the person doesn't know many other trusted individuals; what should the advice be then? It would probably also be worth giving advice on how to have the conversation. The original article gives some advice on what happens if consensus can't be reached (voting and suchlike).

As I understand it you shouldn't wait for consensus else you have the unilateralist's curse in reverse. Someone pessimistic about an intervention can block the deployment of an intervention needed to avoid disaster (this seems very possible if you consider crucial considerations flipping signs, rather than just random noise in beliefs in desirability).

Would you suggest discussion and vote (assuming no other courses of action can be agreed upon)? Do you see the need to correct for status quo bias in any way?

This seems very important to get right. I'll think about this some more.

Brian Wang · Jun 27 2018 · 1

Interesting idea. This may be worth trying to develop more fully?

Yeah. I'll have to think about it more.

I'm still coming at this from a lens of "actionable advice for people not in EA". It might be that the person doesn't know many other trusted individuals; what should the advice be then?

Yeah, for people outside EA I think structures could be set up such that reaching consensus (or at least a majority vote) becomes a standard policy or an established norm. E.g., if a journal is considering a manuscript with potential info hazards, then perhaps it should be standard policy for this manuscript to be referred to some sort of special group consisting of journal editors from a number of different journals to deliberate. I don't think people need to be taught the mathematical modeling behind the unilateralist's curse for these kinds of policies to be set up, as I think people have an intuitive notion of "it only takes one person/group with bad judgment to fuck up the world; decisions this important really need to be discussed in a larger group."

One important distinction is that people who are facing info hazards will be in very different situations when they are within EA vs. when they are out of EA. For people within EA, I think it is much more likely to be the case that a random individual has an idea that they'd like to share in a blog post or something, which may have info hazard-y content. In these situations the advice "talk to a few trusted individuals first" seems to be appropriate.

For people outside of EA, I think those who are in possession of info hazard-y content are much more likely to be embedded in some sort of larger institution (e.g., a research scientist or a journal editor looking to publish something), where perhaps the best leverage is setting up certain policies, rather than trying to teach everyone the unilateralist's curse.

As I understand it you shouldn't wait for consensus else you have the unilateralist's curse in reverse. Someone pessimistic about an intervention can block the deployment of an intervention needed to avoid disaster.

You're right, strict consensus is the wrong prescription. A vote is probably better. I wonder if there's mathematical modeling that you could do that would determine what fraction of votes is optimal, in order to minimize the harms of the standard unilateralist's curse and the curse in reverse? Is it a majority vote? A 2/3 vote? I suspect this will depend on what the "true sign" of releasing the potentially dangerous info is likely to be; the more likely it is to be negative, the higher bar you should be expected to clear before releasing.
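One rough way to start on that modeling question is sketched below. Everything in it is an assumption made for illustration rather than something proposed in this thread: the true value of releasing is either +1 or -1, each voter sees it through independent standard normal noise and votes "release" when their estimate is positive, and a wrong release and a wrong block are weighted by the hypothetical `harm` and `benefit` parameters.

```python
import math


def phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_release(n_voters: int, threshold: int, true_value: float, sigma: float = 1.0) -> float:
    """Probability that at least `threshold` of n_voters vote to release.

    Assumption: each voter's estimate is true_value plus independent
    N(0, sigma^2) noise, and a voter says "release" iff the estimate is > 0.
    """
    p_yes = 1.0 - phi(-true_value / sigma)
    return sum(
        math.comb(n_voters, k) * p_yes ** k * (1.0 - p_yes) ** (n_voters - k)
        for k in range(threshold, n_voters + 1)
    )


def best_threshold(n_voters: int, p_harmful: float, harm: float = 1.0, benefit: float = 1.0) -> int:
    """Number of "release" votes required that minimises expected loss.

    p_harmful is the prior probability that the true value is -1
    (otherwise it is assumed to be +1).
    """
    def expected_loss(k: int) -> float:
        wrong_release = p_release(n_voters, k, -1.0)        # standard curse
        wrong_block = 1.0 - p_release(n_voters, k, +1.0)    # curse in reverse
        return p_harmful * harm * wrong_release + (1.0 - p_harmful) * benefit * wrong_block

    return min(range(1, n_voters + 1), key=expected_loss)


if __name__ == "__main__":
    for prior in (0.5, 0.9, 0.99):
        print(prior, best_threshold(9, prior))
```

In this toy setup a simple majority is the best rule when harmful and beneficial cases are equally likely and equally weighted, and the required number of votes rises as the prior probability of harm rises, which matches the intuition above. Correlated judgments among voters, which are likely in practice, are not modeled here.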

WillPearson · Jun 27 2018 · 0

For people outside of EA, I think those who are in possession of info hazard-y content are much more likely to be embedded in some sort of larger institution (e.g., a research scientist or a journal editor looking to publish something), where perhaps the best leverage is setting up certain policies, rather than trying to teach everyone the unilateralist's curse.

There is a growing movement of makers and citizen scientists who are working on new technologies. It might be worth targeting them somewhat (although again probably without the math). I think the approaches for EA/non-EA seem sensible.

You're right, strict consensus is the wrong prescription. A vote is probably better. I wonder if there's mathematical modeling that you could do that would determine what fraction of votes is optimal, in order to minimize the harms of the standard unilateralist's curse and the curse in reverse? Is it a majority vote? A 2/3 vote? I suspect this will depend on what the "true sign" of releasing the potentially dangerous info is likely to be; the more likely it is to be negative, the higher bar you should be expected to clear before releasing.

I also like to weigh the downside of not releasing the information. If you don't release information, you are making everyone make marginally worse decisions (if you think someone will release it anyway later). For example, in the nuclear fusion example, you think that everyone currently building new nuclear fission stations is wasting their time, that people training to manage coal plants should be training for something else, etc.

I also have another consideration which is possibly more controversial. I think we need some bias to action, because it seems like we can't go on as we are for too much longer (another 1000 years might be pushing it). The level of resources and coordination towards global problems fielded by the status quo seems insufficient. So it is a default bad outcome.

With this consideration, going back to the fusion pioneers, they might try and find people to tell so that they could increase the bus factor (the number of people that would have to die to lose the knowledge). They wouldn't want the knowledge to get lost (as it would be needed in the long term) and they would want to make sure that whoever they told understood the import and potential downsides of the technology.

Edit: Knowing the sign of an intervention is hard, even after the fact. Consider the invention and spread of the knowledge about nuclear chain reactions. Without it we would probably be burning a lot more fossil fuels; with it, however, we have the associated existential risk. If that risk never pays out, then it may have been a spur towards greater coordination and peace.

I'll try to formalise these thoughts at some point, but I am a bit work-impaired for a while.

turchin · Jun 28 2018 · -1

One more problem with the idea that I should consult my friends before publishing a text is "friend bias": people who are my friends tend to react more positively to the same text than those who are not. I personally had a situation where my friends told me that my text was good and non-info-hazardous, but when I presented it to people who didn't know me, their reaction was the opposite.

turchin · Jun 26 2018 · -2

Sometimes, when I work on a complex problem, I feel as if I become one of the best specialists in it. Surely, I know three other people who are able to understand my logic, but one of them is dead, another is not replying to my emails, and the third has his own vision, affected by some obvious flaw. So none of them could give me correct advice about the informational hazard.

turchin · Jun 24 2018 · 0

I've not had the best luck reaching out to talk to people about my ideas. I expect that the majority of new ideas will come from people not heavily inside the group and thus less influenced by group think. So you might want to think of solutions that take that into consideration.

Yes, I ran into the same problem. The best way to find people who are interested in and able to understand a specific problem is to publish the idea openly in a place like this forum, but in that situation hypothetical bad people will also be able to read the idea.

Also, the info-hazard discussion applies only to "medium-level safety researchers", as top-level ones have enough authority to decide what counts as an info hazard, and (bio)scientists are not reading our discussions. As a result, the whole fight against info hazards is applied to a small and not very relevant group.

For example, I was advised not to repost a scientific study, as even reposting it would create an informational hazard by attracting attention to its dangerous applications. However, I see the main problem in the fact that such scientific research was done and openly published, and our reluctance to discuss such events only lowers our strategic understanding of the different risks.

Ofer · Jun 23 2018 · 7

In this FLI podcast episode, Andrew Critch suggested handling a potentially dangerous idea like a software update rollout procedure, in which the update is distributed gradually rather than to all customers at once:

... I would tell you the same thing I would tell anyone who discovers a potentially dangerous idea, which is not to write a blog post about it right away.

I would say, find three close, trusted individuals that you think reason well about human extinction risk, and ask them to think about the consequences and who to tell next. Make sure you’re fair-minded about it. Make sure that you don’t underestimate the intelligence of other people and assume that they’ll never make this prediction

...

Then do a rollout procedure. In software engineering, you developed a new feature for your software, but it could crash the whole network. It could wreck a bunch of user experiences, so you just give it to a few users and see what they think, and you slowly roll it out. I think a slow rollout procedure is the same thing you should do with any dangerous idea, any potentially dangerous idea. You might not even know the idea is dangerous. You may have developed something that only seems plausibly likely to be a civilizational scale threat, but if you zoom out and look at the world, and you imagine all the humans coming up with ideas that could be civilizational scale threats.

...

If you just think you’ve got a small chance of causing human extinction, go ahead, be a little bit worried. Tell your friends to be a little bit worried with you for like a day or three. Then expand your circle a little bit. See if they can see problems with the idea, see dangers with the idea, and slowly expand, roll out the idea into an expanding circle of responsible people until such time as it becomes clear that the idea is not dangerous, or you manage to figure out in what way it’s dangerous and what to do about it, because it’s quite hard to figure out something as complicated as how to manage a human extinction risk all by yourself or even by a team of three or maybe even ten people. You have to expand your circle of trust, but, at the same time, you can do it methodically like a software rollout, until you come up with a good plan for managing it. As for what the plan will be, I don’t know. That’s why I need you guys to do your slow rollout and figure it out.

turchin · Jun 23 2018 · 3

That is absolutely right, and I always discuss ideas with friends and advanced specialists before discussing them publicly. But in doing this, I discovered two obstacles:

1) If the idea is really simple, it is likely not new, but in the case of a complex idea not many people are able to evaluate it properly. Maybe if Bostrom spent a few days analysing it, he would say "yes" or "no", but typically the best thinkers are very busy with their own deadlines and will not have time to evaluate the ideas of random people. So you are limited to your closer friends, who could be biased in your favour and ignore the info-hazard.

2) "False negatives". This is the situation in which a person thinks that idea X is not an informational hazard because it is false, but his reasons for thinking that X is false are wrong. In that situation, the info-hazard assessment never happens.

turchin · Jun 25 2018 · 5

It would be great to have some kind of committee for info-hazard assessment: a group of trusted people who a) take responsibility for deciding whether an idea should be published, b) read all incoming suggestions in a timely manner, and c) have publicly known contact details (though perhaps not all of their identities are public).

Jan_Kulveit · Jun 27 2018 · 5

I believe this is something worth exploring. My model is that while most active people thinking about x-risks have some sort of social network links so they can ask others, there may be a long tail of people thinking in isolation, who may at some point just post something dangerous on LessWrong.

(Also there is a problem of incentives, which are often strongly in favor of publishing. You don't get much credit for not publishing dangerous ideas if you are not already part of some established group.)

axioman · Jun 25 2018 · 2

"to prove this argument I would have to present general information which may be regarded as having informational hazard"

Is there any way to assess the credibility of statements like this (or whether this is actually an argument worth considering in a given specific context)? It seems like you could use this as a general purpose argument for almost everything.

turchin · Jun 25 2018 · 1

It was in fact a link to the article about how to kill everybody using multiple simultaneous pandemics. This idea may be regarded by someone as an informational hazard, but it was already suggested by some terrorists from the Voluntary Human Extinction Movement. I also discussed it with some biologists and other x-risk researchers, and we concluded that it is not an infohazard. I can send you a draft.

MichaelPlant · Jun 25 2018 · 0

"to prove this argument I would have to present general information which may be regarded as having informational hazard"

I agree statements of this kind are very annoying, whether or not they're true.