[ Question ]

How might better collective decision-making backfire?

by Denis Drescher · 1 min read · 13th Dec 2020 · 21 comments


Rationality · Forecasting · Institutional decision-making · Frontpage

I’ve started work on a project that aims to do something like “improving our collective decision-making.”[1] Broadly, it’s meant to enable communities of people who want to make good decisions to collaboratively work out what these decisions are. Individual rationality is helpful for that but not the focus.

Concretely, it’s meant to make it easier to collaborate on probabilistic models in areas where we don’t have data. You can read more about the vision in Ozzie Gooen’s Less Wrong post on Squiggle. But please ignore this for the sake of this question. I hope to make the question and answers to it useful to a broader audience by not narrowly focusing it on the concrete thing I’m doing. Other avenues to improve collective decision-making may be improving prediction markets and developing better metrics for things that we care about.

Before I put a lot of work into this, I would like to check whether this is a good goal in the first place – ignoring tractability and opportunity costs.[2] By “good” I mean something like “robustly beneficial,” and by “robustly beneficial” I mean something like “beneficial across many plausible worldviews and futures, and morally cooperative.”

My intuition is that it’s about as robustly positive as it gets, but I feel like I could easily be wrong because I haven’t engaged with the question for a long time. Less Wrong and CFAR seem to have similar goals, though I perceive a stronger focus on individual rationality. So I like to think that people have thought and maybe written publicly about what the major risks are in what they do.

I would like to treat this question like a Stack Exchange question where you can also submit your own answers.[3] But I imagine there are many complementary answers to this question. So I’m hoping that people can add more, upvote the ones they find particularly concerning, important, or otherwise noteworthy, and refine them with comments.

For examples of pretty much precisely the type of answers I’m looking for, see my answers titled “Psychological Effects” and “The Modified Ultimatum Game.”

A less interesting answer is my answer “Legibility.” I’m less interested in it here because it describes a way in which a system can fail to attain the goal of improving decision-making rather than a way in which the successful realization of that goal backfires. I wanted to include it as a mild counterexample.

If you have ideas for further answers, it would be interesting if you could also think of ways to work around them. It's usually not best to abandon a project just because there is some way in which it can backfire; better to work out how the failure mode can be avoided without sacrificing all of the project's positive effects.

You can also message me privately if you don’t want to post your answer publicly.

Acknowledgements: Thanks for feedback and ideas to Sophie Kwass and Ozzie Gooen!

(I’ve followed up on this question on my blog.)


  1. How this should be operationalized is still fairly unclear. Ozzie plans to work out more precisely what it is we seek to accomplish. You might as well call it “collaborative truth-seeking,” “improving collective epistemics,” “collaborating on better predictions,” etc. ↩︎

  2. I’ve considered a wide range of contributions I could make. Given my particular background, this currently seems to me like my top option. ↩︎

  3. My answers contain a few links. These should not be interpreted as endorsements of the linked articles or websites. ↩︎



10 Answers

Psychological Effects

Luke Muehlhauser warns that overconfidence and sunk cost fallacy may be necessary for many people to generate and sustain motivation for a project. (But note that the post is almost nine years old.) Entrepreneurs are said to be overconfident that their startup ideas will succeed. Maybe increased rationality (individual or collective) will stifle innovation.

I feel that. When I do calibration exercises, I’m only sometimes mildly overconfident in some credence intervals, and indeed, my motivation usually feels like, “Well, this is a long shot, and why am I even trying it? Oh yeah, because everything else is even less promising.” That could be better.

On a community level, this may mean that any community that develops sufficiently good calibration becomes demotivated and falls apart.

Maybe there is a way of managing expectations. If you grow up in an environment where you're exposed to heavily selection-biased news about successes, your expectations may be so high that any well-calibrated 90th percentile successes that you project may seem disappointing. But if you're in an environment where you constantly see all the failures around you, the same level of 90th percentile success may seem motivating.

Maybe that’s also a way in which the EA community backfires. When I didn’t know about EA, I saw around me countless people who failed completely to achieve my moral goals because they didn’t care about them. The occasional exceptions seemed easy to emulate or exceed. Now I’m surrounded by people who’ve achieved things much greater than my 90th percentile hopes. So my excitement is lower even though my 90th percentile hopes are higher than they used to be.

Benefitting Unscrupulous People

A system that improves collective decision-making is likely value-neutral, so it can also be used by unscrupulous agents for their nefarious ends.

Moreover, unscrupulous people may benefit from it more because they have fewer moral side-constraints. If set A is the set of all ethical, legal, cooperative methods of attaining a goal, and set B is the set of all methods of attaining the same goal, then A ⊆ B. So it is always at least as easy to attain a goal by any means necessary as by ethical, legal, and cooperative means alone.

Three silver linings:

  1. Unscrupulous people probably also have different goals from ours. Law enforcement will block them from attaining those goals, and better decision-making will hopefully not get them very far.
  2. These systems are collaborative, so the more people collaborate on them, the more everyone benefits (not necessarily monotonically, just as a rough tendency). When you invite more people into some nefarious conspiracy, the risk that one of them blows the whistle increases rapidly. (Though it may depend on the structure of the group. There are maybe some terrorist cells who don't worry much about whistleblowing.)
  3. If a group is headed by a narcissistic leader, the person may see a threat to their authority in a collaborative decision-making system, so that they won’t adopt it to begin with. (Though it might be that they like that collaborative systems can make it infeasible for individuals to use them to put their individual opinions to the test, so that they can silence individual dissenters. This will depend a lot on implementation details of the system.)

More speculatively, we can also promote and teach the system such that everyone who learns to use it also learns about multiverse-wide superrationality alias evidential cooperation in large worlds (ECL). Altruistic people with uncooperative agent-neutral goals will reason that they can now realize great gains from trade by being more cooperative or else lose out on them by continuing to defect.

We can alleviate the risk further by marketing the system mostly to people who run charities, social enterprises, prosocial research institutes, and democratic governments. Other people will still learn about the tools, and there are also a number of malevolent actors in those generally prosocial groups, but it may shift the power a bit toward more benevolent people. (The Benevolence, Intelligence, and Power framework may be helpful in this context.)

Finally, there is the option to make it hard to make models nonpublic. But that would have other downsides, and it’s also unlikely to be a stable equilibrium as others will just run a copy of the software on their private servers.

Convergence to best practice produces homogeneity. 

As it becomes easier to do what is likely the best option given current knowledge, fewer people try new things and so best practices advance more slowly.

For example, most organizations would benefit from applying "the basics" of good management practice. But the frontier of management is advanced by experimentation: people trying unusual ideas that at any given point in time seem unlikely to work.

I still see the project of improving collective decision-making as very positive on net. But if it succeeds, it becomes important to think about new ways of creating space for experimentation.

Modified Ultimatum Game

A very good example of the sort of risks that I’m referring to is based on a modified version of the ultimatum game and comes from the Soares and Fallenstein paper “Toward Idealized Decision Theory”:

Consider a simple two-player game, described by Slepnev (2011), played by a human and an agent which is capable of fully simulating the human and which acts according to the prescriptions of [Updateless Decision Theory (UDT)]. The game works as follows: each player must write down an integer between 0 and 10. If both numbers sum to 10 or less, then each player is paid according to the number that they wrote down. Otherwise, they are paid nothing. For example, if one player writes down 4 and the other 3, then the former gets paid $4 while the latter gets paid $3. But if both players write down 6, then neither player gets paid. Say the human player reasons as follows:

I don’t quite know how UDT works, but I remember hearing that it’s a very powerful predictor. So if I decide to write down 9, then it will predict this, and it will decide to write 1. Therefore, I can write down 9 without fear.

The human writes down 9, and UDT, predicting this, prescribes writing down 1.

This result is uncomfortable, in that the agent with superior predictive power “loses” to the “dumber” agent. In this scenario, it is almost as if the human’s lack of ability to predict UDT (while using correct abstract reasoning about the UDT algorithm) gives the human an “epistemic high ground” or “first mover advantage.” It seems unsatisfactory that increased predictive power can harm an agent.

A solution to this problem would have to come from the area of decision theory. It probably can’t be part of the sort of collaborative decision-making system that we envision here. Maybe there is a way to make such a problem statement inconsistent because the smarter agent would’ve committed to writing down 5 and signaled that sufficiently long in advance of the game. Ozzie also suggests that introducing randomness along the lines of the madman theory may be a solution concept.
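The payoff structure of the game, and the uncomfortable equilibrium described above, can be sketched in a few lines. The `udt_best_response` helper is a drastic simplification of UDT, assumed here only to illustrate the predict-and-concede dynamic, not a faithful implementation:

```python
def payoffs(a: int, b: int) -> tuple[int, int]:
    """Modified ultimatum game: each player names an integer 0..10;
    both are paid their own number iff the sum is at most 10."""
    assert 0 <= a <= 10 and 0 <= b <= 10
    if a + b <= 10:
        return a, b
    return 0, 0

def udt_best_response(human_demand: int) -> int:
    """Toy stand-in for the UDT agent: having perfectly predicted the
    human's demand, it names the largest number that keeps the sum <= 10."""
    return 10 - human_demand

human = 9                      # the human boldly demands 9
agent = udt_best_response(human)
print(payoffs(human, agent))   # (9, 1): the better predictor gets less
print(payoffs(6, 6))           # (0, 0): mutual overreach pays nothing
```

This makes the discomfort concrete: the agent's superior predictive power is exactly what lets the human safely demand 9.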

Damaging collective decision-making

While this is not a risk of "better collective decision-making" itself, I think there might be some inherent risks in projects that attempt to improve collective decision-making on a global scale.

  1. The tool itself might be flawed in subtle ways and still be used widely. Say, the tool might work very well for situations that are quantitative, high-probability, economically framed, or involve a homogeneous group of users, but fail in the converse situations.
  2. If the tool is recognizably flawed, that can bias people toward making decisions only in areas where the tool works well. For example:
    1. Some criticisms of EA, and GiveWell in particular, have argued that the focus on quantitatively measurable and verifiable outcomes leads to recommending interventions that are less important than more fuzzy alternatives.
    2. Some criticisms of capitalism argue that it leads to systemic flaws because of the strong measurability inherent in financial trade.
  3. There might be better projects that could not happen if this project succeeds. 
  4. If the tool is very successful, it might lead to some form of "epistemic lock-in" where for many years there seems to be only one major way of making decisions.
  5. Perhaps our current understanding of group epistemics is misleading, and what would seem to be an improvement would actually damage decision making. 

Assorted Risks

I had a chat with Justin Shovelain yesterday where we discussed a few more ways in which improved collaborative truth-seeking can backfire. These are, for me, rather early directions for further thought, so I'll combine them into one answer for now.

They fall into two categories: negative effects of better decision-making and negative effects of collaborative methods.

Negative effects of better decision-making:

  1. Valuable ambiguity. It might be that ambiguity plays an important role in social interactions. There is the stylized example where it is necessary to keep the number of rounds of an iterated game a secret or else that knowledge will distort the game. I’ve also read somewhere that there’s the theory that conflicts between two countries can be exacerbated if the countries have too low-quality intelligence about each other but also if they have too high-quality intelligence about each other. But I can’t find the source, so I’m likely to misremember something. Charity evaluators also benefit from ambiguity in that fewer charities would be willing to undergo their evaluation process if the only reason why a charity would either decline it or block the results from being published were reasons that reflect badly on the charity. But there are also good and neutral reasons, so charities will always have plausible deniability.

Negative effects of collaborative methods:

  1. Centralized control. My earlier answer titled “Legibility” argued that collaborative methods will make it necessary to make considerations and values more legible than they are now so they can be communicated and quantified. This may also make them more transparent and thus susceptible to surveillance. That, in turn, may enable more powerful authoritarian governments, which may steer the world into a dystopian lock-in state.
  2. Averaging effect. Maybe there are people who are particularly inclined toward outré opinions. These people will be either unusually right or unusually wrong for their time. Maybe there’s more benefit in being unusually right than there is harm in being unusually wrong (e.g., thanks to the law). And maybe innovation toward most of what we care about is carried by unusually right people. (I’m thinking of Newton here, whose bad ideas didn’t seem to have much of an effect compared to his good ideas.) Collaborative systems likely harness – explicitly or implicitly – some sort of wisdom of the crowds type of effect. But such an effect is likely to average away the unusually wrong and the unusually right opinions. So such systems might slow progress.
  3. More power to the group. It might be that the behavior of groups (e.g., companies) is generally worse (e.g., more often antisocial) than that of individuals. Collaborative systems would shift more power from individuals to groups. So that may be undesirable.
  4. Legible values. It may be very hard to model the full complexity of people's moral intuitions. In practice, people tend toward systems that greatly reduce the dimensionality of what people typically care about. The results are utils, DALYs, SWB, consumption, life years, probability of any existential catastrophe, etc. Collaborative systems would incentivize such low-dimensional measures of value, and through training, people may actually come to care about them more. It's contentious and a bit circular to ask whether this is good or bad. But at least it's not clearly neutral or good.
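The averaging effect (point 2 above) can be illustrated with a toy simulation; all the numbers are invented for illustration. The crowd's mean washes out the unusually wrong opinion, but it washes out the unusually right one just as thoroughly:

```python
import statistics

truth = 100.0

# A hypothetical crowd: most estimates cluster around a conventional
# consensus of 50; one contrarian is unusually right, one unusually wrong.
mainstream = [48.0, 50.0, 52.0, 49.0, 51.0]
unusually_right = 98.0
unusually_wrong = 2.0
crowd = mainstream + [unusually_right, unusually_wrong]

crowd_mean = statistics.mean(crowd)
print(crowd_mean)                    # 50.0: the two outliers cancel out
print(abs(crowd_mean - truth))       # 50.0 error for the averaged view
print(abs(unusually_right - truth))  # 2.0 error for the right contrarian
```

The aggregate is robust against the unusually wrong contrarian, which is the point of wisdom-of-the-crowds aggregation, but that same robustness discards the unusually right one.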

Malevolent Empowerment

Better collective decision-making could lead a group to cause more harm than good to the world by entering a valley of bad decision-making. This presumes that humans tend to have a lot of bad effects on the world and that naively empowering humans can make those effects worse.

E.g., a group with better epistemics and decisions could decide to take more effective action against a hated outgroup. Or it could lead to faster economic and technological growth, leading to more meat eating or more CO2 production.

Humans tend to engage in the worst acts of violence when mobilized as a group that thinks it's doing something good. Therefore, helping a single group improve its epistemics and decision-making could make that group commit greater atrocities, or amplify negative unintended side effects.

Thank you! This seems importantly distinct from the "Benefitting Unscrupulous People" failure mode in that you put the emphasis not on intentional exploitation but on cooperation failures even among well-intentioned groups.

I’ll reuse this comment to bring up something related. The paper “Open Problems in Cooperative AI” has a section The Potential Downsides of Cooperative AI.

The paper focuses on the cooperation aspect of the collaborative/cooperative truth-seeking, so the section on potential downsides focuses on downsides of cooperation and downsides of p... (read more)

Social Effects

Beliefs are often entangled with social signals. This can pose difficulties for what I’ll call in the following a “truth-seeking community.”

When people want to disassociate from a disreputable group – say, because they’ve really never had anything to do with the group and don’t want that to change – they can do this in two ways: They can steer clear of anything that is associated with the disreputable group or they can actively signal their difference from the disreputable group.

Things that are associated with the disreputable group are, pretty much necessarily, things that are either sufficiently specific that they rarely come up randomly or things that are common but on which the group has an unusual, distinctive stance. Otherwise these things could not serve as distinguishing markers of the group.

If the disreputable group is small, is distinguished by an unusual focus on a specific topic, and a person wants to disassociate from them, it’s usually enough to steer clear of the specific topic, and no one will assume any association. Others will start out with a prior that the person is < 1% likely to be part of the group, and absent signals to the contrary, will maintain that credence.

But if the disreputable group is larger, at least in one’s social vicinity, or the group’s focal topic is a common one, then one needs to countersignal more actively. Others may start out with a prior that the person is ~ 30% likely to be part of the group and may avoid contact with them unless they see strong signals to the contrary. This is where people will find it necessary to countersignal strongly. Moreover, once there is a norm to countersignal strongly, the absence of such a signal or a cheaper signal will be doubly noticeable.

I see two, sometimes coinciding, ways along which that can become a problem. First, the disreputable group may be so because of their values, which may be extreme or uncooperative, and it is just historical contingency that they endorse some distinctive belief. Or second, the group may be disreputable because they have a distinctive belief that is so unusual as to reflect badly on their intelligence or sanity.

The first of these is particularly problematic because the belief can be any random one with any random level of likelihood, quite divorced from the extreme, uncooperative values. It might also not be so divorced, e.g., if it is one that the group can exploit to their advantage if they convince the right people of it. But the second is problematic too.

If a community of people who want to optimize their collective decision-making (let’s call it a “truth-seeking community”) builds sufficiently complex models, e.g., to determine the likelihood of intelligent life re-evolving, then maybe at some point they’ll find that one node in their model (a Squiggle program, a Bayesian network, vel sim.) would be informed by more in-depth research of a question that is usually associated with a disreputable group. They can use sensitivity analysis to estimate the cost it would have to leave the node as it is, but maybe it turns out that their estimate is quite sensitive to that node.

In the first case, in the case of a group that is disreputable by dint of their values, that is clearly a bad catch-22.

But it can also go wrong in the second case, the case of the group that is disreputable because of their unusual beliefs, because people in the truth-seeking community will usually find it impossible to assign a probability of 0 to any statement. It might be that their model is very sensitive to whether they assign 0.1% or 1% likelihood to a disreputable belief. Then there’s a social cost also in the second case: Even though their credence is low either way, the truth-seeking community will risk being associated with a disreputable group (which may assign > 90% credence to the belief), because they engage with the belief.
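As a toy sketch of the sensitivity analysis mentioned above (the model shape and all payoffs here are hypothetical): when a controversial node feeds into an estimate with a large conditional payoff, moving its credence from 0.1% to 1% can swing the bottom line by several times:

```python
def expected_value(p_belief: float,
                   value_if_true: float = 1e6,
                   value_if_false: float = 1e3) -> float:
    """Expected value of some intervention as a function of the credence
    assigned to a single controversial node. Payoffs are made up."""
    return p_belief * value_if_true + (1 - p_belief) * value_if_false

low = expected_value(0.001)   # credence 0.1% in the disreputable belief
high = expected_value(0.01)   # credence 1%
print(low, high)
# A 10x change in a sub-1% credence changes the estimate roughly 5.5x,
# so the model is highly sensitive to this node despite the low credence.
print(high / low)
```

In a case like this, the community can't simply round the node to zero without badly distorting its estimate, which is exactly the catch-22 described above.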

I see five ways in which this is problematic:

  1. Exploitation of the community by bad actors: The truth-seeking community may be socially adroit, and people will actually grant them some sort of fool’s licence because they trust their intentions. But that may turn out to be exploitable: People with bad intentions may use the guise of being truth-seeking to garner attention and support while subtly manipulating their congregation toward their uncooperative values. (Others may only be interested in the attention.) Hence such a selective fool’s licence may erode societal defenses against extreme, uncooperative values and the polarization and fragmentation of society that they entail. Meanwhile the previously truth-seeking community may be overtaken by such people, who’ll be particularly drawn to its influential positions while being unintimidated by the responsibility that comes with these positions.
  2. Exploitation of the results of the research by bad actors: The same can be exploitable in that the truth-seeking community may find that some value-neutral belief is likely to be true. Regardless of how value-neutral the belief is, the disreputable group may well be able to cunningly reframe it to exploit and weaponize it for their purposes.
  3. Isolation of and attacks on the community: Conversely, the truth-seeking community may also not be sufficiently socially adroit and still conduct their research. Other powerful actors – potential cooperation partners – will consider the above two risks or will not trust the intentions of the truth-seeking community in the first place, and so will withhold their support from the community or even attack it. This may also make it hard to attract new contributors to the community.
  4. Internal fragmentation through different opinions: The question whether the sensitivity of the model to the controversial belief is high enough to warrant any attention may be a narrow one, one that is not stated and analyzed very explicitly, or one that is analyzed explicitly but through models that make contradictory predictions. In such a case it seems very likely that people will arrive at very different predictions as to whether it’s worse to ignore the belief or to risk the previous failure modes. This can lead to fragmentation, which often leads to the demise of a community.
  5. Internal fragmentation through lack of trust: The same internal fragmentation can also be the result of decreasing trust within the community because the community is being exploited or may be exploited by bad actors along the lines of failure mode 1.
  6. Collapse of the community due to stalled recruiting: This applies when the controversial belief is treated as a serious infohazard. It’s very hard to recruit people for research without being able to tell them what research you would like them to do. This can make recruiting very or even prohibitively expensive. Meanwhile there is usually some outflow of people from any community, so if the recruitment is too slow or fully stalled, the community may eventually vanish. This would be a huge waste especially if the bulk of the research is perfectly uncontroversial.

I have only very tentative ideas of how these risks can be alleviated:

  1. The community will need to conduct an appraisal, as comprehensive and unbiased as possible, of all the expected costs/harms that come with engaging with controversial beliefs.
  2. It will need to conduct an appraisal of the sensitivity of its models to the controversial beliefs and what costs/harms can be averted, say, through more precise prioritization, if the truth value of the beliefs is better known.
  3. Usually, I think, any specific controversial belief will likely be close to irrelevant for a model so that it can be safely ignored. But when this is not the case, further safeguards can be installed:
  4. Engagement with the belief can be treated as an infohazard, so those who research it don’t do so publicly, and new people are onboarded to the research only after they’ve won the trust of the existing researchers.
  5. External communication may take the structure of a hierarchy of tests, at least in particularly hazardous cases. The researchers need to gauge the trustworthiness of a new recruit with questions that, if they backfire, afford plausible deniability and can’t do much harm. Then they only gradually increase the concreteness of the questions if they learn that the recruit is well-intentioned and sufficiently open-minded. But this can be uncooperative if some codes become known, and then people who don’t know them use them inadvertently.
  6. If the risks are mild, there may be some external communication. In it, frequent explicit acknowledgements of the risks and reassurances of the intentions of the researchers can be used to cushion the message. But these signals are cheap, so they don’t help if the risks are grave or others are already exploiting these cheap signals.
  7. Internal communication needs to frequently reinforce the intentions of the participants, especially if there are some among them who haven’t known the others for a long time, to dispel worries that some of them may practice other than prosocial, truth-seeking intentions.
  8. Agreed-upon procedures such as voting may avert some risk of internal fragmentation.

An example that comes to mind: a friend of mine once complained about the lack of internal organization of certain apolitical (or maybe left-wing) groups and contrasted it with a political party that was very well organized internally. It was a right-wing party that was, in our circles, highly disreputable. His statement was purely about the quality of the party’s internal organization, but I only knew that because I knew him. Strangers at that meetup might’ve increased their credence that he agrees with the policies of that party. Cushioning such a mildly hazardous statement would’ve gone a long way toward reducing that risk and keeping the discussion focused on value-neutral organizational practices.

Another disreputable opinion is that of Dean Radin, who seems to be fairly confident that there is extrasensory perception, in particular (I think) presentiment on the timescale of 3–5 s. He is part of a community that, from my cursory engagement with it, seems to not only assign a nonzero probability to these effects and study them for expected value reasons but seems to actually be substantially certain. This entails an air of disreputability either because of the belief by itself or the particular confidence in it. If someone were to create a model to predict how likely it is that we’re in a simulation, specifically in a stored world history, they may wonder whether cross-temporal fuzziness like this presentiment may be a sign of motion compensation, a technique used in video compression, which may also serve to lossily compress world histories. This sounds wild because we’re dealing with unlikely possibilities, but the simulation hypothesis, if true, may have vast effects on the distribution of impacts from interventions in the long term. These effects may plausibly even magnify small probabilities to a point where they become relevant. Most likely, though, the reported effects stem from whatever diverse causes are behind the experimenter effect.

I imagine that history can also be a guide here as these problems are not new. I don’t know much about religion or history, so I may be mangling the facts, but Wikipedia tells me that the First Council of Nicaea in 325 CE addressed the question of whether God created Jesus from nothing (Arianism) or whether Jesus was “begotten of God,” so that there was no time when there was no Jesus because he was part of God. It culminated as follows:

The Emperor carried out his earlier statement: everybody who refused to endorse the Creed would be exiled. Arius, Theonas, and Secundus refused to adhere to the creed, and were thus exiled to Illyria, in addition to being excommunicated. The works of Arius were ordered to be confiscated and consigned to the flames, while his supporters were considered as "enemies of Christianity." Nevertheless, the controversy continued in various parts of the empire.

This also seems like a time when, at least in most parts of the empire, a truth-seeking Bible scholar would’ve been well advised to consider whether the question had sufficiently vast implications as to be worth the reputational damage and threat of exile that came with engaging with it open-mindedly. But maybe there were monasteries where everyone shared a sufficiently strong bond of trust in one another’s intentions that some people had the leeway to engage with such questions.

Value lock-in

If society had better tools for coordination, it could coordinate to quash perceived value drift, such as moral circle expansion.

This is a general problem related to capabilities research, where "better" is "more capable," not necessarily "more beneficial."

Interesting. That’s a risk when pushing for greater coordination (as you said). If you keep the ability to coordinate the same and build better tools for collective decision-making, would that backfire in such a way?

I imagine collaborative tools would have to make values legible to some extent if they are to be used to analyze anything not-value-neutral. That may push toward legible values, so more like utilitarianism and less like virtue ethics or the mixed bag of moral intuitions that we usually have? But that’s perhaps a separate effect.

But I’m also very interested in improving coordination, so this risk is good to bear in mind.

NunoSempere: I think that when you say "better tools for collective decision-making", I'm thinking of capabilities (predictive accuracy, coordination ability, precommitment mechanisms), but you perhaps seem to be thinking of tools to generate progress related to better values. I'd be interested in seeing some examples of the latter which are not of the former.
Denis Drescher: No, I think that unfortunately, the tools I envision are pretty value-neutral. I'm thinking of Squiggle, of Ozzie's ideas for improving prediction markets, and of such things as using better metrics – e.g., SWB instead of QALYs, or expected value of the future instead of probability of extinction.

Hmm, in my case: yes, noish, no. I think I'm really only thinking of making the decisions better, so more predictive accuracy, better Brier scores, or something like that. In the end I'm of course highly agnostic about how this will be achieved. So this only reflects how I envision this project might turn out to be characterized. Ozzie wants to work that out in more detail, so I'll leave that to him. :-)

Especially coordination ability may turn out to be affected. More de facto than de jure, I imagine, but when people wanted to collaborate on open source software, their goal was (presumably) to create better software faster and not to improve humanity's coordination ability. But to do that, they developed version control systems and bug tracking systems, so in the end, they did improve coordination ability. So improving coordination ability is a likely externality of this sort of project, you could say. For precommitment mechanisms, I can't think of a way this might be affected either on purpose or accidentally.

Maybe it'll be helpful to collect a lot of attributes like these and discuss whether we think we'll need to directly, intentionally affect them, or whether we think we might accidentally affect them, or whether we don't think they'll be affected at all. I could easily be overlooking many ways in which they are interconnected.
NunoSempere: Interesting; this is potentially a problem.
Denis Drescher: Indeed. :-/ Or would you disagree with my impression that, for example, Squiggle or work on prediction markets is value neutral?
NunoSempere: Mmmh, I sort of want to answer that, say, FTX isn't value neutral given that the founders are EAs and presumably want to donate their funds to effective causes? Or, like, OpenAI clearly isn't value neutral given that they're vetting which applications can use GPT-3, and it might be difficult to come up with an "OpenAI, but evil/value-neutral" organization. Whereas, say, the CIA's [simple sabotage manual](https://www.cia.gov/news-information/featured-story-archive/2012-featured-story-archive/CleanedUOSSSimpleSabotage_sm.pdf) or Machiavelli's The Prince clearly are value neutral.

The key difference seems to be that in one case the tools are embedded in an altruistic context, and in the other case, they aren't. So for example, maybe one could create collective decision-making tools in such a way that they start out embedded, and remain embedded, in the EA community (but then the scale is reduced). That's pretty abstract, so some concrete examples might be:

* Using lots of jargon
* Using lots of references to previous EA materials
* Establishing a lineage, so that powerful tools are only transmitted from master to student
* Developing tools that require lots of implicit know-how, rather than steps that can be written down
* Strong norms for privacy
Denis Drescher: Okay, I think I understand what you mean. What I meant by "X is value neutral" is something like "The platform FTX is value neutral even if the company FTX is not." That's probably not 100% true, but it's a pretty good example, especially since I'm quite enamoured of FTX at the moment. OpenAI is all murky and fuzzy and opaque to me, so I don't know what to think about that.

I think your suggestions go in similar directions as some of mine in various answers, e.g., marketing the product mostly to altruistic actors. Intentional use of jargon is also something I've considered, but it comes at heavy costs, so it's not my first choice. References to previous EA materials can work, but I find it hard to think of ways to apply that to Squiggle – though certainly some demo models can be EA-related to make it differentially easier and more exciting for EA-like people to learn how to use it. Lineage, implicit knowledge, and privacy: high costs again. Making a collaborative system secret would have it miss out on many of the benefits, and enforced openness may also help against bad stuff. But the lineage one is a fun idea I hadn't thought of! :-D

My [conclusion](https://impartial-priorities.org/epistemic-hazards.html#conclusion) mostly hinges on whether runaway growth is unlikely or extremely unlikely. I'm assuming that it is extremely unlikely, so that we'll always have time to react when things happen that we don't want. So the first thing I'm thinking about now is how to notice when things happen that we don't want – say, through monitoring the referrers of website views, Google alerts, bounties, or somehow creating value in the form of a community so that everyone who uses the software has a strong incentive to engage with that community. All in all, the measures I can think of are weak, but if the threat is also fairly unlikely, maybe those weak measures are proportional.
Ozzie Gooen: Quickly chiming in; I'd agree that this work is relatively value neutral, except for two main points:

1. It seems like those with good values are often rather prone to use better tools, and we could push things more into the hands of good actors than bad ones. Effective Altruists have been quick to adopt many of the best practices (Bayesian reasoning, superforecasting, probabilistic estimation), but most other groups haven't.
2. A lot of "values" seem instrumental to me. I think this kind of work could help change the instrumental values of many actors, if it were influential. My current impression is that there would be some level of value convergence that would come with intelligence, though it's not clear how much of this would happen.

That said, it's of course possible that better decision-making could be used for bad ends. Hopefully our improving decision-making abilities along this trajectory can help inform us as to how best to proceed :)
Denis Drescher:

1. Huh, yeah. I wonder whether this isn't more of an "inadequate equilibria" type of thing where we use all the right tools that our goals incentivize us to use – and so do all the other groups, except their incentives are weird and different. Then there could easily be groups with uncooperative values but incentives that lead them to use the same tools. A counterargument could be that a lot of these tools require some expertise, and people who have that expertise are probably not usually desperate enough to have to take some evil job, so most of these people will choose a good/neutral job over an evil job even if the salary is a bit lower. But I suppose some socially skilled narcissist can just exploit any random modern surrogate religion to recruit good people for something evil by appealing to their morality in twisted ways. So I think it's a pretty neat mechanism but also one that fails frequently.
2. Yeah, one of many, many benefits! :-) I don't think the effect is going to be huge (so that we could rely on it) or tiny. But I'm also hoping that someone will use my system to help me clarify my values. ^^

Deferring to future versions of us: yep!

Legibility

This is a less interesting failure mode, as it is one where the systems we create to improve our decision-making simply fail to achieve that goal – not one where successfully achieving that goal backfires.

I also think that while this may be a limitation of some collaborative modeling efforts, it's probably not a problem for prediction markets.

The idea is that collaborative systems will always, at some stage, require communication, and specifically communication between brains rather than within brains. To make ideas communicable, they have to be made legible. (Or maybe literature, music, and art are counterexamples.) By legible, I’m referring to the concept from Seeing Like A State.

In my experience, this can be very limiting. Take for example what I’ll call the Cialdini puzzle:

> Robert Cialdini’s Wikipedia page says “He is best known for his book Influence.” Since its publication, he seems to have spent his time directing an institute to spread awareness of techniques for success and persuasion. At the risk of being a little too cynical – a guy knows the secrets of success, so he uses them to… write a book about the secrets of success? If I knew the secrets of success, you could bet I’d be doing much more interesting things with them. All the best people recommend Cialdini, and his research credentials are impeccable, but I can’t help wondering: If he’s so smart, why isn’t he Emperor?

It seems to me like a common pattern that for certain activities the ability to do them well is uncorrelated or even anticorrelated with the ability to explain them. Some of that may be just because people want to keep their secrets, but I don’t think that explains much of it.

Hence Robert Cialdini may be > 99th percentile at understanding and explaining social influence, but in terms of doing social influence, that might’ve boosted him from the 40th to the 50th percentile or so. (He says his interest in the topic stems from his being particularly gullible.) Meanwhile, all the people he interviews because they have a knack for social influence are probably 40th to 50th percentile at explaining what they do. I don’t mean that they are average at explaining in general but that what they do is too complex, nuanced, unconscious, intertwined with self-deception, etc. for them to grasp it in a fashion that would allow for anything other than execution.

Likewise, a lot of amazing, famous writers have written books on how to write. And almost invariably these books are… unhelpful. If these writers followed the advice they set down in their own books, they’d be lousy writers. (This is based on a number of Language Log posts on such books.) Meanwhile, some of the most helpful books on writing that I’ve read were written by relatively unknown writers. (E.g., Style: Toward Clarity and Grace.)

My learning of Othello followed a similar trajectory. I got from a Kurnik rating of 1200 up to 1600 quite quickly by reading every explanatory book and text on game strategy that I could find and memorizing hundreds of openings. Beyond that, the skill necessary to progress further becomes so complex, nuanced, and unconscious that, it seems to me, it can only be attained through long practice, not taught. (Except, of course, if the teaching is all about practice.) And I didn’t like practice because it often meant playing against other people. (That is just my experience. If someone is an Othello savant, they may rather feel like some basic visualization practice unlocked the game for them, so that they’d still have increasing marginal utility from training around the area where it started dropping for me.)

Orthography is maybe the most legible illegible skill that I can think of. It can be taught in books, but few people read dictionaries in full. For me it sort of just happened rather suddenly that from one year to the next, I made vastly fewer orthographic mistakes (in German). It seems that my practice through reading must’ve reached some critical (soft) threshold where all the bigrams, trigrams, and exceptions of the language became sufficiently natural and intuitive that my error rate dropped noticeably.

For this to become a problem, there’d have to be highly skilled practitioners – like the sort of people Cialdini likes to interview – who are brought together by a team of researchers to help them construct a model of some long-term future trajectory.

These skilled practitioners will do exactly the strategically optimal thing when put in a concrete situation, but in the abstract environment of such a probabilistic model, their predictions may be no better than anyone else’s. It’ll take well-honed elicitation methods to get high-quality judgments out of these people, and even then a lot of nuance may be lost, because what is elicited and how it fits into the model is probably again something that the researchers will determine, and that may be too low-fidelity.

Prediction markets, on the other hand, tend to be about concrete events in the near future, so skilled practitioners can probably visualize the circumstances that would lead to any outcome in sufficient detail to contribute a high-quality judgment.