bmg

Ben Garfinkel - Researcher at Future of Humanity Institute

bmg's Comments

What posts do you want someone to write?

I think that chapter in the Precipice is really good, but it's not exactly the sort of thing I have in mind.

Although Toby's less optimistic than I am, he's still only arguing for a 10% probability of existentially bad outcomes from misalignment.* The argument in the chapter is also, by necessity, relatively cursory. It's aiming to introduce the field of artificial intelligence and the concept of AGI to readers who might be unfamiliar with them, explain what misalignment risk is, make the idea vivid to readers, clarify misconceptions, describe the state of expert opinion, and add in various other nuances, all within the span of about fifteen pages. I think that it succeeds very well in what it's aiming to do, but I would say that it's aiming for something fairly different.

*Technically, if I remember correctly, it's a 10% probability within the next century. So the implied overall probability is at least somewhat higher.

What posts do you want someone to write?
Answer by bmg, Mar 26, 2020

I'd be really interested in reading an updated post that makes the case for there being an especially high (e.g. >10%) probability that AI alignment problems will lead to existentially bad outcomes.

There still isn't a lot of writing explaining the case for existential misalignment risk. And a significant fraction of what's been produced since Superintelligence is either: (a) roughly summarizing arguments in Superintelligence, (b) pretty cursory, or (c) written by people who are relative optimists and are in large part trying to explain their relative optimism.

Since I have the (possibly mistaken) impression that a decent number of people in the EA community are quite pessimistic regarding existential misalignment risk, on the basis of reasoning that goes significantly beyond what's in Superintelligence, I'd really like to understand this position a lot better and be in a position to evaluate the arguments for it.

(My ideal version of this post would probably assume some degree of familiarity with contemporary machine learning, and contemporary safety/robustness issues, but no previous familiarity with arguments that AI poses an existential risk.)

Request for Feedback: Draft of a COI policy for the Long Term Future Fund

More broadly, I just feel really uncomfortable with having to write all of our documents to make sense on a purely associative level. I as a donor would be really excited to see a COI policy as concrete as the one above, similarly to how all the concrete mistake pages on all the EA org websites make me really excited. I feel like making the policy less concrete trades off getting something right and as such being quite exciting to people like me, in favor of being more broadly palatable to some large group of people, and maybe making a bit fewer enemies. But that feels like it's usually going to be the wrong strategy for a fund like ours, where I am most excited about having a small group of really dedicated donors who are really excited about what we are doing, much more than being very broadly palatable to a large audience, without anyone being particularly excited about it.

It seems to me like there's probably an asymmetry here. I would be pretty surprised if the inclusion of specific references to drug use and metamours was the final factor that tipped anyone into a decision to donate to the fund. I wouldn't be too surprised, though, if the inclusion tipped at least some small handful of potential donors into bouncing. At least, if I were encountering the fund for the first time, I can imagine these inclusions being one minor input into any feeling of wariness I might have.

(The obvious qualifier here, though, is that you presumably know the current and target demographics of the fund better than I do. I expect different groups of people will tend to react very differently.)

I feel like the thing that is happening here makes me pretty uncomfortable, and I really don't want to further incentivize this kind of assessment of stuff.

Apologies if I'm misreading, but it feels to me like the suggestion here might be that intentionally using a more "high-level" COI policy is akin to trying to 'mislead' potential donors by withholding information. If that's the suggestion, then I think I at least mostly disagree. I think that having a COI policy that describes conflicts in less concrete terms is mostly about demonstrating an expected form of professionalism.

As an obviously extreme analogy, suppose that someone applying for a job decides to include information about their sexual history on their CV. There's some sense in which this person is being more "honest" than someone who doesn't include that information. But any employer who receives this CV will presumably have a negative reaction. This reaction also won't be irrational, since it suggests the applicant is either unaware of norms around this sort of thing or (admittedly a bit circularly) making a bad decision to willfully transgress them. In either case, it's reasonable for the employer to be a lot more wary of the applicant than they otherwise would be.

I think the dynamic is roughly the same as the dynamic that leads people to (rationally) prefer to hire lawyers who wear suits over those who don't, to trust think tanks that format and copy-edit their papers properly over those who don't, and so on.

This case is admittedly more complicated than the case of lawyers and suits, since you are in fact depriving potential donors of some amount of information. (At worst, suits just hide information about lawyers' preferred style of dress.) So there's an actual trade-off to be balanced. But I'm inclined to agree with Howie that the extra clarity you get from moving beyond 'high-level' categories probably isn't all that decision-relevant.

I'm not totally sure, though. In part, it's sort of an empirical question whether a merely high-level COI policy would give any donors an (in their view) importantly inaccurate or incomplete impression of how COIs are managed. If enough potential donors do seem to feel this way, then it's presumably worth being more detailed.

Concerning the Recent 2019-Novel Coronavirus Outbreak

No, I think that would be far worse.

But if two people were (for example) betting on a prediction platform that's been set up by public health officials to inform prioritization decisions, then this would make the bet better. The reason is that, in this context, it would obviously matter if their expressed credences are well-calibrated and honestly meant. To the extent that the act of making the bet helps temporarily put some observers "on their toes" when publicly expressing credences, the most likely people to be put "on their toes" (other users of the platform) are also people whose expressed credences have an impact. So there would be an especially solid pro-social case for making the bet.

I suppose this bullet point is mostly just trying to get at the idea that a bet is better if it can clearly be helpful. (I should have said "positively influence" instead of just "influence.") If a bet creates actionable incentives to kill people, on the other hand, that's not a good thing.

Concerning the Recent 2019-Novel Coronavirus Outbreak

Maybe you are describing a distinction that is more complicated than I am currently comprehending, but I at least would expect Chi and Greg to object to bets of the type "what is the expected number of people dying in self-driving car accidents over the next decade?", "Will there be an accident involving an AGI project that would classify as a 'near-miss', killing at least 10000 people or causing at least 10 billion dollars in economic damages within the next 50 years?" and "what is the likelihood of this new bednet distribution method outperforming existing methods by more than 30%, saving 30000 additional people over the next year?".

Just as an additional note, to speak directly to the examples you gave: I would personally feel very little discomfort if two people (esp. people actively making or influencing decisions about donations and funding) wanted to publicly bet on the question: "What is the likelihood of this new bednet distribution method outperforming existing methods by more than 30%, saving 30000 additional people over the next year?" I obviously don't know, but I would guess that Chi and Greg would both feel more comfortable about that question as well. I think that some random "passerby" might still feel some amount of discomfort, but probably substantially less.

I realize that there probably aren't very principled reasons to view one bet here as intrinsically more objectionable than others. I listed some factors that seem to contribute to my judgments in my other comment, but they're obviously a bit of a hodgepodge. My fully reflective moral view is also that there probably isn't anything intrinsically wrong with any category of bets. For better or worse, though, I think that certain bets will predictably be discomforting and wrong-feeling to many people (including me). Then I think this discomfort is worth weighing against the plausible social benefits of the individual bet being made. At least on rare occasions, the trade-off probably won't be worth it.

I ultimately don't think my view here is that different than common views on lots of other more mundane social norms. For example: I don't think there's anything intrinsically morally wrong about speaking ill of the dead. I recognize that a blanket prohibition on speaking ill of the dead would be a totally ridiculous and socially/epistemically harmful form of censorship. But it's still true that, in some hard-to-summarize class of cases, criticizing someone who's died is going to strike a lot of people as especially uncomfortable and wrong. Even without any specific speech "ban" in place, I think that it's worth giving weight to these feelings when you decide what to say.

What this general line of thought implies about particular bets is obviously pretty unclear. Maybe the value of publicly betting is consistently high enough to, in pretty much all cases, render feelings of discomfort irrelevant. Or maybe, if the community tries to have any norms around public betting, then the expected cost of wise bets avoided due to "false positives" would just be much higher than the expected cost of unwise bets made due to "false negatives." I don't believe this, but I obviously don't know. My best guess is that it probably makes sense to strike a (messy/unprincipled/disputed) balance that's not too dissimilar from balances we strike in other social and professional contexts.

(As an off-hand note, for whatever it's worth, I've also updated in the direction of thinking that the particular bet that triggered this thread was worthwhile. I also, of course, feel a bit weird having somehow now written so much about the fine nuances of betting norms in a thread about a deadly virus.)

Concerning the Recent 2019-Novel Coronavirus Outbreak

Thanks! I do want to stress that I really respect your motives in this case and your evident thoughtfulness and empathy in response to the discussion; I also think this particular bet might be overall beneficial. I also agree with your suggestion that explicitly stating intent and being especially careful with tone/framing can probably do a lot of work.

It's maybe a bit unfortunate that I'm making this comment in a thread that began with your bet, then, since my comment isn't really about your bet. I realize it's probably pretty unpleasant to have an extended ethics debate somehow spring up around one of your posts.

I mainly just wanted to say that it's OK for people to raise feelings of personal/moral discomfort and that these feelings of discomfort can at least sometimes be important enough to justify refraining from a public bet. It seemed to me like some of the reaction to Chi's comment went too far in the opposite direction. Maybe wrongly/unfairly, it seemed to me that there was some suggestion that this sort of discomfort should basically just be ignored or that people should feel discouraged from expressing their discomfort on the EA Forum.

Concerning the Recent 2019-Novel Coronavirus Outbreak

To clarify a bit, I'm not in general against people betting on morally serious issues. I think it's possible that this particular bet is also well-justified, since there's a chance some people reading the post and thread might actually be trying to make decisions about how to devote time/resources to the issue. Making the bet might also cause other people to feel more "on their toes" in the future, when making potentially ungrounded public predictions, if they now feel like there's a greater chance someone might challenge them. So there are potential upsides, which could outweigh the downsides raised.

At the same time, though, I do find certain kinds of bets discomforting and expect a pretty large portion of people (esp. people without much EA exposure) to feel discomforted too. I think that the cases where I'm most likely to feel uncomfortable would be ones where:

  • The bet is about an ongoing, pretty concrete tragedy with non-hypothetical victims. One person "profits" if the victims become more numerous and suffer more.

  • The people making the bet aren't, even pretty indirectly, in a position to influence the management of the tragedy or the dedication of resources to it. It doesn't actually matter all that much, in other words, if one of them is over- or under-confident about some aspect of the tragedy.

  • The bet is made in an otherwise "casual"/"social" setting.

  • (Importantly) It feels like the people are pretty much just betting to have fun, embarrass the other person, or make money.

I realize these aren't very principled criteria. It'd be a bit weird if the true theory of morality made a principled distinction between bets about "hypothetical" and "non-hypothetical" victims. Nevertheless, I do still have a pretty strong sense of moral queasiness about bets of this sort. To use an implausibly extreme case again, I'd feel like something was really going wrong if people were fruitlessly betting about stuff like "Will troubled person X kill themselves this year?"

I also think that the vast majority of public bets that people have made online are totally fine. So maybe my comments here don't actually matter very much. I mainly just want to make the point that: (a) Feelings of common-sense moral discomfort shouldn't be totally ignored or dismissed and (b) it's at least sometimes the right call to refrain from public betting in light of these feelings.

At a more general level, I really do think it's important for the community in terms of health, reputation, inclusiveness, etc., if common-sense feelings of moral and personal comfort are taken seriously. I'm definitely happy that the community has a norm of it typically being OK to publicly challenge others to bets. But I also want to make sure we have a strong norm against discouraging people from raising their own feelings of discomfort.

(I apologize if it turns out I'm disagreeing with an implicit straw-man here.)

Concerning the Recent 2019-Novel Coronavirus Outbreak

I can guess that the primary motivation is not "making money" or "the feeling of winning and being right" - which would be quite inappropriate in this context

I don't think these motivations would be inappropriate in this context. Those are fine motivations that we healthily leverage in large parts of the world to cause people to do good things, so of course we should leverage them here to allow us to do good things.

The whole economy relies on people being motivated to make money, and it has been a key ingredient to our ability to sustain the most prosperous period humanity has ever experienced (cf. more broadly the stock market). Of course I want people to have accurate beliefs by giving them the opportunity to make money. That is how you get them to have accurate beliefs!

At least from a common-sense morality perspective, this doesn't sit right with me. I do feel that it would be wrong for two people to get together to bet about some horrible tragedy -- "How many people will die in this genocide?" "Will troubled person X kill themselves this year?" etc. -- purely because they thought it'd be fun to win a bet and make some money off a friend. I definitely wouldn't feel comfortable if a lot of people around me were doing this.

When the motives involve working to form more accurate and rigorous beliefs about ethically pressing issues, as they clearly were in this case, I think that's a different story. I'm sympathetic to the thought that it would be bad to discourage this sort of public bet. I think it might also be possible to argue that, if the benefits of betting are great enough, then it's worth condoning or even encouraging more ghoulishly motivated bets too. I guess I don't really buy that, though. I don't think that a norm specifically against public bets that are ghoulish from a common-sense morality perspective would place very important limitations on the community's ability to form accurate beliefs or do good.

I do also think there are significant downsides, on the other hand, to having a culture that disregards common-sense feelings of discomfort like the ones Chi's comment expressed.

[[EDIT: As a clarification, I'm not classifying the particular bet in this thread as "ghoulish." I share the general sort of discomfort that Chi's comment describes, while also recognizing that the bet was well-motivated and potentially helpful. I'm more generally pushing back against the thought that evident motives don't matter much or that concerns about discomfort/disrespectfulness should never lead people to refrain from public bets.]]

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

I think I disagree with the claim (or implication) that keeping P is often more natural. Well, you're just saying it's "often" natural, and I suppose it's natural in some cases and not others. But I think we may disagree on how often it's natural, though it's hard to say at this very abstract level. (Did you see my comment in response to your Realism and Rationality post?)

In particular, I'm curious what makes you optimistic about finding a "correct" criterion of rightness. In the case of the politician, it seems clear that learning they don't have some of the properties you thought they had shouldn't call into question whether they exist at all.

But for the case of a criterion of rightness, my intuition (informed by the style of thinking in my comment), is that there's no particular reason to think there should be one criterion that obviously fits the bill. Your intuition seems to be the opposite, and I'm not sure I understand why.

Hey again!

I appreciated your comment on the LW post. I started writing up a response to this comment and your LW one, back when the thread was still active, and then stopped because it had become obscenely long. Then I ended up badly needing to procrastinate doing something else today. So here’s an over-long document I probably shouldn’t have written, which you are under no social obligation to read.

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

Just to say slightly more on this, I think the Bomb case is again useful for illustrating my (I think not uncommon) intuitions here.

Bomb Case: Omega puts a million dollars in a transparent box if he predicts you'll open it. He puts a bomb in the transparent box if he predicts you won't open it. He's only wrong about one in a trillion times.

Now suppose you enter the room and see that there's a bomb in the box. You know that if you open the box, the bomb will explode and you will die a horrible and painful death. If you leave the room and don't open the box, then nothing bad will happen to you. You'll return to a grateful family and live a full and healthy life. You understand all this. You want so badly to live. You then decide to walk up to the bomb and blow yourself up.

Intuitively, this decision strikes me as deeply irrational. You're intentionally taking an action that you know will cause a horrible outcome that you want badly to avoid. It feels very relevant that you're flagrantly violating the "Don't Make Things Worse" principle.

Now, let's step back a time step. Suppose you know that you're the sort of person who would refuse to kill yourself by detonating the bomb. You might decide that -- since Omega is such an accurate predictor -- it's worth taking a pill that turns you into the sort of person who would open the box no matter what, to increase your odds of getting a million dollars. You recognize that this may lead you, in the future, to take an action that makes things worse in a horrifying way. But you calculate that the decision you're making now is nonetheless making things better in expectation.
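To make the "better in expectation" claim concrete, here's a minimal worked sketch of the calculation. The utility numbers below are illustrative assumptions layered on top of the case as described; only the one-in-a-trillion error rate comes from the setup above.

```python
# Illustrative expected-utility comparison for the Bomb case.
# All utility values are assumed purely for the sake of the example.

ERROR_RATE = 1e-12   # Omega's stated error rate (one in a trillion)
U_MILLION = 1.0      # assumed utility of walking away with the million dollars
U_NOTHING = 0.0      # assumed utility of leaving empty-handed
U_DEATH = -1000.0    # assumed utility of detonating the bomb (very bad)

# If you take the pill (become the sort of person who opens the box no matter
# what), Omega almost certainly predicts this and fills the box with money;
# only in the rare error case do you face, and detonate, the bomb.
ev_take_pill = (1 - ERROR_RATE) * U_MILLION + ERROR_RATE * U_DEATH

# If you stay the sort of person who would refuse, Omega almost certainly
# predicts that and puts a bomb in the box, which you then leave unopened;
# only in the rare error case is there money waiting for you.
ev_refuse = (1 - ERROR_RATE) * U_NOTHING + ERROR_RATE * U_MILLION

print(f"EV(take pill)  = {ev_take_pill:.9f}")    # roughly 1.0
print(f"EV(stay as-is) = {ev_refuse:.12f}")      # roughly 0.0
```

Under these assumed numbers, committing now comes out ahead in expectation, even though it commits you to an action that, in the unlikely branch where Omega errs, makes things horrifyingly worse.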

This decision strikes me as pretty intuitively rational. You're violating the second principle -- the "Don't Commit to a Policy..." Principle -- but this violation just doesn't seem that intuitively relevant or remarkable to me. I personally feel like there is nothing too odd about the idea that it can be rational to commit to violating principles of rationality in the future.

(This is obviously just a description of my own intuitions, as they stand, though.)
