Wei Dai

3847 karma · Joined Jun 2015




While drafting this post, I wrote down and then deleted an example of "avoiding/deflecting questions about risk," because the person I asked such a question of is probably already trying to push their organization to take risks more seriously, and probably had their own political considerations for not answering, so I don't want to single them out for criticism. I also don't want to damage my relationship with this person or make them want to engage less with me or people like me in the future.

Trying to enforce good risk management via social rewards/punishments might be pretty difficult for reasons like these.

My main altruistic endeavor involves thinking and writing about ideas that seem important and neglected. Here is a list of the specific risks that I'm trying to manage/mitigate in the course of doing this. What other risks am I overlooking or not paying enough attention to, and what additional mitigations should I be doing?

  1. Being wrong or overconfident, distracting people or harming the world with bad ideas.
    1. Think twice about my ideas/arguments. Look for counterarguments/risks/downsides. Try to maintain appropriate uncertainties and convey them in my writings.
  2. The idea isn't bad, but some people take it too seriously or too far.
    1. Convey my uncertainties. Monitor subsequent discussions and try to argue against people taking my ideas too seriously or too far.
  3. Causing differential intellectual progress in an undesirable direction, e.g., speeding up AI capabilities relative to AI safety, spreading ideas that are more useful for doing harm than doing good.
    1. Check ideas/topics for this risk. Self-censor ideas or switch research topics if the risk seems high.
  4. Being first to talk about some idea, but not developing/pursuing it as vigorously as someone else might if they were first, thereby causing a net delay in intellectual or social progress.
    1. Not sure what to do about this one. So far not doing anything except to think about it.
  5. PR/political risks, e.g., talking about something that damages my reputation or relationships, and in the worst case harms people/causes/ideas associated with me.
    1. Keep this in mind and talk more diplomatically or self-censor when appropriate.

@Will Aldred I forgot to mention that I do have the same concern about "safety by eating marginal probability" on AI philosophical competence as on AI alignment, namely that progress on solving problems lower in the difficulty scale might fool people into having a false sense of security. Concretely, today AIs are so philosophically incompetent that nobody (or almost nobody) trusts them to do philosophy, but if they seemed to get better without really improving (or not improving enough relative to appearances), a lot more people might trust them, and it could be hard to convince them not to.

Thanks for the comment. I agree that what you describe is a hard part of the overall problem. I have a partial plan, which is to solve (probably using analytic methods) metaphilosophy for both analytic and non-analytic philosophy, and then use that knowledge to determine what to do next. I mean, today the debate between the two philosophical traditions is pretty hopeless, since nobody even understands what people are really doing when they do analytic or non-analytic philosophy. Maybe the situation will improve automatically when metaphilosophy has been solved, or at least we'll have a better knowledge base for deciding what to do next.

If we can't solve metaphilosophy in time though (before AI takeoff), I'm not sure what the solution is. I guess AI developers use their taste in philosophy to determine how to filter the dataset, and everyone else hopes for the best?

  1. Just talking more about this problem would be a start. It would attract more attention and potentially resources to the topic, and make people who are trying to solve it feel more appreciated and less lonely. I'm just constantly confused why I'm the only person who frequently talks about it in public, given how obvious and serious the problem seems to me. It was more understandable before ChatGPT put AI on everyone's radar, but now it's just totally baffling. And I appreciate you writing this comment. My posts on the topic usually get voted up, but with few supporting comments, making me unsure who actually agrees with me that this is an important problem to work on.
  2. If you're a grant maker, you can decide to fund research in this area, and make some public statements to that effect.
  3. It might be useful to think in terms of an "AI philosophical competence difficulty scale" similar to Sammy Martin's AI alignment difficulty scale and "safety by eating marginal probability". I tend to focus on the higher end of that scale, where we need to achieve a good explicit understanding of metaphilosophy, because I think solving that problem is the only way to reduce risk to a minimum, and it also fits my inclination for philosophical problems, but someone more oriented towards ML research could look for problems elsewhere on the difficulty scale, for example fine-tuning an LLM to do better philosophical reasoning, to see how far that can go. Another idea is to fine-tune an LLM for pure persuasion, and see if that can be used to create an AI that deemphasizes persuasion techniques that don't constitute valid reasoning (by subtracting the differences in model weights somehow).
  4. Some professional philosopher(s) may actually be starting a new org to do research in this area, so watch out for that news and check how you can contribute. Again providing funding will probably be an option.
  5. Think about social aspects of the problem. What would it take for most people or politicians to take the AI philosophical competence problem seriously? Or AI lab leaders? What can be done if they never do?
  6. Think about how to evaluate (purported) progress in the area. Are there clever ways to make benchmarks that can motivate people to work on the problem (and not be easily Goodharted against)?
  7. Just to reemphasize, talk more about the problem, or prod your favorite philosopher or AI safety person to talk more about it. Again it's totally baffling the degree to which nobody talks about this. I don't think I've even once heard a professional philosopher publicly express a concern that AI might be relatively incompetent in philosophy, even as some opine freely on other aspects of AI. There are certainly obstacles for people to work on the problem like your reasons 1-3, but for now the bottleneck could just as well be in the lack of social proof that the problem is worth working on.
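The "subtracting the differences in model weights" idea in item 3 resembles what the ML literature calls task arithmetic: fine-tune a model for an unwanted capability, treat the weight delta as a "task vector", and subtract it from the base model. A minimal sketch, where the function name, the plain-dict weight representation, and the `alpha` scaling parameter are all illustrative assumptions, not anything specified in the comment above:

```python
def subtract_task_vector(base_weights, finetuned_weights, alpha=1.0):
    """Remove a capability from a base model by subtracting the
    fine-tuning weight delta (finetuned - base), scaled by alpha.

    Weights are represented here as plain dicts of parameter name
    -> value; a real implementation would operate on tensors
    (e.g. a PyTorch state_dict), but the arithmetic is the same.
    """
    return {
        name: base_weights[name]
        - alpha * (finetuned_weights[name] - base_weights[name])
        for name in base_weights
    }


# Toy usage: a parameter the fine-tune changed gets pushed in the
# opposite direction; untouched parameters stay the same.
base = {"w": 1.0, "b": 0.5}
persuasion_finetuned = {"w": 3.0, "b": 0.5}
edited = subtract_task_vector(base, persuasion_finetuned)
```

Whether such weight arithmetic cleanly isolates "persuasion minus valid reasoning," as the comment hopes, is an open empirical question.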
Answer by Wei Dai · Jan 15, 2024

How should we deal with the possibility/risk of AIs inherently disfavoring all the D's that Vitalik wants to accelerate? See my Twitter thread replying to his essay for more details.

Answer by Wei Dai · Jan 11, 2024

The universe can probably support a lot more sentient life if we convert everything that we can into computronium (optimized computing substrate) and use it to run digital/artificial/simulated lives, instead of just colonizing the universe with biological humans. To conclude that such a future doesn't have much more potential value than your 2010 world, we would have to assign zero value to such non-biological lives, or value each of them much less than a biological human, or make other very questionable assumptions. The Newberry 2021 paper that Vasco Grilo linked to has a section about this:

If a significant fraction of humanity’s morally-relevant successors were instantiated digitally, rather than biologically, this would have truly staggering implications for the expected size of the future. As noted earlier, Bostrom (2014) estimates that 10^35 human lives could be created over the entire future, given known physical limits, and that 10^58 human lives could be created if we allow for the possibility of digital persons. While these figures were not intended to indicate a simple scaling law, they do imply that digital persons can in principle be far, far more resource efficient than biological life. Bostrom’s estimate of the number of digital lives is also conservative, in that it assumes all such lives will be emulations of human minds; it is by no means clear that whole-brain emulation represents the upper limit of what could be achieved. For a simple example, one can readily imagine digital persons that are similar to whole-brain emulations, but engineered so as to minimise waste energy, thereby increasing resource efficiency.

I think, as a matter of verifiable fact, that if people solve the technical problems of AI alignment, they will use AIs to maximize their own economic consumption, rather than pursue broad utilitarian goals like “maximize the amount of pleasure in the universe”.

If you extrapolate this out to after technological maturity, say 1 million years from now, what does selfish "economic consumption" look like? I tend to think that people's selfish desires will be fairly easily satiated once everyone is much much richer and the more "scalable" "moral" values would dominate resource consumption at that point, but it might just be my imagination failing me.

I think mundane economic forces are simply much more impactful.

Why do "mundane economic forces" cause resources to be consumed towards selfish ends? I think economic forces select for agents who want to and are good at accumulating resources, but will probably leave quite a bit of freedom in how those resources are ultimately used once the current cosmic/technological gold rush is over. It's also possible that our future civilization uses up much of the cosmic endowment through wasteful competition, leaving little or nothing to consume in the end. Is that your main concern?

(By "wasteful competition" I mean things like military conflict, costly signaling, races of various kinds that accumulate a lot of unnecessary risks/costs. It seems possible that you categorize these under "selfishness" whereas I see them more as "strategic errors".)

To be sure, ensuring AI development proceeds ethically is a valuable aim, but I claim this goal is *not* the same thing as “AI alignment”, in the sense of getting AIs to try to do what people want.

There was at least one early definition of "AI alignment" to mean something much broader:

The "alignment problem for advanced agents" or "AI alignment" is the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good outcomes in the real world.

I've argued that we should keep using this broader definition, in part for historical reasons, and in part so that AI labs (and others, such as EAs) can more easily keep in mind that their ethical obligations/opportunities go beyond making sure that AI does what people want. But it seems that I've lost that argument, so it's good to periodically remind people to think more broadly about their obligations/opportunities. (You don't say this explicitly, but I'm guessing it's part of your aim in writing this post?)

(Recently I've been using "AI safety" and "AI x-safety" interchangeably when I want to refer to the "overarching" project of making the AI transition go well, but I'm open to being convinced that we should come up with another term for this.)

That said, I think I'm less worried than you about "selfishness" in particular and more worried about moral/philosophical/strategic errors in general. The way most people form their morality is scary to me, and personally I would push humanity to be more philosophically competent/inclined before pushing it to be less selfish.
