Wei Dai

3429 karma · Joined Jun 2015




What is a plausible source of x-risk that is 10% per century for the rest of time? It seems pretty likely to me that not long after reaching technological maturity, future civilization would reduce x-risk per century to a much lower level, because you could build a surveillance/defense system against all known x-risks, and not have to worry about new technology coming along and surprising you.

It seems that to get a constant 10% per century risk, you'd need some kind of existential threat for which there is no defense (maybe vacuum collapse), or for which the defense is so costly that the public goods problem prevents it from being built (e.g., no single star system can afford it on its own). But the likelihood of such a threat existing in our universe doesn't seem that high to me (maybe 20%?), which I think upper bounds the long-term x-risk.
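To make the quantitative difference concrete, here is a toy sketch (all numbers, including the hypothetical halving time, are illustrative assumptions, not claims from the comment): under a constant per-century risk, survival probability decays geometrically toward zero, whereas if technological maturity lets civilization drive the risk down over time, the long-run survival probability can stay non-negligible.

```python
def survival_constant(risk_per_century: float, centuries: int) -> float:
    """P(survive n centuries) with a fixed per-century extinction risk."""
    return (1 - risk_per_century) ** centuries

def survival_declining(initial_risk: float, halving_time: float, centuries: int) -> float:
    """P(survive) when the per-century risk halves every `halving_time` centuries,
    e.g. because defenses against known x-risks keep improving."""
    p = 1.0
    for t in range(centuries):
        p *= 1 - initial_risk * 0.5 ** (t / halving_time)
    return p

# Constant 10% per century: survival over 100 centuries is ~0.9^100, about 2.7e-5.
print(survival_constant(0.10, 100))
# Same initial risk, but halving every 2 centuries: survival stays non-negligible.
print(survival_declining(0.10, 2, 100))
```

The point of the contrast is that the long-run outcome is dominated by whether the risk is constant or declining, not by the exact initial level.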

Curious how your model differs from this.

I’m confused about how it’s possible to know whether someone is making substantive progress on metaphilosophy; I’d be curious if you have pointers.

I guess it's the same as with any other philosophical topic: either use your own philosophical reasoning/judgement to decide how good the person's ideas/arguments are, and/or defer to other people's judgements. The fact that there is currently no methodology for doing this that is less subjective or more formal is a major reason for me to be interested in metaphilosophy: if we solve metaphilosophy, that will hopefully give us a better methodology for judging all philosophical ideas, assuming the correct solution to metaphilosophy isn't philosophical anti-realism (i.e., the view that philosophical questions don't have right or wrong answers) or something like that.

Any thoughts on Meta Questions about Metaphilosophy from a grantmaker's perspective? For example, have you seen any promising grant proposals related to metaphilosophy, or to ensuring the philosophical competence of AI / future civilization, that you rejected due to funding constraints or other reasons?

I think we need to get good at predicting what language models will be able to do — in the real world, not just on benchmarks.

Any ideas how to do this? It seems like one key difficulty is that we just don't have good explicit understandings of many cognitive abilities, and don't have much hope of achieving such understandings in the relevant time frame.

So I'm not sure what can be done aside from applying human intuition to whatever relevant info we have (like how LMs have qualitatively progressed in the past, or how hard various future capabilities seem). Maybe try to find people with particularly good intuitions and calibration (as demonstrated by their track records of past predictions)? More/better use of prediction markets?

Anyway, does anyone have any qualitative predictions of what AIs produced by $1B training runs will look like? What do you think they will be able to do that will be most interesting, useful, dangerous, or economically valuable?

One way to affect things is to increase the probability that humanity ends up building a healthy and philosophically competent civilization. (But we already knew that was important.)

Do you know anyone who is actually working on this, especially the second part (philosophical competence)? I've been thinking about this myself, and wrote some LW posts on the topic. (In short, my main message is that if we care about our collective philosophical competence, the AI transition represents both a high risk and a unique opportunity.) But I feel like my public and private efforts to attract more attention and work to this area haven't yielded much. Do you see things differently?

I'd love to see research into what I called "human safety problems" (or sometimes "human-AI safety"), fleshing out the idea more or giving some empirical evidence as to how much of a problem it really is. Here's a short description of the idea from AI design as opportunity and obligation to address human safety problems:

Many AI safety problems are likely to have counterparts in humans. AI designers and safety researchers shouldn’t start by assuming that humans are safe (and then try to inductively prove that increasingly powerful AI systems are safe when developed/trained by and added to a team of humans) or try to solve AI safety problems without considering whether their designs or safety approaches exacerbate human safety problems relative to other designs / safety approaches. At the same time, the development of AI may be a huge opportunity to address human safety problems, for example by transferring power from probably unsafe humans to de novo AIs that are designed from the ground up to be safe, or by assisting humans’ built-in safety mechanisms (such as moral and philosophical reflection).

I go into a bit more detail in Two Neglected Problems in Human-AI Safety.

Would you be ok with a plan to colonize only planets that are very unlikely to evolve life, for example because the planet is too close or too far from its sun?

I have two thoughts that spring from this:

  1. If epistemic conditions in the US weren't always like this, could it be that on average liberal democracies still tend to have better epistemic conditions than authoritarian regimes? (And we just happen to live in an unlucky period where things are especially bad?)
  2. Maybe it's comparably important to model/understand "internal regime changes" (a term I just made up) where conditions for making scientific or intellectual progress (or other conditions that we care about) improve or deteriorate a lot due to institutional changes that don't fit the standard definition of "regime change"?

Related to your last paragraph, what do you think about Have epistemic conditions always been this bad? In other words, was there a time when the US wasn't like this?

In Six Plausible Meta-Ethical Alternatives, I wrote (as one of the six alternatives):

  1. Most intelligent beings in the multiverse share similar preferences. This came about because there are facts about what preferences one should have, just like there exist facts about what decision theory one should use or what prior one should have, and species that manage to build intergalactic civilizations (or the equivalent in other universes) tend to discover all of these facts. There are occasional paperclip maximizers that arise, but they are a relatively minor presence or tend to be taken over by more sophisticated minds.

I think in this post you're not giving enough attention to the possibility that there's something that we call "doing philosophy" that can be used to discover all kinds of philosophical truths, and that you can't become a truly powerful civilization without being able to "do philosophy" and be generally motivated by the results. Consider that philosophy seems to have helped the West become the dominant civilization on Earth, for example by inventing logic and science, and more recently has led to the discovery of ideas like acausal extortion/trade (which seem promising, albeit still highly speculative). Of course I'm very uncertain of this and have little idea what "doing philosophy" actually consists of, but I've written a few more words on this topic if you're interested.
