
Thanks for the write-up! I appreciate the orientation towards simplicity and directness of value quantification (including uncertainty) and the overall mindset and spirit of the processes that this induces in people's minds.

I am wondering why I am not actually surprised that I don't see people (esp. EAs, and including myself) doing this more.

I would expect some of the (assumed) reasons to be simple to state and (in principle) overcome (e.g. lack of skills, not as valued in culture, inadequate epistemic habits, ...) and some to be complex and hard (bias towards quantifiable parts of the model, estimation issues with long-tailed distributions, various elephants in the brains or social/cultural dynamics, robustness, ...).

In particular, I would expect some hard obstacles or downsides to actually exist, as it seems to me that some people whose epistemics, agency and models of the world I highly respect use quantitative estimates way less than they could, at least in some domains. (In particular I assume they don't lack skill or epistemics in general.)

What is your take on why people may *not* want to use this (or a similar) quantitative framework for speculative interventions (or some types of projects)?

An ingenious friend just pointed out a likely much larger point of influence of quantum particle-level noise on humanity: the randomness in DNA recombination during meiosis (gamete formation) is effectively driven by a single molecular machine, and the individual crossovers etc. likely depend strongly on Brownian-level noise. This would mean that a substantial fraction of people would have a slightly different genetic makeup, from which I would expect substantial timeline divergence over 100s of years at most (measuring differences on the level of human society).

For example, if you went back in time 1000 years and painted someone's house a different color, my probability distribution for the weather here and now would look like the historical average for weather here, rather than the weather in the original timeline.

I think the crux here may not be the butterfly effect, but the overall accumulated effect of (quantum) randomness: I would expect that if you went 1000 years back and just "re-ran" the world from the same quantum state (no house painting etc.), the world would be different (at least on the human-perceivable level; not sure about weather) just because so many events are somewhat influenced by (micro-state) quantum randomness.

The question is only the extent of the quantum effects on the macro-state (even without explicit quantum coin flips), but I expect this to be quite large, e.g. in human biology and in particular the brain - see my other comment (Re 2.) (NB: independent of any claims about brain function relying on quantum effects etc.).

(Note that I would expect the difference after 1000 years to be substantially larger if you consider the entire world-state to be a quantum state with some superpositions, where e.g. the micro-states of air molecules are mostly unobserved and in various superpositions etc., therefore increasing the randomness effect substantially ... but that is merely an additional intuition.)

**Re 1.:** Yeah, if you consider "determined but unknown" in place of the "non-quantum randomness", this is indeed different. Let me sketch a (toy) example scenario for that: *We have fixed two million-bit numbers A and B (not assumed to be quantum random, just fixed arbitrarily; e.g. some prefix of pi and e would do). Let P2(x) mean "x is a product of exactly 2 primes". We commit to flipping a quantum coin and on heads, we destroy humanity iff P2(A), on tails, we destroy humanity iff P2(B). At the time of the coin flip, we don't know P2(A) or P2(B) (and assume they are not obvious) and can only find out (much) later.*

Is that a good model of what you would call "deterministic but uncertain/unknown"?

In this model (and a world with no other branchings - which I want to note is hard to conceive of as realistic), I would lean towards agreeing that "doing the above" has a higher chance of some branch surviving than e.g. just doing "*we destroy humanity iff P2(A)*".
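A minimal sketch of the toy scenario above, to make the comparison concrete. It assumes (this is not part of the scenario as stated) that our credence in P2(A) and in P2(B) is the same number `q` and that the two are independent; the function name `survival_stats` and the dictionary keys are hypothetical. It enumerates the four possible truth assignments exactly and compares the chance that at least one branch survives with the expected surviving probability measure.

```python
from fractions import Fraction
from itertools import product


def survival_stats(q):
    """Compare 'destroy iff P2(A)' with the quantum-coin policy.

    q is the ASSUMED credence that P2(x) holds, taken to be the same and
    independent for A and B (an illustrative assumption only).
    """
    stats = {
        "some_branch_single": Fraction(0),  # P(the only branch survives)
        "some_branch_coin": Fraction(0),    # P(at least one coin branch survives)
        "measure_single": Fraction(0),      # expected surviving measure, single policy
        "measure_coin": Fraction(0),        # expected surviving measure, coin policy
    }
    # Enumerate the four "determined but unknown" worlds.
    for p2_a, p2_b in product([True, False], repeat=2):
        weight = (q if p2_a else 1 - q) * (q if p2_b else 1 - q)
        heads_survives = not p2_a  # heads: destroy iff P2(A)
        tails_survives = not p2_b  # tails: destroy iff P2(B)
        stats["some_branch_single"] += weight * int(heads_survives)
        stats["some_branch_coin"] += weight * int(heads_survives or tails_survives)
        stats["measure_single"] += weight * int(heads_survives)
        # Each coin branch carries half of the measure.
        stats["measure_coin"] += weight * Fraction(
            int(heads_survives) + int(tails_survives), 2
        )
    return stats


if __name__ == "__main__":
    for name, value in survival_stats(Fraction(1, 10)).items():
        print(name, value)
```

Under these assumptions the coin policy raises the chance that some branch survives from 1 - q to 1 - q², while the expected surviving measure stays at 1 - q under both policies.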

(FWIW, I would assume the real world to be highly underdetermined in outcome just through quantum randomness on sufficient time-scales, even including some intuitions about attractor trajectories; see just below.)

**Re 2.:** The rate of accumulation of (ubiquitous, micro-state level) quantum effects on the macro-state of the world is indeed uncertain. While I would expect the cumulative effects on the weather to be small (over a year, say?), the effects on humanity may already be large: all our biochemistry (e.g. protein interactions, protein expression, ...) is propelled by Brownian motion of the molecules, where the quantum effects are very strong, and on the cellular scale, I would expect the quantum effects to contribute a substantial amount of noise, e.g. in the delays and sometimes the outcomes. Since this also applies to the neurons in the brain (delays, sometimes a spike flipped to a non-spike), I would assume the noise would have effects observable by other humans (who observed the counterfactual timeline) within weeks (a very weak sense of the proper range, but I would expect it to be e.g. less than years).

This would already mean that some crucial human decisions and interactions would change over the course of years (again, scale uncertain but I would expect less rather than more).

(Note that the above "quantum randomness effect" on the brain has nothing to do with the question of whether any function of the brain depends on quantum effects - it merely talks about the nature of certain noise in the process.)

I think this somehow assumes two types of randomness: quantum randomness and some other (non-quantum) randomness, which I think is an inconsistent assumption.

In particular, in a world with quantum-physics, all events either depend on "quantum" random effects or are (sufficiently) determined*. If e.g. extinction is uncertain, then there will be further splits, some branches avoiding extinction. In the other (although only hypothetical) case of it being determined in some branch, the probability is either 0 or 1, voiding the argument.

*) In a world with quantum physics, nothing is perfectly determined due to quantum fluctuations, but here I mean something practically determined (and also allowing for hypothetical world models with some deterministic parts/aspects).

I think it is confusing to think about quantum randomness as being only generated in special circumstances - to the contrary, it is extremely common. (At the same time, it is not common to be certain that a concrete "random" bit you somehow obtain is quantum random and fully independent of others, or possibly correlated!)

To illustrate what I mean by all probability being quantum randomness: While some coin flips are thrown in a way that almost surely determines the result (even air molecule fluctuations are very unlikely to change the result - both in the quantum and the classical sense) - some coin flips are thrown in such a way that the outcome is uncertain to the extent of being mostly dependent on the air particles bumping into the coin (imagine a really long throw); let's look at those.

The movements of individual particles are to a large extent quantum random (on the particle level, also - the exact micro-state of individual air molecules is to a large extent quantum-random), and moreover influenced by quantum fluctuations. As the coin flies through the air, the quantum randomness influence "accumulates" on the coin, and the flip can be almost-quantum-random without any single clearly identifiable "split" event. There are split events all along the coin's movement, with various probabilities and small influences on the outcome - each contributing a very small amount of entropy to the target distribution. (In a simplified "coin" model, each of the quantum-random subatomic-level events would directly contribute a small amount of entropy (in bits) to the outcome.)
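A minimal sketch of one possible simplified "coin" model, to illustrate how many tiny independent influences can add up to an almost fair flip without any single decisive split. The modelling choice is an assumption made for illustration, not a physical claim: the coin starts out fully determined, and each micro-event independently flips the final outcome with a small probability. The outcome then differs from the determined one exactly when an odd number of flips occurred, which has probability ½(1 − ∏(1 − 2pᵢ)). The function names are hypothetical.

```python
import math


def flipped_outcome_probability(flip_probs):
    """P(final outcome differs from the initially determined one).

    ASSUMED toy model: each micro-event independently flips the outcome
    with probability p_i, so the outcome changes iff an odd number of
    flips happen: 0.5 * (1 - prod(1 - 2 * p_i)).
    """
    remaining_bias = 1.0
    for p in flip_probs:
        remaining_bias *= 1.0 - 2.0 * p
    return 0.5 * (1.0 - remaining_bias)


def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


if __name__ == "__main__":
    # Illustrative numbers only: each event flips the result with p = 0.001.
    for n_events in (10, 1_000, 10_000):
        p_flip = flipped_outcome_probability([0.001] * n_events)
        print(n_events, p_flip, binary_entropy(p_flip))
```

In this toy model a handful of such events leaves the outcome almost determined, while thousands of them push it arbitrarily close to one full bit of entropy, even though each individual event contributes very little.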

With this, it is not clear what you mean by "one branch survives", as in a fully quantum world, some branches always survive; the probability may just be low. My point is that there (fundamentally) is no consistent way to say what constitutes the branches we should care about.

If you instead care about the total probability measure of the branches with an extinction event, that should be the same as old-fashioned probability of extinction - my (limited) understanding of quantum physics interpretations is that by just changing the interpretation, you won't get different expected results in a randomly chosen timeline. (I would appreciate being corrected here if my general understanding is lacking or missing some assumption.)

My intuition is also that the discount for academia solving core alignment problems should be (much?) higher than here. At the same time I agree that some mainstream work (esp. foundations) does help current AI alignment research significantly. I would expect (and hope) more of this to still appear, but to be increasingly sparse (relative to the amount of work in AI).

I think that it would be useful to have a contribution model that can distinguish (at least) between *a)* *improving the wider area* (including e.g. fundamental models, general tools, best practices, learnability) and *b)* *working on the problem itself*. Distinguishing past contribution and expected future contribution (resp. discount factor) may also help.

*Why:* Having a well developed field is a big help in solving any particular problem X adjacent to it and it seems reasonable to assign a part of the value of "X is solved" to work done on the field. However, field development alone is unlikely to solve X for sufficiently hard X that is not in the field's foci, and dedicated work on X is still needed. I imagine this applies to the field of ML/AI and long-termist AI alignment.

*Model sketch:* General work done on the field has diminishing returns towards the work remaining on the problem. As the field grows, it branches and the surface area grows as a function of this, and the progress in directions that are not foci slows appropriately. Extensive investment in the field would solve any problem eventually, but unfocused effort is increasingly inefficient. *Main uncertainties:* I am not sure how to model areas of field focus and the faster progress in their vicinity, or how much I would expect some direction sufficiently close to AI alignment to be a focus of AI.

Overall, this would make me expect that the past work in AI and ML has made a significant contribution towards AI alignment, but I would expect an increasing discount in the future, unless alignment becomes a focus for AI or close to it. When thinking about policy implications for focusing research effort (with the goal of solving AI alignment), I would expect the returns to general academia to diminish much faster than to EA safety research.

Or did you mean ...?