Appreciate this comment, and very much agree. I generally think that humanity's descendants are going to saturate the stars with Dyson swarms making stuff (there are good incentives to achieve explosive growth), but I think we're (1) too quick to assume that, (2) too quick to assume we will stop being attached to inefficient earth stuff, and (3) too quick to assume the Dyson swarms will be implementing great stuff rather than, say, insentient digital slaves used to amass power or solve scientific problems.
Let's say there are three threat models here: (a) Weird Stuff Matters A Lot, (b) Attachment to Biological Organisms, (c) Disneyland With No Children (the machines aren't conscious).
I focused mainly on Weird Stuff Matters A Lot. The main reason I focused on this rather than Attachment to Biological Organisms is that I still think computers are going to be so much more economically efficient than biology that, in expectation, ~75% of everything is computer. Computers are just much more useful than animals for most purposes, and it would be super crazy from most perspectives not to turn most of the stars into computers. (I wouldn't totally rule out us failing to do that, but incentives push towards it strongly.) But if, in expectation, ~75% of everything is already computer, then maximizing the computer fraction only makes the world better by about 1/3.
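To spell out the arithmetic behind that last sentence (a back-of-the-envelope reading, assuming value scales roughly linearly with the fraction of resources that are computer):

$$\frac{100\% - 75\%}{75\%} = \frac{1}{3}$$

i.e. pushing the computer fraction from its expected ~75% up to 100% only buys about a one-third relative improvement over the default trajectory.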
I think the Disneyland With No Children threat model is much scarier. I focused on it less here because I wanted to shore up broadly appealing theoretical reasons for trajectory change, and this argument feels much more partisan. But on my partisan worldview:
If this "irrealist" view is right, it's extremely easy to lose out on almost all value.
Separately, I just don't think our descendants are going to care very much about whether the computers are actually conscious, and so AI design choices are going to be orthogonal to moral value. On this different sort of orthogonality thesis, we'll lose out on most value just because our descendants will use AI for practical reasons rather than moral ones, and so the AIs' intrinsic value will go unoptimized.
So Disneyland With No Children type threat models look very credible to me.
(I do think humans will make a lot of copies of themselves, which is decently valuable, though not much compared to the most valuable possible world, and not if you value diversity.)
You could have a more realist view on which we make a big breakthrough in cognitive science and realize that some very glowy, distinctive set of computational properties was what we were talking about all along when we talked about consciousness, and everyone converges on that. I don't really think that's how science works, but even if you held that view it's hard to see how the computational properties would just wear their cardinality on their sleeves. Whatever computational properties you find, you can always value them differently. If you find some really natural measure of hedons in computational space, you can always map hedons to moral value with different functions. (E.g. map 1 hedon to 1 value, 2 hedons to 10 value, 3 hedons to 100 value...)
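To make the cardinality point concrete (a toy illustration; the exponential mapping is just the parenthetical example above written as a function):

$$v_{\text{linear}}(h) = h \qquad \text{vs.} \qquad v_{\text{exp}}(h) = 10^{\,h-1}$$

Both mappings agree on the ordering (more hedons is better), but they disagree wildly about trade-offs: under $v_{\text{exp}}$, a single 3-hedon experience (value 100) outweighs ninety-nine 1-hedon experiences (total value 99), while under $v_{\text{linear}}$ it's not even close.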
So I didn't focus on it here, but I think it's definitely good to think about the Disneyland concern, and it's closely related to what I was thinking about when writing the OP.
I really liked @Joe_Carlsmith's articulation of your 23-word summary: what if all people are paperclippers relative to one another? Though it does make stronger assumptions than we're making here.
I don't understand why that matters. Whatever discount rate you have, if you're prioritizing between extinction risk and trajectory change you will have some parameters that tell you something about what is going to happen over the next N years. It doesn't matter how long that time horizon is. I don't think you're engaging with whether your claims have any bearing on the actual matter at hand.
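To put the same point in symbols (just a sketch: $\delta$ stands in for whatever discount factor you like, and $v_t$ for how good year $t$ turns out to be):

$$V(\text{intervention}) = \sum_{t=0}^{N} \delta^{t}\, \mathbb{E}\left[v_t \mid \text{intervention}\right]$$

Comparing extinction-risk reduction against a trajectory change means estimating those conditional terms out to whatever horizon $N$ your discounting leaves non-negligible, so either way you're committed to empirical claims about that whole period.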
It would probably be most useful for you to try to articulate a view that avoids the dilemma I mentioned in the first comment of this thread.
You're not going to be prioritizing between extinction risk and long-term trajectory changes based on tractability if you don't care about the far future. And for any moral theory you can ask "why do you think this will be a good outcome?", and as long as you don't value life intrinsically you'll have to state some empirical hypotheses about the far future.
I want to see a bargain solver for aligning AI to groups: a technical solution that would let AI systems solve the pie-cutting problem for groups and get them the most of what they want. The best solutions I've seen for maximizing long-run value involve using a bargain solver to decide what ASI does, which preserves the richness and cardinality of people's value functions and gives everyone as much of what they want as possible, weighted by importance. (See the WWOTF afterword and the small literature on bargaining-theoretic approaches to moral uncertainty.) But existing democratic approaches to AI alignment don't seem to be fully leveraging AI tools; instead they align AI systems to democratic processes that aren't themselves empowered with AI tools (e.g. CIP's and CAIS's alignment to the written output of citizens' assemblies). Moreover, in my experience the best way to make something happen is just to build the solution. If you might be interested in building this tool and have the background, I would love to try to connect you to funding for it.
For deeper motivation see here.
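To gesture at what I mean by a bargain solver, here's a minimal sketch of one standard choice, the Nash bargaining solution, applied to a toy pie-cutting problem. Everything here (the groups, their utility functions, the disagreement payoffs, the weights) is a made-up illustration, not a claim about how an actual system should be built:

```python
# Toy Nash bargaining solver: split a unit "pie" among groups so as to
# maximize the (weighted) product of each group's utility gain over its
# disagreement payoff. All inputs below are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

utilities = [
    lambda x: x ** 0.5,             # a risk-averse group
    lambda x: x,                    # a group with linear utility
    lambda x: 1 - np.exp(-3 * x),   # a group whose utility saturates quickly
]
disagreement = np.array([0.05, 0.0, 0.1])   # payoffs if bargaining breaks down
weights = np.array([1.0, 1.0, 1.0])         # per-group importance weights

def neg_log_nash_product(shares):
    """Negative log of the weighted Nash product (we minimize this)."""
    gains = np.array([u(s) for u, s in zip(utilities, shares)]) - disagreement
    if np.any(gains <= 0):
        return 1e9  # infeasible: some group does worse than its fallback
    return -np.sum(weights * np.log(gains))

# Shares are non-negative and must sum to 1.
result = minimize(
    neg_log_nash_product,
    x0=np.full(3, 1 / 3),
    bounds=[(0.0, 1.0)] * 3,
    constraints={"type": "eq", "fun": lambda s: np.sum(s) - 1},
)
print("Nash-bargaining shares:", np.round(result.x, 3))
```

The product-of-gains objective is what gives the property I care about: it's sensitive to the cardinal shape of each group's utility function rather than just rankings, and no group ends up below its disagreement point. A real version would need to elicit or learn those utility functions, which is the hard part.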
Here's a shower thought:
It won't work for every model (maybe the other parameters just won't budge), but for some of them it should.
(Low-effort comment as I run out the door, but I hope it adds value.) To me the most compelling argument in favour of tractability is:
A cynical and oversimplified (but hopefully illuminating) view, and roughly my view, is that trajectory changes are just long-term power grabs by people with a certain set of values (moral, epistemic, or otherwise). One argument in the other direction is that lots of people are trying to grab power; it's basically all powerful people do! And conflict with powerful people over resources is a significant kind of non-neglectedness. But very few people are trying to control the long-term future, due to (e.g.) hyperbolic discounting. So on this view, neglectedness provisionally favours trajectory changes that don't reallocate power until far in the future, so that they aren't in competition with people seeking power today. A similar argument would apply to other domains where power can be accrued but where competitors are not currently seeking it.
Sure, there are various ways to do this: scale up ems, for example, or build superintelligence from symbolic systems with strong verifiability guarantees.