On Dichotomy:
Looking at the full article:
(Having just read the forum summary so far) I think there's a bunch of good exploration of arguments here, but I'm a bit uncomfortable with the framing. You talk about "if Maxipok is false", but this seems to me like a type error. Maxipok, as I understand it, is a heuristic: it's never going to give the right answer 100% of the time, and the right lens for evaluating it is how often it gives good answers, especially compared to other heuristics the relevant actors might reasonably have adopted.
Quoting from the Bostrom article you link:
...At best, maxipok is a
I think Eric has been strong at making reasoned arguments about the shape of possible future technologies, and at helping people to look at things for themselves. I wouldn't have thought of him (even before looking at this link[1]) as particularly good at making quantitative estimates about timelines, which in any case is something he doesn't seem to do much of.
Ultimately I am not suggesting that you defer to Drexler. I am suggesting that you may find reading his material a good time investment for spurring your own thoughts. This is something you can t...
I'm not sure. I think there are versions of things here which are definitely not convergence (straightforward acausal trade between people who understand their own values is of this type), but I have some feeling like there might be extra reasons for convergence from people observing the host, and having that fact feed into their own reflective process.
(Indeed, I'm not totally sure there's a clean line between convergence and trade.)
I think there's something interesting to this argument, although I think it may be relying on a frame where AI systems are natural agents, in particular at this step:
a strategically and philosophically competent AI should seemingly have its own moral uncertainty and pursue its own "option value maximization" rather than blindly serve human interests/values/intent
It's not clear to me why the key functions couldn't be more separated, or whether the conflict you're pointing to persists across such separation. For instance, we might have a mix of:
My belief is that the Open Philanthropy Project, EA generally, and Oxford EA particularly, had bad AI timelines and bad ASI ruin conditional probabilities; and that these invalidly arrived-at beliefs were in control of funding, and were explicitly publicly promoted at the expense of saner beliefs.
There is a surprising amount of normative judgment in here for a fact check. Are you looking just for disagreements that people held roughly the beliefs you later outline (I think you overstate things but are directionally correct in describing how beliefs differe...
My view of the tragedy of OpenPhil is indeed that they were very earnest people trying to figure out what was legit, but ended up believing stuff like "biologically anchored estimates of AI timelines" that were facially absurd and wrong and ultimately self-serving, because the problem "end up with beliefs about AI timelines that aren't influenced by what plays well with our funders and friends" was hard and frankly out of their league and OpenPhil did not know that it was a hard problem or treat it with what I would consider seriousness.
If you'd like to vi...
Sorry, I didn't mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren't necessarily the ends of the spectrum -- for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.
At least that's what I had in mind at the time of writing my comment. I'm now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It's plausible that this is actually more important than the more explicitly "alignment" knowledge. (Assuming that compute will be the bottleneck.)
You're discussing catastrophes that are big enough to set the world back by at least 100 years. But I'm wondering if a smaller threshold might be appropriate. Setting the world back by even 10 years could be enough to mean re-running a lot of the time of perils; and we might think that catastrophes of that magnitude are more likely. (This is my current view.)
With the smaller setbacks you probably have to get more granular in terms of asking "in precisely which ways is this setting us back?", rather than just analysing it in the abstract. But that can just be faced.
Why do you think alignment gets solved before reasonably good global governance? It feels to me pretty up in the air which target we should be aiming to hit first. (Hitting either would help us with the other. I do think that we likely want to get important use out of AI systems before we establish good global governance; but that we might want to then do the governance thing to establish enough slack to take the potentially harder parts of the alignment challenge slowly.)
On section 4, where you ask about retaining alignment knowledge:
For much of the article, you talk about post-AGI catastrophe. But when you first introduce the idea in section 2.1, you say:
the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance)
It seems to me like this is a much higher bar than reaching AGI -- and one for which the arguments that we could still be exposed to subsequent catastrophes seem much weaker. Did you mean to just say AGI here?
Yeah roughly the thought is "assuming concentrated power, it matters what the key powerful actors will do" (the liberal democracy comment was an aside saying that I think we should be conditioning on concentrated power).
And then for making educated guesses about what the key powerful actors will do, it seems especially important to me what their attitudes will be at a meta-level: how they prefer to work out what to do, etc.
I might have thought that some of the most important factors would be things like:
(Roughly because: either power is broadly distributed, in which case your comments about liberal democracy don't seem to have so much bite; or it's not, in which case it's really the values of leadership that matter.) But I'm not sure you really touch on these. Interested if you have thoughts.
Thanks AJ!
My impression is that although your essay frames this as a deep disagreement, in fact you're reacting to something that we're not saying. I basically agree with the heart of the content here -- that there are serious failure modes to be scared of if attempting to orient to the long term, and that something like loop-preservation is (along with the various more prosaic welfare goods we discussed) essential for the health of even a strict longtermist society.
However, I think that what we wrote may have been compatible with the view that you have such a negative reaction to, and at minimum I wish that we'd spent some more words exploring this kind of dynamic. So I appreciate your response.
Ok, thanks. I think it's fair to call me on this (I realise the question of what Thiel actually thinks is not super interesting to me, compared to "does this critique contain inspiration for things to be aware of that I wasn't previously really tracking"; but I get that most people probably aren't orienting similarly, and I was kind of assuming that they were when I suggested this was why it was getting sympathy).
I do think though that there's a more nuanced point here than "trying too hard to do good can result in harm". It's more like "over-claiming about ho...
I think that the theology is largely a distraction from the reason this is attracting sympathy, which I'd guess to be more like:
is that you feel that moral statements are not as evidently subjective as, say, 'Vanilla ice-cream is the best flavor' but not as objective as, say, 'An electron has a negative charge', as living in some space of in-betweenness with respect to those two extremes
I think that's roughly right. I think that they are unlikely to be more objective than "blue is a more natural concept than grue", but that there's a good chance that they're about the same as that (and my gut take is that that's pretty far towards the electron end of the spectrum; but perhaps I'm conf...
See my response to Manuel -- I don't think this is "proving moral realism", but I do think it would be pointing at something deeper and closer-to-objective than "happen to have the same opinions".
I'm not sure what exactly "true" means here.
Here are some senses in which it would make morality feel "more objective" rather than "more subjective":
Locally, I think that often there will be some cluster of less controversial common values like "caring about the flourishing of society" which can be used to derive something like locally-objective conclusions about moral questions (like whether X is wrong).
Globally, an operationalization of morality being objective might be something like "among civilizations of evolved beings in the multiverse, there's a decently big attractor state of moral norms that a lot of the civilizations eventually converge on".
Ok but jtbc that characterization of "affronted" is not the hypothesis I was offering (I don't want to say it wasn't a part of the downvoting, but I'd guess a minority).
I would personally kind of like it if people actively explored angles on things more. But man, there are so many things to read on AI these days that I do kind of understand when people haven't spent time considering things I regard as critical path (maybe I should complain more!), and I honestly find it hard to fault people too much for using "did it seem wrong near the start in a way that makes it harder to think" as a heuristic for how deeply to engage with material.
Not sure quite what to say here. I think your post was valuable and that's why I upvoted it. You were expressing confusion about why anyone would disagree, and I was venturing a guess.
I don't think gentleness = ego service (it's an absence of violence, not a positive thing). But also I don't think you owe people gentleness. However, I do think that when you're not gentle (especially ontologically gentle) you make it harder for people to hear you. Not because of emotional responses getting in the way (though I'm sure that happens sometimes), but literally b...
By "ontologically ungentle" I mean (roughly) it feels like you're trying to reach into my mind and tell me that my words/concepts are wrong. As opposed to writing which just tells me that my beliefs are wrong (which might still be epistemically ungentle), or language which just provides evidence without making claims that could be controversial (gentle in this sense, kind of NVC-style).
I do feel a bit of this ungentleness in that opening paragraph towards my own ontology, and I think it put me more on edge reading the rest of the post. But as I said, I didn't disagree-vote; I was just trying to guess why others might have.
Right ... so actually I think you're just doing pretty well at this in the latter part of the article.
But at the start you say things like:
There’s this fantasy of easy, free support for the AI Safety position coming from what’s commonly called a “warning shot”. The idea is that AI will cause smaller disasters before it causes a really big one, and that when people see this they will realize we’ve been right all along and easily do what we suggest.
What this paragraph seems to do is to push the error-in-beliefs that you're complaining about down into the ver...
Honestly, maybe you should try telling me? Like, just write a paragraph or two on what you think is valuable about the concept / where you would think it's appropriate to be applying it?
(Not trying to be clever! I started trying to think about what I would write here and mostly ended up thinking "hmm I bet this is stuff Holly would think is obvious", and to the extent that I may believe you're missing something, it might be easiest to triangulate by hearing your summary of what the key points in favour are.)
I upvoted and didn't disagree vote the original post (and generally agree with you on a bunch of the object level here!); however, I do feel some urge-towards-expressing-disagreement, which is something like:
OK I see the model there.
I guess it's not clear to me if that should hold if I think that most experiment compute will be ~training, and most cognitive labour compute will be ~inference?
However, over time maybe more experiment compute will be ~inference, as it shifts more to being about producing data rather than testing architectures? That could push back towards this being a reasonable assumption. (Definitely don't feel like I have a clear picture of the dynamics here, though.)
hmm, I think I would expect different experience curves for the efficiency of running experiments vs producing cognitive labour (with generally smaller efficiency boosts over time for running experiments; a rough sketch of what I mean is below). Is there any reason to expect them to behave similarly?
(Though I think I agree with the qualitative point that you could get a software-only intelligence explosion even if you can't do this with human-only research input, which was maybe your main point.)
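(To make concrete what I mean by "different experience curves", here's a minimal sketch in my own notation -- the power-law form and the symbols are assumptions of the sketch, not anything from the exchange above.) Suppose the effective cost per unit of each kind of work falls as a power law in cumulative algorithmic progress $n$:

$$c_{\text{exp}}(n) = c_{\text{exp}}(1)\, n^{-\alpha_{\text{exp}}}, \qquad c_{\text{lab}}(n) = c_{\text{lab}}(1)\, n^{-\alpha_{\text{lab}}}$$

Treating experiment compute and cognitive-labour compute as interchangeable is only safe if $\alpha_{\text{exp}} \approx \alpha_{\text{lab}}$; my guess above amounts to $\alpha_{\text{exp}} < \alpha_{\text{lab}}$, i.e. running experiments gets cheaper more slowly than producing cognitive labour does.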
Edit: I think that Neel's comment is basically just a better version of the stuff I was trying to say. (On the object level I'm a little more sympathetic than him to ways in which Mechanize might be good, although I don't really buy the story to that end that I've seen you present.)
Wanting to note that on my impressions, and setting aside who is correct on the object-level question of whether Mechanize's work is good for the world:
I feel like voicing that I centrally expect AI to continue to have bigger real-world impacts, but not get very weird until the 2030s. I think worlds where things go faster than that are a serious enough possibility to take seriously, but I think that the apparent zeitgeist suggests timelines which are a bit more aggressive than I think is justified.
The reason I want to say something is that I sort of suspect there are a bunch of people in a similar epistemic position -- where it doesn't seem like a priority to properly explore what % to put on craziness this decade; nor to get into big arguments about whether the zeitgeist is slightly off -- but for whom your comment might feel like something of a trap.
I guess I'm fairly sympathetic to this. It makes me think that voluntary safety policies should ideally include some meta-commentary about how companies view the purpose and value-add of the safety policy, and meta-policies about how updates to the safety policy will be made -- in particular, that it might be good to specify a period for public comments before a change is implemented. (Even a short period could be some value add.)
I appreciate the investigation here.
I'm not sure whether I agree that "quietly lowering the bar at the last minute so you can meet requirements isn't how safety policies are supposed to work". (Not sure I disagree; but I'm going to try to articulate a case against.)
I think in a world where you understand the risks well ahead of time of course this isn't how safety policies should work. In a world where you don't understand the risks well ahead of time, you can get more clarity as the key moments approach, and this could lead you to rationally judge that a lowe...
I appreciated you expressing this.
Riffing out loud ... I feel that there are different dynamics going on here (not necessarily in your case; more in general):
I wanted to share some insights from my reflection on my mistakes around attraction/power dynamics — especially something about the shape of the blindspots I had. My hope is that this might help to avert cases of other people causing harm in similar ways.
I don’t know for sure how helpful this will be; and I’m not making a bid for people to read it (I understand if people prefer not to hear more from me on this); but for those who want to look, I’ve put a couple of pages of material here.
These are in the same category because:
I'm not actually making a claim about alignment difficulty -- beyond that...
I agree there are some possible attitudes that society could have towards AI development which could put us in a much safer position.
I think that the degree of consensus you'd need for the position that you're outlining here is practically infeasible, absent some big shift in the basic dynamics. I think that the possible shifts which might get you there are roughly:
Much better epistemics and/or coordination -- out of reach now, but potentially obtainable with stronger tech.
Why are these the same category and why are you writing coordination off as impossible? It's not. We have literally done global nonproliferation treaties before.
This bizarre notion got embedded early in EA that technological feats are possible and solving coordination problems is impossible. It's actually the opposite -- alignment is not tractable and coordination is.
I agree that there could be an effect that keeps people from speaking out about AI danger. But:
Probably our crux is that I think the way society sees AI development morally is what matters here to navigate the straits, and the science is not going to be able to do the job in time. I care about developing a field of technical AI Safety but not if it comes at the expense of moral clarity that continuing to train bigger and bigger models is not okay before we know it will be safe. I would much rather rally the public to that message than try to get in the weak safety paper discourse game (which tbc I consider toothless and assume is not guiding Google’s strategy).
I downvoted this (but have upvoted some of your comments).
I think this advice is at minimum overstated, and likely wrong and harmful (at least if taken literally). And it's presented with rhetorical force, so that it seems to mostly be trying to push people's views towards a position that is (IMO) harmful, rather than mostly providing them with information to help them come to their own conclusions.
TBC:
Which applications to focus on: I agree that epistemic tools and coordination-enabling tools will eventually have markets and so will get built at some point absent intervention. But this doesn't feel like a very strong argument -- the whole point is that we may care about accelerating applications even if it's not by a long period. And I don't think that these will obviously be among the most profitable applications people could make (especially if you can start specializing to the most high-leverage epistemic and coordination tools).
Also, we could make a similar argument that "automated safety" research won't get dropped, since it's so obviously in the interests of whoever's winning the race.
UI and complementary technologies: I'm sort of confused about your claim about comparative advantage. Are you saying that there aren't people in this community whose comparative advantage might be designing UI? That would seem surprising.
More broadly, though:
Compute allocation: mostly I think that "get people to care more" does count as the type of thing we were talking about. But I think that it's not just caring about safety, but also being aware ahead-of-time of the role that automated research may have to play in this, and when it may be appropriate to hit the gas and allocate a lot of compute to particular areas.
Training data: I agree that the stuff you're pointing to seems worthwhile. But I feel like you've latched onto a particular type of training data, and you're missing important categories, e.g.:
It seems like "what can we actually do to make the future better (if we have a future)?" is a question that keeps on coming up for people in the debate week.
I've thought about some things related to this, and thought it might be worth pulling some of those threads together (with apologies for leaving it kind of abstract). Roughly speaking, I think that:
Ughh ... baking judgements about what's morally valuable into the question somehow doesn't seem ideal. Like I think it's an OK way to go for moral ~realists, but among anti-realists you might have people persistently disagreeing about what counts as extinction.
Also like: what if you have a world which is like the one you describe as an extinction scenario, but there's a small amount of moral value in some subcomponent of that AI system? Does that mean it no longer counts as an extinction scenario?
I'd kind of propose instead using the typology Will proposed...
Yep, I guess I'm into people trying to figure out what they think and which arguments seem convincing, and I think that it's good to highlight sources of perspectives that people might find helpful-according-to-their-own-judgement for that. I do think I have found Drexler's writing on AI singularly helpful for my inside-view judgements.
That said: absolutely seems good for you to offer counterarguments! Not trying to dismiss that (but I did want to explain why the counterargument wasn't landing for me).