OscarD🔸

2056 karma · Joined · Working (0-5 years) · Oxford, UK

Comments (335)

Yep, that all makes sense, and I think this work can still tell us something; it just doesn't update me much given the lack of compelling theories or of much consensus in the scientific/philosophical community. This is harsher than what I actually think, but directionally it has the feel of 'cargo cult science': there is a fancy Bayesian model and lots of numbers and so forth, but if it is all built on top of philosophical stances I don't trust, it doesn't move me much. That said, it is still interesting, e.g. how wide the range for chickens is.

60% agree: "Most areas of capabilities research receive a 10x speedup from AI automation before most areas of safety research."

The biggest factors seem to me to be feedback quality/good metrics and AI developer incentives to race.

Nice! It strikes me that in figure 1, information is propagating upward, from indicator to feature to stance to overall probability, and so the arrows should also be pointing upward.
I think the view (stance?) I am most sympathetic to is that none of our current theories of consciousness is much good, so we shouldn't update very far from our prior; but picking a prior is quite subjective, so it is hard to make collective progress on this when different people may just have quite different priors for P(current AI consciousness).

I have Thoughts about the rest of it, which I am not sure whether I will write up, but for now: I am sad about your Dad's death and glad you got to prioritise spending some time with him.

I expect there is a fair bit we disagree about, but thanks for your integrity and effort and vision.

Perhaps the main downside is that people may overuse the feature, and that it encourages people to spend time making small comments, whereas the current system nudges people towards leaving fewer, more substantive comments and fewer nit-picky ones? Not sure whether this has been an issue on LW; I don't read it as much.

I think the value of work on climate change isn't much affected by this analysis, since climate change seems almost certainly solved post-ASI, so climate-driven catastrophic setbacks will generally occur pre-ASI and therefore won't increase the total number of times we need to try to align ASI. Nukes, by contrast, are less certainly solved post-ASI, given we may still be in a multipolar, war-like world.

In Appendix A.1, it's not clear to me that an absolute reduction is the best way of thinking about this. Perhaps it is more natural to think in terms of relative reductions? I suppose some interventions are probably best modelled as absolute reductions (e.g. asteroid or supervolcano interventions, perhaps) and others as relative reductions (doubling the amount of resources spent on alignment research?).

Yes, I think the '100 years' criterion isn't quite what we want. E.g. if there is a catastrophic setback more than 100 years after we build an aligned ASI, then we don't need to rerun the alignment problem. (In practice, 100 years should perhaps be ample time to build good global governance and reduce catastrophic setback risk to near zero, but conceptually we want to clarify this.)

And I agree with Owen that shorter setbacks also seem important. In fact, in a simple binary model we could just define a catastrophic setback as one that takes you from a society that has built aligned ASI to one where all aligned ASIs are destroyed. I.e. the key thing is not how many years back you go, but whether you regress back beneath the critical 'crunch time' period.

I like the idea, though I think a shared gdoc is far better for any in-line comments. Maybe this is better if you only want people to give high-level comments, though - I imagine heaps of people may want to comment on gdocs you share publicly.
