Just stumbled upon this sequence and happy to have found it! There seems to be lots of analysis ripe for picking here.
Some thoughts on the strength of the grabbiness selection effect below. I’ll likely come back to this to add further thoughts in the future.
One factor that seems to be relevant here is the number of pathways to technological completion. If we assume that the only civilisations that dominate the universe in the far future are the ones that have reached technological completion (seems pretty true to me), then tautologically, the dominating civilisations must be those who have walked the path to technological completion. Now imagine that in order to reach technological completion, you must tile 50% of the planets under your control with computer chips, but your value system means that you assign huge disvalue to tiling planets with computer chips*. As a result, you’ll refuse to walk the path to technological completion, and be subjugated or wiped out by the civilisations that did go forward with this action.
The more realistic example here is a future in which suffering subroutines are a necessary step towards technological completion, and so civilisations that disvalue suffering enough to not take this step will be dominated by civilisations that either (1) don’t care for suffering or (2) are willing to bite the bullet of creating suffering subroutines in order to pre-emptively colonise their available resources.
So the question here is how many paths are there to technological completion? Technological completion could be like a mountain summit that is accessible from many directions - in that case, if your value system doesn’t allow you to follow one path, you can change course and reach the summit from the other direction. But if there’s just a single path with some steps that are necessary to take, then this will constrain the set of value systems that dominate the far future. Sketching out precedents for technological completion would be a first step to gaining clarity here.
*This value system is just for the thought experiment; I’m not claiming that it’s a likely one.
Yep, the variance of human worker teams should definitely be stressed. It’s plausible that a super team of hackers might have attack workloads on the scale of 100s to 1000s of hours, whereas for lower-quality teams, this may be more like 100,000s of hours.
Thinking about it, I can probably see significant variance amongst AI systems due to various degrees of finetuning on cyber capabilities (though as you said, not as much variance as human teams). E.g., a capable foundational model may map to something like a 60th percentile hacker and so have attack workloads on the order of 10,000s of hours (like in this piece). A finetuned model might map to a 95th percentile hacker, and so a team of these may have workloads on the scale of 1000s of hours.
Though 100s of hours seems more on the implausible side - I'm guessing this would require a very large team (100s) of very skilled hackers.
And other relevant skills, like management
Kurzgesagt script + Melody Sheep music and visuals = great video about the long term future. Someone should get a collab between the two going.
Intuitively I feel that this process does generalise, and I would personally be really keen to read case studies of an idea/strategy that was moved from left to right in the diagram above, i.e., a thinker initially identifies a problem, and over the following years or decades it moves to tactics research, then policy development, then advocacy and finally is implemented. I doubt any idea in AI governance has gone through the full strategy-to-implementation lifecycle, but maybe one in climate change, nuclear risk management, or something else has? Would appreciate it if anyone could link case studies of this sort!
I’m trying to get a better picture in my head of what good work looks like in this space, i.e., existing work that has given us improved strategic clarity. This could be with regards to TAI itself, or a technology such as nuclear weapons.
Imo, some examples of valuable strategic work/insights are:
I’m curious as to any other examples of existing work that you think fit into the category of valuable strategic work, of the type that you talk about in this post.
This is a really comprehensive post on pursuing a career in technical AI safety, including how to test fit and skill up.
From listening to your podcast with Ali Abdaal, it seems that you're relatively optimistic about humanity being able to create aligned AI systems. Could you explain the main reasons behind your thinking here?
Strongly second this^