158 karmaJoined


Yes. Thanks. Link has been amended. Author was in fact Luke Muehlhauser, so labeling it 'WEF' is only partially accurate.

I agree that more of both is needed. Both need to be instantiated in actual code, though. And both are useless if researchers don't care implement them.

I admit I would benefit from some clarification on your point - are you arguing that the article assumes a bug-free AI won't cause AI accidents? Is it the case that this arose from Amodei et al.'s definition?: “unintended and harmful behavior that may emerge from poor design of real-world AI systems”. Poor design of real world AI systems isn't limited to being bug-free, but I can see why this might have caused confusion.

I don't think it's an implausible risk, but I also don't think that it's one that should prevent the goal of a better framing.

AI accidents brings to my mind trying to prevent robots crashing into things. 90% of robotics work could be classed as AI accident prevention because they are always crashing into things.

It is not just funding confusion that might be a problem. If I'm reading a journal on AI safety or taking a class on AI safety what should I expect? Robot mishaps or the alignment problem? How will we make sure the next generation of people can find the worthwhile papers/courses?

I take the point. This is a potential outcome, and I see the apprehension, but I think it's a probably a low risk that users will grow to mistake robotics and hardware accidents for AI accidents (and work that mitigates each) - sufficiently low that I'd argue expected value favours the accident frame. Of course, I recognize that I'm probably invested in that direction.

Perhaps we should take a hard left and say that we are looking at studying Artificial Intelligence Motivation? People know that an incorrectly motivated person is bad and that figuring out how to motivate AIs might be important. It covers the alignment problem and the control problem.

Most AI doesn't look like it has any form of motivation and is harder to rebrand as such, so it is easier to steer funding to the right people and tell people what research to read.

I think this steers close to an older debate on AI “safety” vs “control” vs “alignment”. I wasn't a member of that discussion so am hesitant to reenact concluded debates (I've found it difficult to find resources on that topic other than what I've linked - I'd be grateful to be directed to more). I personally disfavour 'motivation' on grounds of risk of anthropomorphism.

"permanent loss of control of a hostile AI system" - This seems especially facilitative of the science-fiction interpretation to me.

I agree with the rest.

I think this proposition could do with some refinement. AI safety should be a superset of both AGI safety and narrow-AI safety. Then we don't run into problematic sentences like "AI safety may not help much with AGI Safety", which contradicts how we currently use 'AI safety'.

To address the point on these terms, then:

I don't think AI safety runs the risk of being so attractive that misallocation becomes a big problem. Even if we consider risk of funding misallocation as significant, 'AI risk' seems like a worse term for permitting conflation of work areas.

Yes, it's of course useful to have two different concepts for these two types of work, but this conceptual distinction doesn't go away with a shift toward 'AI accidents' as the subject of these two fields. I don't think a move toward 'AI accidents' awkwardly merges all AI safety work.

But if it did: The outcome we want to avoid is AGI safety getting too little funding. This outcome seems more likely in a world that makes two fields of N-AI safety and AGI safety, given the common dispreference for work on AGI safety. Overflow seems more likely in the N-AI Safety -> AGI Safety direction when they are treated as the same category than when they are treated as different. It doesn't seem beneficial for AGI safety to market the two as separate types of work.

Ultimately, though, I place more weight on the other reasons why I think it's worth reconsidering the terms.

What do you have in mind? If it can't be fixed with better programming, how will they be fixed?

Hi Carrick,

Thanks for your thoughts on this. I found this really helpful and I think 80'000 hours could maybe consider linking to it on the AI policy guide.

Disentanglement research feels like a valid concept, and it's great to see it exposed here. But given how much weight pivots on the idea and how much uncertainty surrounds identifying these skills, it seems like disentanglement research is a subject that is itself asking for further disentanglement! Perhaps it could be a trial question for any prospective disentanglers out there.

You've given examples of some entangled and under-defined questions in AI policy and provided the example of Bostrom as exhibiting strong disentanglement skills; Ben has detailed an example of an AI strategy question that seems to require some sort of "detangling" skill; Jade has given an illuminative abstract picture. These are each very helpful. But so far, the examples are either exclusively AI strategy related or entirely abstract. The process of identifying the general attributes of good disentanglers and disentanglement research might be assisted by providing a broader range of examples to include instances of disentanglement research outside of the field of AI strategy. Both answered and unanswered research questions of this sort might be useful. (I admit to being unable to think of any good examples right now)

Moving away from disentanglement, I've been interested for some time by your fourth, tentative suggestion for existing policy-type recommendations to

fund joint intergovernmental research projects located in relatively geopolitically neutral countries with open membership and a strong commitment to a common good principle.

This is a subject that I haven't been able to find much written material on - if you're aware of any I'd be very interested to know about it. It isn't completely clear whether or how to push for an idea like this. Additionally, based on the lack of literature, it feels like this hasn't received as much thought as it should, even in an exploratory sense (but being outside of a strategy research cluster, I could be wrong on this). You mention that race dynamics are easier to start than stop, meanwhile early intergovernmental initiatives are one of the few tools that can plausibly prevent/slow/stop international races of this sort. These lead me to believe that this 'recommendation' is actually more of a high priority research area. Exploring this area appears robustly positive in expectation. I'd be interested to hear other perspectives on this subject and to know whether any groups or individuals are currently working/thinking about it, and if not, how research on it might best be started, if indeed it should be.

Hey kbog, Thanks for this. I think this is well argued. If I may, I'd like to pick some holes. I'm not sure if they are sufficient to swing the argument the other way, but I don't think they're trivial either.

I'm going to use autonomy in weapons systems in favour of LAWs for reasons argued here(see Takeaway 1).

As far as I can tell, almost all considerations you give are to inter-state conflict. The intra-state consequences are not explored and I think they deserve to be. Fully autonomous weapons systems potentially obviate the need for a mutually beneficial social contract between the regimes in control of the weapons and the populations over which they rule. All dissent becomes easy to crush. This is patently bad in itself, but it also has consequences for interstate conflict; with less approval needed to go to war, inter-state conflict may increase.

The introduction of weapons systems with high degrees of autonomy poses an arguably serious risk of geopolitical turbulence: it is not clear that all states will develop the capability to produce highly autonomous weapons systems. Those that do not will have to purchase them from technologically-more advanced allies willing to sell them. States that find themselves outside of such alliances will be highly vulnerable to attack. This may motivate a nontrivial reshuffling of global military alliances, the outcomes of which are hard to predict. For those without access to these new powerful weapons, one risk mitigation strategy is to develop nuclear weapons, potentially motivating nuclear proliferation.

On your point:

The logic here is a little bit gross, since it's saying that we should make sure that ordinary soldiers like me die for the sake of the greater good of manipulating the political system and it also implies that things like body armor and medics should be banned from the battlefield, but I won't worry about that here because this is a forum full of consequentialists and I honestly think that consequentialist arguments are valid anyway.

My argument here isn't hugely important but I take some issue with the analogies. I prefer thinking in terms of both actors agreeing on acceptable level of vulnerability in order to reduce the risk of conflict. In this case, a better analogy is to the Cold War agreement not to build comprehensive ICBM defenses, an analogy which would come out in favour of limiting autonomy in weapons systems. But neither of us are placing much importance on this point overall.

I'd like to unpack this point a little bit:

Third, you might say that LAWs will prompt an arms race in AI, reducing safety. But faster AI development will help us avoid other kinds of risks unrelated to AI, and it will expedite humanity's progress and expansion towards a future with exponentially growing value. Moreover, there is already substantial AI development in civilian sectors as well as non-battlefield military use, and all of these things have competitive dynamics. AGI would have such broad applications that restricting its use in one or two domains is unlikely to make a large difference; after all, economic power is the source of all military power, and international public opinion has nontrivial importance in international relations, and AI can help nations beat their competitors at both.

I believe discourse on AI risks often conflates 'AI arms race' with 'race to the finish'. While these races are certainly linked, and therefore the conflation justified in some senses, I think it trips up the argument in this case. In an AI arms race, we should be concerned about the safety of non-AGI systems, which may be neglected in an arms race scenario. This weakens the argument that highly autonomous weapons systems might lead to fewer civilian casualties, as this is likely the sort of safety measure that might be neglected when racing to develop weapons systems capable of out-doing the ever more capable weapons of one's rival.

The second sentence only holds if the safety issue is solved, so I don't accept the argument that it will help humanity reach a future exponentially growing in value (at least insofar as we're talking about the long run future, as there may be some exponential progress in the near-term).

It could simply be my reading, but I'm not entirely clear on the point made across the third and fourth sentences, and I don't think they give a compelling case that we shouldn't try to avoid military application or avoid exacerbating race dynamics.

Lastly, while I think you've given a strong case to soften opposition to advancing autonomy in weapons systems, the argument against any regulation of these weapons hasn't been made. Not all actors seek outright bans, and I think it'd be worth acknowledging that (contrary to the title) there are some undesirable things with highly autonomous weapons systems and that we should like to impose some regulations on them such as, for example, some minimum safety requirements that help reduce civilian casualties.

Overall, I think the first point I made should cause serious pause, and it's the largest single reason I don't agree with your overall argument, as many good points as you make here.

(And to avoid any suspicions: despite arguing on his side, coming from the same city, and having the same rare surname, I am of no known relation to Noel Sharkey of the Stop Killer Robots Campaign, though I confess a pet goal to meet him for a pint one day.)

Not sure if it's just me but the board_setup.jpg wouldn't load. I'm not sure why, so I'm not expecting a fix, just FYI. Cards look fun though!

Load more