Pato's Quick takes

Pato



Pato · Oct 28 2022 · 2

I'm still learning the basics of AI alignment, but it seems to me that all AIs (and other technologies) already don't give us exactly what we want but we don't call that outer misaligned because they are not "agentic" (enough?). The thing is, I don't know whether there is really some crucial, perhaps ontological, property that makes something agentic; I think it could just be a kind of complexity that we place a lot of value on.

Also, ML systems are inner misaligned in a sense, because they can't generalize to everything from examples, and we see that whenever we don't like the results they give us on a particular task. Maybe "misaligned" isn't the right word for these technologies, but the important thing is that they don't do what we want them to do.

So the real question about AI risk is: are we going to build a superintelligent technology? That is the significant difference from previous technologies. If we do, we are no longer going to be the ones influencing the future the most, building what we actually want little by little and dropping technologies whenever they stop being useful. We are going to be the ones turned off.

Daniel_Eth · Oct 28 2022 · 3

"it seems to me that all AIs (and other technologies) already don't give us exactly what we want but we don't call that outer misaligned because they are not "agentic" (enough?)"
Just responding to this part: my sense is that most of the reason current systems don't do what we want has to do with capabilities failures, not alignment failures. That is, it's less about the system being given the wrong goal, goal misgeneralization, etc., and more about it simply not being competent enough.

Pato · Oct 28 2022 · 1

Wow, I didn't expect a response. I didn't know shortforms were that accessible; I thought I was just rambling on my profile. So I should clarify that when I say "what we actually want" I mean our actual terminal goals (if we have those).

So what I'm saying is that we are not training AIs, or creating any other technology, to pursue our terminal goals but to do other, more specific things (specific, of course, because they don't have high capabilities). But the moment we create something that can take over the world, all of a sudden the fact that we didn't create it to pursue our terminal goals becomes a problem.

I'm not trying to explain why present technologies fail. My point is that misalignment doesn't first appear with the creation of powerful AIs; that is just the moment when it becomes a problem, and that's why powerful AI has to be built with a different mentality than any other technology.