
Technoliberal

13 karma

Comments
13

I'm thinking more of the "endgame" here, so I think input from non-researchers is no more valuable than input from researchers (as in, any useful information you could obtain about AI safety can be obtained from the researchers alone). To be specific, I believe something along the lines of AI 2027 is gonna be the somewhat-near future, so I wanna restrict access to advanced models as much as possible.

Think of it like nuclear bombs. If you had a technology that powerful, you wouldn't want to risk any bad actors getting access to it, so you would limit the number of owners as much as possible. It would be pretty ridiculous to want private companies to be able to own, or even use, nuclear weapons, and I think the case is pretty similar for current and future AI.

This task, of trying to align them, is something that shouldn't just be left to researchers in AI companies.

 

Why? I would think an AI expert is much better suited to align a potential AGI than any common person. I just don't see how the common person could contribute to alignment. If anything, I can see how they would contribute to DISalignment (engineering better jailbreaks, using the models for nefarious purposes, giving the models "bad values" like "cause as much damage as possible", etc.). I value reducing existential risk above all else, and I can't imagine that publicly releasing "almost superhuman" models would decrease it.

You can't solve every possible jailbreak, but you should solve every jailbreak humanly possible if you're going to release an AI that is claimed to be almost superhuman at cyber skills. I think current models are mostly bad for society, but I also think there's a possibility that current models could achieve AGI. Maybe it's only a 4% chance, but again, why take the risk? What is there to gain (other than money)?

I don't understand how publicly releasing these models will help in researching AI safety (and when I say "AI safety" I mostly mean AGI alignment). I thought the whole point of an aligned AGI is that you don't have to tell it to do stuff correctly: it already knows what's correct, even better than you do. So I don't see how letting anyone use the models will help in aligning them. I'm not an AI expert or anything, but to me it seems aligning AGI is less of a "we don't have enough data" problem and more of a "we don't even know where to start" problem.

I understand that, but my point is that I thought these were AI safety companies, and would therefore prioritize AI safety above all else. If they don't, why do so many people still treat them as if they did?

I don't know, maybe eventually it could help, but with these "cutting edge" coding models doesn't it seem irresponsible? What if the safeguards don't work? Shouldn't you release the model publicly only after you've exhaustively patched every single possible jailbreak? (Even then I would argue it's still better not to release it, since billions of people means hundreds of thousands of bad actors, and again, as an AI safety company with "cutting edge" models I wouldn't take any risks.)

So you're saying it's all BS? You're saying that Anthropic and OpenAI ultimately don't care about AI alignment? It sure seems like it to me, but browsing this website I have the feeling most people disagree with you.

Am I missing something obvious? Don't Anthropic and OpenAI claim to be for AI safety research?

I have a question. Why do all the AI safety companies seem to do the opposite of AI safety? Anthropic keeps publicly releasing models (which means they can be accessed by billions of people), and so does OpenAI. While these models are unlikely to cause major problems, if you're releasing a product that is going to be used by billions of people, you should make sure the product is around 99.9999% failure-proof. Anthropic themselves have said "AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities" when referring to Mythos. Now sure, Fable is claimed to be "safe for general use", and maybe it is, but why take the risk? Especially after only around 2-3 months of safety testing? I would want a company that claims to be for AI safety to always err on the side of caution, but this frankly seems quite reckless.

To be more specific, by "significantly reduce existential risk" I mean "reducing the probability of all humanity (and only humanity) dying in a few short days/weeks from 50% to 10%".

Also, I disagree with your methods. X risks aren't especially bad because of all the utility lost (and "negative utility" created); they're bad because after they happen there's never any utility again. Unless apes re-evolve into humans and rebuild civilization from scratch, but we're getting too hypothetical. What's 100, or even 1000, years of death and suffering compared to 10000 years of utopia? If stalling/slowing down technological progress for 1000 years made P(Doom) go from 50% to 1%, I would definitely take it. Unless of course you think utopia is gonna be some short-lived thing, but I seriously doubt that.
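
As a rough illustration of that trade-off, here is a toy expected-value calculation. Only the 50%, 1%, 1000-year and 10000-year figures come from the comment above. The unit utilities (one unit lost per year of delay, one unit gained per year of utopia, zero after doom) and the function name `expected_utility` are illustrative assumptions, not an established model.

```python
def expected_utility(p_doom, delay_years, utopia_years=10_000,
                     suffering_per_year=1.0, utopia_per_year=1.0):
    """Toy model (assumed, not from any source beyond the figures quoted).

    The delay cost is paid for certain; utopia is reached with
    probability 1 - p_doom; doom yields zero utility forever after.
    """
    delay_cost = delay_years * suffering_per_year
    utopia_value = (1.0 - p_doom) * utopia_years * utopia_per_year
    return utopia_value - delay_cost


# Path A: no slowdown, P(Doom) = 50%
no_delay = expected_utility(p_doom=0.50, delay_years=0)

# Path B: stall progress for 1000 years, P(Doom) = 1%
with_delay = expected_utility(p_doom=0.01, delay_years=1_000)

print(f"no delay:   {no_delay:.0f}")
print(f"with delay: {with_delay:.0f}")
print("delay is worth it" if with_delay > no_delay else "delay is not worth it")
```

Under these assumptions the delayed path comes out ahead, roughly 8900 versus 5000. The conclusion flips if utopia is assumed to be short-lived (try a small `utopia_years`), which is the caveat mentioned at the end of the comment.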

That's fair, but I imagine X risks and S risks are very heavily correlated, especially with regard to "speed of progress". Accelerationism will, in my view, obviously increase X risks (safety research takes time: the more time you have, the more research gets done, and the more the risk is reduced) but also increase S risks (this is more personal opinion, but I don't think the current leaders of AI innovation have stuff like animal welfare in mind. If we just keep chugging along, the first ASI might not care about animals at all).
