Micah Zoltu

17 karma · Joined Apr 2023

Posts: 1 · Comments: 8 · Sorted by New

How do we choose which human gets aligned with?

Is everyone willing to accept that "whatever human happens to build the hard takeoff AI gets to be the human the AI is aligned with"? Do AI alignment researchers realize this human may not be them, and may not align with them? Are AI alignment researchers all OK with Vladimir Putin, Kim Jong Un, or Xi Jinping being the alignment target? What about someone like Ted Kaczynski?

If the idea is "we'll just decide collectively", then in the most optimistic scenario we can assume (based on our history with democracy) that the alignment target will be something akin to today's world leaders, none of whom I would be comfortable having an AI aligned with.

If the plan is "we'll decide collectively, but using a better mechanism than every existing one", then it feels like the implication is that not only can we solve AI alignment, we can also solve human alignment (something humans have been trying and failing to solve for millennia).


Separately, I'm curious why my post got downvoted on quality (I'm not sure if it was you or someone else). I'm new to this community, so perhaps I unintentionally broke some rule; if so, I would like to be made aware of it.

I believe PornHub is a bigger company than most of today's AI companies (~150 employees, half of them software engineers, according to Glassdoor). If Brave AI is to be believed, they have $100B in annual revenue and handle 15 TB of uploads per day.

If this is the benchmark for the limits of an AI company in a world where AI research is stigmatized, then I am of the opinion that all stigmatization will accomplish is ensuring that the people who are OK working in the dark get to decide what gets built. I feel like PornHub-sized companies are big enough to produce AGI.

I agree with you that porn is a very distributed industry overall, and I suspect that is partially because of the stigmatization. However, this has resulted in a rather robust organizational arrangement where individuals work independently and the large companies (like PornHub) focus on handling the IT side of things.

In a stigmatized-AI future, perhaps individuals all over the world will work on different pieces of AI while a small number of big AI companies do the bulk training or coordination. Interestingly, this sort of decentralized approach could result in a better AI outcome, because we wouldn't end up with a small number of very powerful people deciding the trajectory; instead we would have a large number of individuals working independently and in competition with each other.

I do like your idea about comparing to other stigmatized industries! Gambling and drugs are, of course, other great examples of how an absolutely massive industry can grow in the face of weak stigmatization!

I think your reasoning here is sound, but we have what I believe is a strong existence proof that, when there is money to be made, weak stigma doesn't do much:

Porn.

I think the porn industry fits nicely into your description of a weakly stigmatized industry, yet it is booming and has many smart, talented people working in it despite that stigma.

If we are all correct, AI will be bigger (in terms of money) than the porn industry (which is huge), and I suspect demand for AI will be even higher than demand for porn. People may use VPNs and private browsers when using AIs, but I don't think that will stop them.

I agree that generating outrage can happen pretty quickly. My claim here is that the level of universality required to meaningfully hinder AI development is far higher than in any of the examples you have given, or any I can think of. You would need a stigma as strong as something like incest or child molestation: one that is near-universally held and very strongly enforced at the social layer, to the point that it is difficult to find any other humans who will even talk to you about the subject.

With crypto, COVID-19, and CRISPR there are still very large communities of people who are in opposition to the outraged individuals and who continue to make significant progress/gains against the outraged groups.

I think "time to prepare society for what is coming" is a much more sound argument than "try to stop AI catastrophe".

I'm still not a fan of the deceleration strategy, because I believe that in any potential future where AGI doesn't kill us it will bring about a great reduction in human suffering. However, I can definitely appreciate that this is very far from a given and it is not at all unreasonable to believe that the benefits provided by AGI may be significantly or fully offset by the negative impact of removing the need for humans to do stuff!

whatever benefits AI might bring in the future will still be available in a century, or a millennium, as long as humanity survives. That tree full of golden apples will still be there for the plucking

In the Foundation series, I believe Isaac Asimov expressed the counterargument to this quite well: ||It is fine to take the conservative route if we are alone in the universe. If we are not alone in the universe, then we are in an existential race and just haven't met the other racers yet.||


I agree that 'AI alignment' is probably impossible, for the reasons you described, plus many others.

The main downside is that current generations might not get some of the benefits of early AI development.

How do you reconcile these two points? If the chance of alignment is epsilon, and deceleration results in significant unnecessary deaths and suffering in the very near future, it feels like you would essentially need a zero discount rate on future utility to choose deceleration?
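One rough way to make that trade-off explicit (my own sketch; the symbols are mine, not anything from the exchange above): let $\Delta p$ be the increase in the probability of an aligned outcome bought by decelerating, $V$ the value of the long-run future conditional on that outcome, $C$ the near-term deaths and suffering caused by the delay, and $\delta \in [0,1]$ the weight placed on future utility relative to present utility. Deceleration is only preferred when

$$\delta \cdot \Delta p \cdot V > C,$$

and if $\Delta p$ is on the order of epsilon, that inequality only holds when $\delta$ is essentially 1 (i.e., no discounting of the future) and $V$ is assumed to be astronomically large.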


Humans have many other ways to stigmatize, cancel, demonize, and ostracize behaviors that they perceive as risky and evil.

I think this is a good and valid point. However, I weakly believe that this sort of cultural stigmatization takes a very long time to build up to the levels necessary to meaningfully slow AI research, and I don't think we have the time for that. I suspect a weak stigma (one that isn't shared by society as a whole) is more likely to just lead to conflict and bloodshed than to actually stop advancement in the way we would need it to.

In theory, the best way to be a good next-word-predictor is to model humans. Internally, humans model the world they live in, so a sufficiently powerful human-modeler would likely model the world those humans live in. Further, humans reason, so a really good next-word-predictor would be able to predict the next word more accurately by reasoning. Similarly, it is an optimization strategy to develop other cognitive abilities, logic, etc.

All of this lets you predict the correct next word with fewer "neurons", because it takes fewer neurons to learn how to do logical deduction and memorize some premises than it takes to memorize all of the possible outputs that some future prompt may require.
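As a toy illustration of that compression point (a hedged sketch of my own in Python; the arithmetic-prompt setup is invented for illustration, not anything an actual language model does internally):

```python
# Toy illustration: predicting the "next token" of arithmetic prompts.

# Option 1: pure memorization. Storage grows with the number of distinct prompts
# (10,000 entries just for two-digit addition) and it fails on anything unseen.
lookup = {f"{a}+{b}=": str(a + b) for a in range(100) for b in range(100)}

# Option 2: learn the underlying rule. A constant-size "model" covers every case,
# including prompts that never appeared in training.
def predict(prompt: str) -> str:
    a, b = prompt.rstrip("=").split("+")
    return str(int(a) + int(b))

assert predict("12+34=") == lookup["12+34="]  # matches the memorized answer
assert predict("1234+5678=") == "6912"        # and generalizes beyond the table
```

The lookup table needs an entry per prompt and fails outside of it, while the rule-based predictor is constant-size and extrapolates; that is the sense in which learning to "reason" is the cheaper way to predict the next word.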

The fact that we train on human data just means that we are training the AI to reason and think critically in the same way we do. Once it has that ability, we can then "scale it up", which is something humans really struggle with.

How can you align AI with humans when humans are not internally aligned?

AI Alignment researchers often talk about aligning AIs with humans, but humans are not aligned with each other as a species.  There are groups whose goals directly conflict with each other, and I don't think there is any singular goal that all humans share.

As an extreme example, one may say "keep humans alive" is a shared goal among humans, but there are people who think that is an anti-goal and humans should be wiped off the planet (e.g., eco-terrorists).  "Humans should be happy" is another goal that not everyone shares, and there are entire religions that discourage pleasure and enjoyment.

You could try to simplify further to "keep the species around", but some people would be fine with a wirehead future while others are not, and some would be fine with humans merely existing in a zoo while others are not.


Almost every time I hear alignment researchers speak about aligning AI with humans, they seem to start from the premise that there is a cohesive worldview to align with. The best "solution" to this problem that I have heard suggested is that there should be multiple AIs competing with each other on behalf of different groups of humans, or perhaps individual humans, with each separately representing the goals of those humans. However, the people who suggest this strategy are generally not AI alignment researchers, but rather people arguing against AI alignment researchers.

What is the implied alignment target that AI alignment researchers are trying to work towards?