Sharmake
Are you saying AIs trained this way won’t be agents?

Not especially. If I had to state it simply: a large space of instrumental goals isn't useful for capabilities today, and plausibly won't be in the future either, so we have at least some reason to worry less about AI misalignment risk than we currently do.

In particular, it means we shouldn't assume instrumental goals will appear by default, and we should avoid over-relying on non-empirical approaches like intuition or imagination. We have to take things on a case-by-case basis, rather than using broad judgements.

Note that instrumental convergence/instrumental goals isn't a binary but a spectrum: the more that space for instrumental goals is useful for capabilities, the worse things are, continuously, rather than instrumental goals simply being active or inactive.

My claim is that the evidence we have counts against much space for instrumental convergence being useful for capabilities, and I expect this trend to continue, at least partially, as AI progresses.

Yet I suspect this isn't hitting at your true worry, and I want to address that today. I suspect your true worry is captured in the quote below:

And regardless of whatever else you’re saying, how can you feel safe that the next training regime won’t lead to instrumental convergence?

And while I can't answer that question completely, I'd suggest going for a walk, drinking some water, or in the worst case getting help from a mental health professional. But try to break the loop of never feeling safe around something.

The reason I'm suggesting this is that acting on your need to feel safe would lead to the following problems:

  1. If adopted, this would leave us vulnerable to arbitrarily high demands for safety, possibly crippling AI use cases. As a general policy, I'm not a fan of actions that result in arbitrarily high demands for something, at least without scrutinizing them very heavily, and meeting them would require far more evidence than just a feeling.

  2. We have no reason to assume that people's feelings of safety or unsafety are actually connected to the real evidence about whether AI is safe, or whether AI misalignment risk is a big problem. Your feelings are real, but I don't trust that your feeling of unsafety around AI is telling me anything other than how you feel about it. That's fine, to the extent that it isn't harming you materially, but it's an important thing to note here.

Kaj Sotala made a similar point in a post about why you should mostly feel safe. It's a different discussion from my comment, but it may be useful:

https://www.lesswrong.com/posts/pPLcrBzcog4wdLcnt/most-people-should-probably-feel-safe-most-of-the-time

EDIT 1: I deeply hope you can feel better, no matter what happens in the AI space.

EDIT 2: One thing to keep in mind in general is that when evidence suggests something is more or less dangerous, the change is usually smooth, rather than the estimate going to zero or to certainty. In this case I'm claiming that AI is less dangerous, probably a lot less dangerous, not that the danger is totally erased; things are safer, and have gotten smoothly better, based on our evidence to date.
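To illustrate the point about smooth updates, here is a minimal Bayesian-update sketch in Python; the prior and likelihood numbers are purely illustrative assumptions, not estimates of actual AI risk.

```python
# Minimal sketch of a smooth Bayesian update (illustrative numbers only).
# Evidence against danger lowers the probability, but does not send it to zero.

prior_danger = 0.30          # assumed prior probability that AI is dangerous
p_evidence_if_danger = 0.4   # assumed chance of seeing today's evidence if dangerous
p_evidence_if_safe = 0.8     # assumed chance of seeing today's evidence if safe

posterior_danger = (p_evidence_if_danger * prior_danger) / (
    p_evidence_if_danger * prior_danger + p_evidence_if_safe * (1 - prior_danger)
)

print(f"P(danger) before the evidence: {prior_danger:.2f}")
print(f"P(danger) after the evidence:  {posterior_danger:.2f}")  # lower, but not 0
```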

21% disagree

The big reason I lean towards disagreeing nowadays is that I've come to expect the AI control/alignment problem to be much less neglected, and less important to solve, than I used to. More generally, I've come to doubt the assumption that worlds in which we survive are worlds in which we achieve very large value (under my own value set), such that reducing existential risk is automatically good.

Late comment. I basically agree with the point being made here that we should avoid the lump-of-labor fallacy of assuming the amount of work to be done is constant, but I don't think this weakens the argument that human work will be totally replaced by AI work, for two reasons:

  1. In a world where AI labor can be copied essentially at will, wages fall for the same reason prices fall when more goods are supplied. In particular, humans have a biological minimum wage of roughly 20-100 watts, which makes them fundamentally unemployable once AIs can be run for less than that, and human wages are likely to fall below subsistence if AIs are copied at scale (see the back-of-envelope sketch after this list).

  2. While more work will happen as the economy grows, it will still be better to invest in AIs to do that work than in humans, so even as total labor grows, human labor specifically can fall to essentially zero. The full-automation hypothesis is therefore at least an economically consistent hypothesis to hold.
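To make the "biological minimum wage" in point 1 concrete, here is a back-of-envelope sketch in Python. The wattage, electricity price, and hypothetical AI power draw are my own illustrative assumptions, not figures from this thread; the point is only that once an AI's running cost drops below a human's metabolic floor, no human wage can compete on cost.

```python
# Back-of-envelope sketch: compare a human's metabolic energy floor with the
# running cost of a hypothetical AI doing the same work.
# All numbers are illustrative assumptions.

HUMAN_WATTS = 100           # upper end of the 20-100 W biological floor
AI_WATTS = 50               # hypothetical AI power draw for equivalent work
ELECTRICITY_PRICE = 0.15    # assumed $/kWh
HOURS_PER_DAY = 24

def energy_cost_per_day(watts: float) -> float:
    """Daily cost of running a load of `watts`, priced as electricity."""
    return watts / 1000 * HOURS_PER_DAY * ELECTRICITY_PRICE

human_floor = energy_cost_per_day(HUMAN_WATTS)
ai_cost = energy_cost_per_day(AI_WATTS)

print(f"Human metabolic floor: ${human_floor:.2f}/day")
print(f"Hypothetical AI:       ${ai_cost:.2f}/day")
print("AI undercuts the human floor:", ai_cost < human_floor)
```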

To be honest, even if we grant the assumptions that AI alignment is achieved and that it matters who achieves AGI/ASI, I'd still be much, much less confident in the case for America racing, and I think racing is weakly negative.

One big reason for this is that the pressures AGI introduces are closer to cross-cutting pressures than to pressures that depend on which nation wins, like the intelligence-curse sort of scenario, where elites have incentives to invest in their automated economy and leave the large non-elite population to starve or be repressed:

https://lukedrago.substack.com/p/the-intelligence-curse

I think this might not be irrationality, but a genuine difference in values.

In particular, I think something like a disagreement over discount rates is at the core of a lot of disagreements on AI safety, and to be blunt, you shouldn't expect convergence unless you successfully persuade them on the discount rate itself.

I ultimately decided to vote for the animal welfare groups, because I believe that animal welfare, in both its farmed and wild variants, is probably one of the largest and most robust problems in the world. With the exception of groups that form the logistical/epistemic backbone of the movement (they are valuable for gathering data and making sure the animal welfare groups can act), I've become more skeptical that other causes are robustly net-positive, especially reducing existential risks.

This sounds very much like the missile-gap/bomber-gap narrative, and yes, this is quite bad news if they actually adopt the commitments being pushed here.

The evidence that China is racing to AGI is, quite frankly, very thin, and I can see a very dangerous arms race coming:

https://forum.effectivealtruism.org/posts/cXBznkfoPJAjacFoT/are-you-really-in-a-race-the-cautionary-tales-of-szilard-and

I honestly agree with this post. To translate it into my own thinking: we would rather have AI that is superhuman at faithful chain-of-thought (CoT) reasoning than AI that is superhuman at wise forward-pass thinking.

The Peter Singer/Einstein-style legible reasoning corresponds to CoT reasoning, whereas a lot of the directions for intuitive, wise, illegible thinking depend on making forward-pass thinking more capable, which is not a great direction for reasons of trust and alignment.

In retrospect, I agree more with 3, and while I still think very short AI timelines are plausible, post-2030 timelines now also seem reasonably plausible to me.

I have become less convinced that takeoff, from the perspective of the state, will be slow. This is slightly because entropix reduced my confidence in the view that algorithmic progress won't suddenly go critical and make AI radically better, and more so because I now expect less flashy, public progress. Most importantly, I think the gap between consumer AI and the internal AI used at OpenAI will only widen, so I expect the GPT-4-style moments, where people were wowed by AI and became very concerned about it, not to happen again.

So I expect AI governance to have less salience, by the time AIs can automate AI research, than the current AI governance field thinks, which means I've reduced my probability of a strong societal response from roughly 80-90% to only 45-60%.

This is mostly correct as a summary of my position, but for point 6, I want to point out that while it is technically true, I fear economic incentives are working against this path.

Agree with the rest of the summary though.
