tobycrisford 🔸
tobycrisford.github.io/
This is a really interesting idea. But does a system like this risk increasing the number of animals being farmed?

I'm struggling to wrap my head around the economics of it properly, but if you take it to extremes then it seems like it might?

Suppose I'm a wealthy individual who is willing to pay a very high price to keep a hen out of a cage (more than the market price of all of her eggs over her life). Now imagine that the demand for eggs drops to zero. In the current system, farmers would stop raising hens, and animal activists would be happy. But in this new system, if it is still legal to cage hens, then a farmer could keep raising hens, threaten to put them in cages unless I pay them not to, and just throw all their eggs in the bin? Or am I misunderstanding the system?
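To put some toy numbers on that (my own, purely illustrative): suppose raising a hen costs the farmer \$20, egg demand is zero so her eggs are worth \$0, and a donor will pay \$50 to keep her out of a cage. Then:

$$
\underbrace{\$50}_{\text{paid not to cage}} \;-\; \underbrace{\$20}_{\text{cost of raising the hen}} \;=\; \$30 \text{ profit per hen, with no egg revenue at all,}
$$

so it would become profitable to raise hens who would otherwise never have been farmed.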

That's an extreme and unrealistic example, but it's just meant to illustrate that, in principle, paying for welfare (which is really paying someone not to do something bad) feels like it might carry a risk of increasing the total amount of suffering, even if it decreases the average?

The chain of thought is still generated via feed-forward next token prediction, right?

Yes, it is. But it still feels different to me.

If it's possible to create consciousness on a computer at all, then at some level it will have to consist of mechanical operations which can't by themselves be conscious. This is because you could ultimately understand what it is doing as a set of simple instructions being carried out on a processor. So although I can't see how a single forward pass through a neural network could involve consciousness, I don't think a larger system being built out of these operations should rule out that larger system being conscious.

In a non-reasoning model, each token in the output is produced immediately, in a single forward pass, which means I can't see how there could be any conscious deliberation behind it. For example, the model can't decide to spend longer thinking about a hard problem than about an easy one, in the way a human might. I find it hard to get my head around a consciousness that can't do that.

In a reasoning model, none of this applies.

(Although it's true that the distinction probably isn't quite as clear cut as I'm making out. A non-reasoning model could still decide to use its output to write out "chain of thought" style reasoning, for example.)
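To make the mechanical picture concrete, here is a toy sketch of autoregressive decoding (my own illustration; `model.forward` and `model.eos_token` are hypothetical names, not any real API). Each emitted token is a single feed-forward pass, but the loop feeds the model's own output back in, and a reasoning model simply runs the same loop over a long hidden chain-of-thought segment before the visible answer:

```python
def generate(model, prompt_tokens, max_tokens):
    """Toy autoregressive decoding loop.

    Every emitted token comes from one feed-forward pass, but each pass
    conditions on everything the model has already written, including any
    hidden chain-of-thought tokens.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        next_token = model.forward(tokens)  # one feed-forward pass
        tokens.append(next_token)           # output becomes part of the next input
        if next_token == model.eos_token:   # stop when the model signals it's done
            break
    return tokens
```

The only point of the sketch is that a larger, iterated system can be built out of individually simple passes, which is what the paragraphs above are appealing to.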

I appreciate you sharing this as an alternative way of framing AI alignment for people who react badly to using anthropomorphic language to describe LLMs, and I can see how it could be useful from that point of view. But I strongly disagree with the core argument being made in that blog post.

The problem with saying that LLMs are just functions mapping between large vector spaces is that functions mapping between large vector spaces can do an awful lot! If the brain is just a physical system operating according to known laws of physics, then its evolution in time can also be described as a mapping from R^n -> R^m for some huge n and m, because that's the form that the laws of physics take as well. If the evolution of the universe is described by Schrödinger's equation, then all time-evolution is just matrix multiplication!
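To spell that last point out (a toy, finite-dimensional sketch with a time-independent Hamiltonian, glossing over the infinite-dimensional details):

$$
i\hbar\,\frac{d}{dt}\lvert\psi(t)\rangle = H\,\lvert\psi(t)\rangle
\quad\Longrightarrow\quad
\lvert\psi(t)\rangle = e^{-iHt/\hbar}\,\lvert\psi(0)\rangle = U(t)\,\lvert\psi(0)\rangle,
$$

where $U(t)$ is just an $n \times n$ unitary matrix on an $n$-dimensional Hilbert space, so the entire time-evolution of the state really is the application of a matrix to a vector.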

There might be very good reasons to think that LLMs are a long way from having human-like intelligence, but saying this follows because they are just a mathematical function is a misleading rhetorical sleight of hand.

I agree there is a non-negligible chance that existing LLMs are already conscious, and I think this is a really interesting and important discussion to have. Thanks for writing it up! I don't think I would put the chances as high as 10% though.

I don't find the Turing test evidence as convincing as you present it here. The authors of the paper you cited released their test online for people to try. I played it quite a lot, and I was always able to distinguish the human from the AI (they don't tell you which AI you are paired with, but presumably some of those games were with GPT-4.5).

I think a kind of Turing test could be a good test for consciousness, but only if it is long, informed, and adversarial (e.g. as defined here: https://www.metaculus.com/questions/11861/when-will-ai-pass-a-difficult-turing-test/ ). This version of the test has not been passed (although as someone pointed out to me on the forum before, this was not Turing's original definition).

On the other hand, I don't find your strongest argument against LLM consciousness to be as convincing either. I agree that if each token you read is generated by a single forward pass through a network of fixed weights, then it seems hard to imagine how there could be any 'inner life' behind the words. There is no introspection. But this is not how the new generation of reasoning models work. They create a 'chain of thought' before producing an answer, which looks a lot like introspection if you read it!

I can imagine how something like an LLM reasoning model could become conscious. It's interesting that they didn't use any reasoning models in that Turing test paper!

Combining some empirical evidence with a subjective guess does not necessarily make the conclusion more robust if the subjective guess is on shaky ground. An argument may only be as strong as its weakest link.

I would not expect the subjective judgements involved in RP's welfare range estimates to be more robust than the subjective judgements involved in estimating the probability of an astronomically large future (or of the probability of extinction in the next 100 years).

I would reply by saying the likelihood of that is arbitrarily close to 0, although not exactly 0

I believe this is mathematically impossible! A probability has to be a definite number: it is either exactly 0 or some fixed positive value, and a fixed positive value can't be 'arbitrarily close to 0'. But it's probably not worth going back and forth on this.

I actually basically agree with your response to the Pascal mugger problem here. I'm still very uncertain, but I think I would endorse:

  • Making decisions by maximizing expected value, even when dealing with tiny objective probabilities.
  • Assigning a lower prior probability to claims in proportion to the size of the impact they say I can have, to avoid decision paralysis when considering situations involving potentially enormous value (see the sketch after this list).
  • Assigning a probability of literally zero to any claim that says I can influence arbitrarily high amounts of value, or infinite amounts of value, at least for the purposes of making decisions (but drawing a distinction between a claim having zero subjective 'probability' and a claim being impossible).
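A rough way to formalise the second bullet (my own sketch, with $X$ the amount of value a claim says I can influence and $c$ some constant): give any such claim a prior credence of at most $c/X$. Then the expected value at stake is bounded,

$$
P(\text{claim}) \times X \;\le\; \frac{c}{X} \times X \;=\; c,
$$

so no claim can dominate my decisions through sheer size alone, and the third bullet handles the limiting case of claims about unbounded or infinite value by assigning them a credence of exactly zero.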

But I think this approach makes me sceptical of the argument you are making here as well. You claim your argument is different to longtermism because it is based on empirical evidence (which I take it you're saying should be enough to override our prior scepticism of claims involving enormous value?), but I don't fully understand what you mean by that. To me, an estimate of the likelihood of humanity colonizing the galaxy (which is all strong longtermism is based on) seems at least as robust as, if not more robust than, an estimate of the welfare range of a nematode.

For instance, I don't even know how you would define units of welfare in a way that lets you make comparisons between a human and a nematode, let alone how you would go about measuring them empirically. I suspect they are impossible to define in a non-arbitrary way.

I might be missing something silly, but I think you're still dodging the question. There is no specific N in the claim I gave you. This magician is claiming to have a spell that, given any N, will create N beings.

So are you just saying you assign that claim zero probability?

I think you're trying to redefine the problem I gave you and I don't think you're allowed to do that.

In this problem the mugger is not threatening a specific number of beings; instead, they are claiming to have a specific power. They are claiming that:

Given any positive whole number N, they can instantly and magically create N beings (and give them some high or low welfare).

You need to assign a probability to that claim. If it is anything other than literally zero, you will be vulnerable to mugging.
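To spell out the arithmetic behind that last sentence (a sketch with hypothetical labels of mine: $p$ your credence in the claim, $v$ the value you place on each being's welfare, $W$ the value of your wallet):

$$
p \times N \times v \;>\; W \quad \text{whenever} \quad N > \frac{W}{p\,v},
$$

and since the mugger gets to pick $N$ after hearing your $p$, any $p > 0$ leaves such an $N$ available, so a straightforward expected-value calculation tells you to hand over the wallet.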

Thanks for the reply! I appreciate the distinction you're drawing between your arguments and strong longtermism, but I don't think it directly addresses my question?

In the version of the thought experiment I described, the mugger is claiming magical powers which would enable them to create arbitrarily many lives of arbitrary welfare. Unless you assign this claim a probability of literally zero, you are vulnerable to mugging, because they can then adjust their threat to match your estimate.

Is that the way out you would take here?

@Vasco Grilo🔸 I have to admire your commitment to pursuing this kind of reasoning wherever it leads. The evolution of your views on charity effectiveness over time has been fascinating, and I'm interested to see where it goes next as well.

I'm not sure if I've actually asked you this in a comment before, apologies if I have, but what is your reply to the Pascal mugger problem?

If I say to you:

"I am a sorcerer from another dimension, who has the power to conjure arbitrary numbers of sentient beings at will. How likely do you think this claim is?"

Then, whatever you reply, I say: "Give me your wallet, or I will torture/'give joy to' a number of beings equal to 10x the reciprocal of that probability estimate"... What do you say to this?
