Arepo

I don't think revealed preferences make philosophical sense in any context. If the entity in question has an emotional reaction to its preference, then that emotional reaction seems like an integral part of what matters. If it has no such emotional reaction, then it seems presumptive to the point of being unparsable to say that it was revealing a preference for 'not swarming' vs., say, 'staying with an uncoordinated group that can therefore never spontaneously leave' or still more abstract notions.

> I don’t think I need to have better access to someone’s values to make a compelling case. For instance, suppose I’m running a store and someone breaks in with a gun and demands I empty the cash register. I don’t have to know what their values are better than they do to point out that they are on lots of security cameras, or that the police are on their way, and so on. It isn’t that hard to appeal to people’s values when convincing them. We do this all the time.

This is option 1: 'Present me with real data that shows that on my current views, it would benefit me to vote for them'. Sometimes it's available, but usually it isn't.

> Even if that foreclosed one mode of persuasion, well, too bad! That’s how reality is.

'Too bad! That's how reality is' is analogous to the statement 'too bad! That's how morality is' in its lack of foundation. 'Reality' and 'truth' are not available to us. What we have is a stream of valenced sensory input whose nature seems to depend somewhat on our behaviours. In general, we change our behaviour in such a way as to get better valenced sensory input, such as 'not feeling cognitive dissonance', 'not being in extreme physical pain', 'getting the satisfaction of symbols lining up in an intuitive way', 'seeing our loved ones prosper' etc.

At a 'macroscopic' level, this sensory input generally resolves into mental processes approximately like '"believing" that there are "facts", about which our beliefs can be "right" or "wrong"', 'it's generally better to be right about stuff', and 'logicians, particle physicists and perhaps hedge fund managers are generally more right about stuff than religious zealots'. But this is all ultimately pragmatic. If cognitive dissonance didn't feel bad to us, and if looking for consistency in the world didn't seem to lead to nicer outcomes, we wouldn't do it, and we wouldn't care about rightness - and there's no fundamental sense in which it would be correct or even meaningful to say that we were wrong.

I'm not sure this matters for the question of how reasonable we should think antirealism is - it might be a few levels less abstract than such concerns. But I don't think it's entirely obvious that it doesn't, given the vagueness to which I keep referring about what it would even mean for either moral realism or antirealism to be correct. It might turn out that the least abstract principle we can judge it by is how we feel about its sensory consequences.

> …carries the pragmatic implication that antirealists are more likely to be immoral people that threaten or manipulate others. Do you agree?

Eh, compared to who? I think most people are neither realists nor antirealists, since they haven't built the linguistic schema for either position to be expressible (I'm claiming that it's not even possible to do so, but that's neither here nor there). So antirealists are obviously heavily selected to be a certain type of nerd, which probably biases their population towards a generally nonviolent, relatively scrupulous and perhaps affable disposition.

But selection is different from causation, and I would guess that among that nerd group, being utilitarian tends to cause one to be fractionally more likely to promote global utility, being contractualist tends to cause one to be fractionally more likely to uphold the social contract, etc. (I'm aware of the paper arguing that moral philosophers don't seem to be particularly moral, but that was hardly robust science. And fwiw it vaguely suggests that older books - which skew heavily nonutilitarian - tempted more immorality.)

The alternative is to believe that such people are all completely uninfluenced by their phenomenal experience of 'belief' in those philosophies, or that many of them are lying about having it (plausible, but that leaves open the question of the effects of belief on the behaviour of the ones who aren't), or some other such surprising disjunction between their mental state and behaviour.

> I would not accept this characterization. Antirealism is the view that there are no stance-independent moral facts.

I don't understand the difference, which is kind of the problem I identified in the first place. It's difficult to reject the existence of a phenomenon you haven't defined (the concept of ignosticism applies here). 'Moral facts' sounds to me like something like 'the truth values behind normative statements' (though that has further definitional problems relating to both 'truth values' - cf my other most recent reply - and 'normative statements').

If you reject that definition, it might be more helpful to define moral facts by exclusion from seemingly better understood phenomena. For example, I think more practical definitions might be:

  • Nonphysical phenomena
  • Nonphysical and nonexperiential phenomena

Obviously this has the awkwardness of including some paranormal phenomena, but I don't think that's a huge cost. Many paranormal phenomena obviously would be physical, were they to exist (as in, they can exert force, have mass etc), and you and I can probably agree we're not that interested in the particular nonexistences of most of the rest.

> I have all kinds of preferences that are totally unrelated to my own experiences

I wrote a long essay about the parameters of 'preference' in the context of preference utilitarianism here, which I think equally applies to supposedly nonmoral uses of the word (IIRC I might have shown it to you before?). The potted version is that people frequently use the word in a very motte-and-bailey-esque fashion, sometimes invoking quasi magical properties of preferences, other times treating them as an unremarkable part of the physical or phenomenal world. I think that's happening here:

>  {cultural relativism ... daughter} examples

There's a relatively simple experientialist account of these, which goes 'people pursue their daughter's/culture's wellbeing because it gives them some form of positive valence to do so'. This is the view which I accuse of being a conflict doctrine (unless it's paired with some kind of principled pursuit of such positive valence elsewhere).

You seem to be saying your view is not this: 'I’d do it because I value more than just my own experiences'.

If this is true, then I think many of my criticisms don't apply to you - but I also think this is a very selective notion of antirealism. Specifically, it requires a notion of 'to value', which you're saying is *not* exclusively experiential (and presumably isn't otherwise entirely physical too - unless you say its nonexperiential components are just a revealed preference in your behaviour?).

Perhaps you just mean a more expansive notion of experiential value than the word 'happiness' implies. I use the latter to mean 'any positively valenced experience', fwiw - I don't think the colloquial distinction is philosophically interesting. But that puts you back in the 'doctrine of conflict' camp, if you aren't able to guide someone, through dispassionate argument, to value your daughter/culture the way you do if they don't already.

For the record, I am not claiming that a large majority of persuasion falls into the 6th/7th groups. I think it's a tiny minority of it in fact - substantially less than the amount which is e.g. demonstrating how to think logically or understand statistics, or persuading someone to change their mind with logic or statistical data, both of which are already minuscule.

But the difference between antirealism and exclusivism/realism is that antirealism excludes the possibility of such interactions entirely.

> When performing an action, my goal is to achieve the desired outcome. I don’t have to experience the outcome to be motivated to perform the action.

But you have no access to whether the outcome is achieved, only to your phenomenal experience of changing belief that it will/won't be or has/hasn't been. So if you don't recognise the valence of that process of changing belief as the driver of your motivation and instead assert that some nonphysical link between your behaviour and the outcome is driving you, then under the exclusionary definition of moral facts you appear to be invoking one.

> Can you elaborate on these?

  • The unique evolutionary inexplicability of utilitarianism (h.t. Joshua Greene's argument in Moral Tribes), and how antirealists can explain this
  • The convergence of moral philosophers towards three heavily overlapping moral philosophies, given the infinite possible moral philosophies, and how antirealists can explain this

What is it antirealists are supposed to explain, specifically?

When we see a predictable pattern in the world, we generally understand it to be the result of some underlying law or laws, such that if you knew everything about the universe you could in principle predict the pattern before seeing it. 

It seems basically impossible to explain the convergence towards the philosophies above by any law currently found in physical science. Evolutionary processes might drive people to protect their kin, deter aggressors etc, but there's no need for any particular cognitive or emotional attachment to the 'rightness' of this (there's no obvious need for any emotional state at all, really, but even given that we have them they might have been entirely supervenient on behaviour, or universally tended towards cold pragmatism or whatever). And evolutionary processes have no ability to explain a universally impartial philosophy like utilitarianism, which is actively deleterious to its proponents' survival and reproductive prospects.

So what are the underlying laws by which one could have predicted the convergence of moral philosophies, rather than just virtue signalling and similar behaviours, in particular to a set including utilitarianism?

I would characterise antirealism as something like 'believing that there is no normative space, and hence no logical or empirical line of reasoning you could give to change someone's motivations.'

> I don't think anti-realists would accept that they aren't possible on their view.

I would be interested to hear a counterexample that isn't a language game. I don't see how one can sincerely advocate someone else hold a position they think is logically indefensible.

> Anti-realists don't think that people don't have dispositions that are well described as moral. It's possible to share dispositions, and in fact we all empirically do share a lot of moral dispositions.

I think this is a language game. A 'disposition' is not the same phenomenon that someone who believes their morality has some logical/empirical basis thinks their morality is. A disposition isn't functionally distinct from a preference - something we can arguably share, but, per Hume, something which has nothing to do with reason.

Someone who believed in a moral realist view that valued a state whose realisation they would never experience - black ties at their own funeral, for instance - should be highly sceptical of a moral antirealist who claimed to value the same state even though they also wouldn't experience it. The realist believes the word 'value' in that sentence means something motivationally relevant to a moral realist. To an antirealist it can only mean something like 'pleasing to imagine'. But if they won't be at the funeral, they won't know whether the state was realised, and so they can get their pleasure just imagining it happen - it doesn't otherwise matter to them whether it does.

Not by coincidence I think, this arguably gives the antirealist access to a basically hedonistic quasi-morality in practice (though no recourse to defend it), but not to any common alternative.

> I don't see how persuading is easier for a moral realist, surely you would still need to appeal to something that your interlocutor already believes/values.

If you start with the common belief that there is some such 'objective' morality, and some set of steps or reasoning tools that would let us access it, you can potentially correct the other's use of those tools in good faith. If one of you doesn't actually believe that process is even possible, it would be disingenuous to suppose there's something even to correct.

***

FWIW, we're spilling a lot of ink over by far the least interesting part of my initial comment. I would expect it to be more productive to talk about e.g.:

  • The analogy of (the irrelevance of) moral realism to (the irrelevance of) mathematical realism/Platonism
  • The unique evolutionary inexplicability of utilitarianism (h.t. Joshua Greene's argument in Moral Tribes), and how antirealists can explain this
  • The convergence of moral philosophers towards three heavily overlapping moral philosophies, given the infinite possible moral philosophies, and how antirealists can explain this
  • My suggestion that this strongly suggests a process by which some or most moral philosophies can be excluded: does this seem false? Or true, but insufficiently powered to narrow the picture down?
  • My suggestion that iterative self-modification of one's motivations might converge: whether people disagree with this suggestion or agree but think the phenomenon is explicable in e.g. strictly physical terms or otherwise uninteresting
  • My suggestion that if we accept that motivation has its own set of axiom-like properties, we might be able to 'derive' quasi-moral views in the same way we can derive properties about applied maths or physics (i.e. not that they're necessarily 'true' whatever that means, but that we will necessarily behave in ways that in some sense assume them to be)

Hey Lance,

To be clear, I'm talking about when an antirealist wants a behaviour change from another person (that, by definition, that person isn't currently inclined to do). Say you wanted to persuade me to vote for a particular political candidate. If you were a moral realist, you'd have these classes of option:

  1. Present me with real data that shows that on my current views, it would benefit me to vote for them
  2. Present me with false or cherrypicked data that shows that on my current views, it would benefit me to vote for them
  3. Threaten me if I don't vote for them
  4. Emotionally cajole me into voting for them, e.g. by telling me they saved my cat, that their opponent is a lecher, etc - in some way highlighting some trait that will irrationally dispose me towards them
  5. Feign belief in moral view that I hold and show me that their policies are more aligned with it
  6. Show me that their policies are more aligned with a moral view that we both in fact share
  7. Persuade me to accept whatever (you think) is the 'correct' moral view, and show me that their policies are aligned with it

Perhaps others, and perhaps 2-5 are basically the same thing, but whatever. As a moral antirealist you don't have access to the last two. And without those, the only honest/nonviolent option you have to persuade me is not going to be available to you the majority of the time, since usually I'm going to be better informed than you about what things are in fact good for me.

This isn't to say that moral antirealists necessarily will manipulate/threaten etc - I know many antirealists who seem like 'good' people who would find manipulating other people for personal gain grossly unpleasant. But nonetheless, taking away the last two options without replacing them with something equally honest necessarily incentivises the remaining set, most of which and the most accessible of which are dishonest.

This isn't supposed to be a substantial argument for moral realism, but I think it's an argument against antirealism. As an antirealist it would nonetheless be far better for you to live in a world where the 6th and 7th options were possible. So if you reject moral realism, you prudentially should nonetheless favour finding a third option, that permits similarly nonmanipulative options.

(Though, sidebar: while it's easy to dismiss the desirability of this property as a distraction from the 'truth' of the debate, I think this is too simplistic. At the level of abstraction at which moral philosophy happens, 'truth' is also a somewhat murky notion, and one we don't have access to. We can say we have beliefs, but even those are a form of action, and hence motivated. So it's unclear to me what lies at the bottom of this pyramid, but I don't think the view that morality/motivation is a form of knowledge and thus undergirded by epistemology makes any sense)

(Deleted my lazy comment to give more colour)

Neither agree nor disagree - I think the question is malformed, and both 'sides' have extremely undesirable properties. Moral realism's failings are well documented in the discussion here, and well parodied as being 'spooky' or just wishful thinking. But moral antirealism is ultimately a doctrine of conflict - if reason has no place in motivational discussion, then all that's left for me to get my way from you is threats, emotional manipulation, misinformation and, if need be, actual violence. Any antirealist who denies this as the implication of their position is kidding themselves (or deliberately supplying misinformation). 

So I advocate for a third position.

I think the central problem with this debate is that the word 'objective' here has no coherent referent (except when people use it for silly examples, like referring to instructions etched into the universe somewhere). And an incoherent referent can neither be coherently asserted nor denied.

To paraphrase Douglas Adams, if we don't know what the question is, we can't hope to find an understandable answer.

I think it's useful to compare moral philosophy to applied maths or physics, in that while there are still open debates about whether mathematical Platonism (approximately, objectivity in maths) is correct, most people think it isn't (or, rather, that it's incoherently defined) - and yet most people still think well-reasoned maths is essential to our interactions with the world. Perhaps the same could be true of morality.

One counterpoint might be that unlike maths, morality is dispensable - you can seemingly do pretty well in life by acting as though it doesn't exist (arguably better). But I think this is true only if you focus exclusively on the limited domain of morality that deals with 'spooky' properties and incoherent referents.

A much more fruitful approach to the discussion, IMO, is to start by looking at the much broader question of motivation, aka the cause of Agent A taking some action A1. Motivation has various salient properties:

  • Almost everyone agrees that it has a referent (eliminative materialists may disagree, but they're a tiny minority - and perhaps are literally without mental state and therefore don't have the information to understand the referent)
  • But that referent is still mysterious - we don't have a clear notion of either 'causation' or 'agents' - so there's plenty of room for empirical and conceptual discussions of its nature
  • Whatever it is, it is ever-present in our decision-making process, possibly by definition
  • Motivated agents can seemingly be changed by receiving information, in a more profound way than any other objects (excepting perhaps software, though the more software can be changed in comparable ways, the more it starts to look like a motivated agent - and it's still a long way from the breadth of interaction with input that animals have)
  • Crucially, many of us would change our motivation in somewhat predictable ways if we had the ability to rewrite our source code

For example, many of us might choose to modify our motivations so that we e.g.:

  • Procrastinated less
  • Put more effort into the aesthetics of our surroundings
  • Were more patient in thinking things through (sort of a motivational aspect to intelligence)
  • In general, helped our future selves more
  • Felt generally happier
  • Perhaps put more effort into helping out our friends and family
  • Perhaps put more effort into helping strangers
  • etc

I would argue that some - but not all - of these modifications would be close to or actually universal. I would also argue that some of those that weren't universal for early self-modifications might still be states that iterated self-modifiers would gravitate towards.

For example, becoming more 'intelligent' through patient thought might cause us to focus a) more on happiness itself than instrumental pathways to happiness like interior design, and b) to recognise the lack of a fundamental distinction between our 'future self' and 'other people', and so tend more towards willingness to help out the latter.

At this point I'm in danger of aligning hedonistic/valence utilitarianism with this process, but you don't have to agree with the previous paragraph to accept that some motivations would be more universal, or at least greater 'attractors', than others, while disagreeing on the particulars.

However it's not a coincidence that thinking about 'morality' like this leads us towards some views more than others. Part of the appeal of this way of thinking is that it offers the prospect of 'correct' answers to moral philosophy, or at least shows that some are incorrect - in a comparable sense to the (in)correctness we find in maths.

So we can think of this process as revealing something analogous to 'consistency' in maths. It's not (or not obviously) the same concept, since it's hard to say there's something formally 'inconsistent' in e.g. wanting to procrastinate more, or to be unhappier. Yet wanting such things is contrary in nature to something that for most or all of us resembles an 'axiom' - the drive to e.g. avoid extreme pain and generally to make our lives go better.

If we can identify this or these 'motivational axiom(s)', or even just find a reasonable working definition of them, this means we are in a similar position to the one we are in with applied maths: without ever showing that something is 'objectively wrong' - whatever that could mean - we can show that some conclusions are so contrary to our nature - our 'nature' being 'the axioms we cannot avoid accepting as we function as conscious, decision-making, motivated beings' - that we can exclude them from serious consideration.

This raises the question of which and how many moral conclusions are left when we've excluded all those ruled out by our axioms. I suspect and hope that the answer is 'one' (you might guess approximately which from the rest of this message), but that's a much more ambitious argument than I want to make here. Here I just want to claim that this is a better way of thinking about metaethical questions than the alternatives. 

I've had to rush through this comment without clearly distinguishing theses, but I'm making 2.5 core claims here:

  1. One can in principle imagine a way of 'doing moral philosophy' that excludes some set of conceivable moralities - and we needn't use any imagined 'object' as a referent to do so
  2. That a promising way of doing so is to imagine what we might gravitate towards if we were to iteratively self-modify our motivations
  3. That a distinct but related promising way of doing so is to recognise quasi- or actually-universal motivational 'axioms', and what they necessitate or rule out about our behaviour if consistently accounted for

I don't know if these positions already exist in moral philosophy - I'd be very surprised if I'm the first to advocate them, but fwiw I didn't find anything matching them when I looked a few years ago (though my search was hardly exhaustive). To distinguish it from the undesirable properties of both traditional sets of views, and with reference to the previous paragraph, I refer to it as 'moral exclusivism'.

Obviously you could define exclusivism into being either antirealism or realism, but IMO that's missing its ability to capture the intuition behind both without necessitating the baggage of either.

I think that's right, but modern AI benchmarks seem to have much the same issue. A human with a modern Claude instance might be able to write code 100x faster than without, but probably less than 2x as fast at choosing a birthday present for a friend.

Ideally you want to integrate over... something to do with the set of all tasks. But it's hard to say what that something would be, let alone how you're going to meaningfully integrate it.
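To make the aggregation problem concrete, here's a minimal sketch (every task, speedup and weight below is an invented assumption, not a measurement) of how an 'overall multiplier' might be computed - and how much it depends on the weighting you choose:

```python
import math

# Illustrative per-task speedups from using the best current tooling, and
# weights for how much of a 'typical' human workload each task represents.
# All names and numbers here are assumptions for the sake of the sketch.
tasks = [
    ("writing code",                100.0, 0.05),
    ("choosing a birthday present",   1.5, 0.05),
    ("everything else",               1.2, 0.90),
]

def aggregate_speedup(tasks):
    """Weighted geometric mean of speedups, so a few huge ratios don't dominate."""
    total_weight = sum(w for _, _, w in tasks)
    return math.exp(sum(w * math.log(s) for _, s, w in tasks) / total_weight)

print(f"{aggregate_speedup(tasks):.2f}x")  # ~1.5x with these weights; shifts a lot as they shift
```

The point isn't the particular aggregation function - an arithmetic mean of time saved would give a different answer again - but that any single headline multiplier smuggles in a weighting over the set of all tasks.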

To make outcome-based decisions, you have to decide on the period in which you're considering them. Considering any given period costs non-0 resources (reductio ad absurdum: in practice, considering all possible future timelines would cost infinite resources, so we presumably agree on the principle that excluding some from consideration is not only reasonable but necessary).

I think it's a reasonable position to believe that if something can't be empirically validated then it at least needs exceptionally strong conceptual justifications to inform such decisions.

This cuts both ways, so if the argument of AI2027 is 'we shouldn't dismiss this outcome out of hand' then it's a reasonable position (although I find Titotal's longer backcasting an interesting counterweight, and it prompted me to wonder about a good way to backcast still further). If the argument is that AI safety researchers should meaningfully update towards shorter timelines based on the original essay or that we should move a high proportion of the global or altruistic economy towards event planning for AGI in 2027 - which seems to be what the authors are de facto pushing for - that seems much less defensible. 

And I worry that they'll be fodder for views like Aschenbrenner's, and used to justify further undermining US-China relations and increasing the risk of great power conflict or nuclear war, both of which seem to me like more probable events in the next decade than AGI takeover.

Suggested hiring practice tweak

There are typically two ways for organisations to run hiring rounds: deadlined, in which job applications are no longer processed after a publicised date, and rolling, in which the organisation keeps accepting submissions until it has found someone it wants.

The upside of a deadline is both to the applicant, who knows they're not wasting their time on a job that's 99% assigned, and to the organisation, which doesn't have to delay giving an answer to an adequate candidate on the grounds that a potentially better one might submit when it's most of the way through the hiring process, and which gets to incentivise people to apply slightly earlier than they otherwise would.

The downsides are basically the complement. The individual doesn't get to go for a job that they've just missed and would be really suited to, and the org doesn't get to see as large a pool of applicants.

It occurred to me that an org might be able to get some of the best of both by explicitly giving a mostly-deadline, after which they will explicitly downweight new applications. So if you see the mostly-deadline in time, you're still incentivised to get your application in by the date given, and if it's passed, you should rationally apply if and only if you think there's a good chance you're an exceptional fit.
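In scoring terms, the idea could look something like the toy sketch below (the deadline, penalty factor and scoring scale are all invented for illustration): a late application isn't rejected, it just has to clear an explicit handicap.

```python
from datetime import date

# Hypothetical parameters - the real values would be the org's call.
MOSTLY_DEADLINE = date(2025, 6, 30)
LATE_PENALTY = 0.8  # late applications keep only 80% of their assessed score

def effective_score(assessed_score: float, submitted: date) -> float:
    """Downweight applications submitted after the mostly-deadline."""
    if submitted > MOSTLY_DEADLINE:
        return assessed_score * LATE_PENALTY
    return assessed_score

# A late applicant only beats an on-time one if they're a sufficiently better fit:
print(effective_score(90, date(2025, 7, 10)))  # 72.0
print(effective_score(75, date(2025, 6, 15)))  # 75.0
```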

One of the problems with AI benchmarks is that they can't effectively be backcast more than a couple of years. This prompted me to wonder if a more empirical benchmark might be something like 'Ability of a human in conjunction with the best technology available at time t'.

For now at least, humans are still necessary to have in the loop, so this should in principle be at least as good as coding benchmarks for gauging where we are now. When/if humans become irrelevant, it should still work - 'AI capability + basically nothing' = 'AI capability'. And looking back, it gives a much bigger reference class for forecasting future trends, allowing us to compare e.g.

  • Human
  • Human + paper & pen
  • Human + log tables + paper & pen
  • Human + calculator + log tables + paper & pen
  • Human + computer with C + ...
  • Human + computer with Python + ...
  • Human + ML libraries + ...
  • Human + GPT 1 + ...

etc.
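As a sketch of how such a benchmark might be recorded (the task suite is left abstract, and every throughput figure below is a placeholder rather than a real measurement), you could track a reference human's throughput on a fixed suite of tasks under each era's best toolset and normalise to the unaided human:

```python
# Hypothetical throughput of a reference human on a fixed task suite, in tasks/hour.
# All eras and numbers are placeholders to show the shape of the comparison.
eras = {
    "human alone":                   0.5,
    "human + paper & pen":           1.0,
    "human + log tables":            2.0,
    "human + calculator":            5.0,
    "human + computer with Python": 20.0,
    "human + GPT-class assistant":  80.0,
}

baseline = eras["human alone"]
for era, throughput in eras.items():
    print(f"{era:32s} {throughput / baseline:6.1f}x over an unaided human")
```

The interesting quantity for forecasting would then be the growth rate of that multiplier over time, which has a much longer history to fit against than model-only benchmarks.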

Thoughts?
