Something you hear a lot in discussions is that it's important not to strawman arguments: to treat them as much weaker than they actually are. Strawmanning is uncharitable.

Instead, the suggestion is that you should steelman arguments: consider the strongest version of an argument, even if that's not what the person actually said, and then evaluate that. Steelmanning is thought of as getting the principle of charity just right.

However, I suspect there should be a third category, call it mithrilmanning: treating arguments as much stronger than they are, and accepting them even though you don't understand the reasons for them. For the non-nerds out there, mithril is Tolkien's fictional super-strong metal in The Lord of the Rings.

Whilst strawmanning is being too uncharitable, mithrilmanning is being too charitable. You don't want to do either. Goldilocks lies betwixt the two.

I see mithrilmanning quite a lot among effective altruists. Usually, it goes something like this. People are discussing a view or argument they've heard person X make. The individuals are sitting around, brows furrowed, and struggling to find a good steelman of the argument: they can't work out what plausible reasons that person could have for their conclusion. After a while, even though they can't find a suitable steelman, someone says "Well, X does seem really smart, so...". Everyone nods. The conversation moves on.

What's happened is that someone has suggested the group should defer, even though they can't follow the reasoning or provide it themselves. This seems to happen much more often when person X is important (not least because you don't want to risk looking stupid). 

I think there can be good cases where one should defer, but I'm worried I see too much of this. We should give people the benefit of the doubt - assume they are smart, thoughtful, etc. rather than fools - but we should still doubt. To err is human. We all make mistakes. We make progress by pointing those out. 

So, if you think someone is really smart, but you can't make sense of what they are thinking, at least hesitate before deferring to them. If possible, ask them to explain. It is too charitable to assume they are right, and not charitable enough to assume they are wrong. In assuming that they can give you a sensible answer, you are treating them with appropriate charity.

I don't think I need to say why strawmanning is bad. The danger of mithrilmanning is that you end up with too much deference, an information cascade, and ultimately false beliefs. People end up believing what X says, even though no one really understands why.

So, if you find yourself overhearing, or even saying yourself, "well, they do seem really smart..." consider adding "um, are we mithrilmanning this? We don't want to defer uncritically."

Comments


A lot of the work in mithrilmanning is keeping an argument at a level of abstraction where it sounds sensible as a principle, while declining to interrogate it further, perhaps because venerated people hold that position.

From personal experience, I can tell you that really, really smart people are wrong all the time. They're much more likely to be wrong when talking outside their domain of expertise, but even a physics professor talking about physics will inevitably get stuff wrong in regular conversation.

If someone says something that doesn't make sense, you should of course try and understand their argument and see if you're missing something. But "this person made a mistake" should always be a hypothesis under consideration, and it's often the most likely explanation. 

Michael - good points. 

It sounds like proper steelmanning is mostly applied to arguments, evidence, values, and reasons, whereas mithrilmanning is often applied more to specific influential individuals who tend to be associated with certain positions. (e.g. we might think 'Yann LeCun's machine learning research has been cited 600,000 times, so he must have some valid points when he expresses the view that we shouldn't worry about AI extinction risk', even though he sounds irrational and deranged on this topic.) The mithril armor is really being wrapped around some prestigious person who's making apparently weak arguments, more than around the apparently weak arguments themselves.

My suggestion for overcoming mithrilmanning is to find a less prestigious, but still reputable, person who makes the same arguments, and interrogate the validity of those arguments as if they're from a less influential source. (e.g. if some less-famous AI researcher makes basically the same arguments as LeCun, then dissect that less-famous person's arguments, rather than trying to face down LeCun as if he's some Final Boss in a scary video game.) This is basically a social psychology hack to make us less intimidated by a famous person's reputation, so we can engage with the quality of their arguments without being misled by our instincts to submit and defer to high-status individuals.

Yes, reflecting on this since posting, I have been wondering if there is some important distinction between the principle of charity applied to arguments in the abstract vs its application to the (understated) reasoning of individuals in some particular instance. Steelmanning seems good in the former case, because you're aiming to work your way to the truth. But steelmanning goes too far, and becomes mithrilmanning, in the latter case, when you start assuming the individuals must have good reasons even though you don't know what those reasons are.

Perhaps mithrilmanning involves an implicit argument from authority ("this person is an authority. Therefore they must be right. Why might they be right?").

The problem with strawmanning and steelmanning isn't a matter of degree, and I don't think goldilocks can be found in that dimension at all. If you find yourself asking "how charitable should I be in my interpretation?" I think you've already made a mistake.

Instead, I'd like to propose a fourth category. Let's call it.. uhh.. the "blindman"! ^^

The blindman interpretation is to forget you're talking to a person, stop caring about whether they're correct, and just try your best to extract anything useful from what they're saying.[1] If your inner monologue goes "I agree/disagree with that for reasons XYZ," that mindset is great for debating or if you're trying to teach, but it's a distraction if you're purely aiming to learn. If I say "1+1=3" right now, it has no effect on what you learn from the rest of this comment, so do your best to forget I said it.

For example, when I skimmed the post "agentic mess", I learned something I thought was exceptionally important, even though I didn't actually read enough to understand what they believe. It was the framing of the question that got me thinking in ways I hadn't before, so I gave them a strong upvote because that's my policy for posts that cause me to learn something I deem important--however that learning comes about.

Likewise, when I scrolled through a different post, I found a single sentence[2] that made me realise something I thought was profound. I actually disagree with the main thesis of the post, but my policy is insensitive to such trivial matters, so I gave it a strong upvote. I don't really care what they think or what I agree with, what I care about is learning something.

  1. ^

    "What they believe is tangential to how the patterns behave in your own models, and all that matters is finding patterns that work."

    From a comment on reading to understand vs reading to defer/argue/teach.

  2. ^

    "The Waluigi Effect: After you train an LLM to satisfy a desirable property , then it's easier to elicit the chatbot into satisfying the exact opposite of property ."

You might enjoy the book 'Thanks for the Feedback', which basically emphasises this point a lot.

Thanks, I really like this, and would appreciate some examples so I can get my head around it. It might be hard without being uncharitable, but I struggle to think of concrete examples at the moment.

I guess any of the following might be examples (emphasis on might):

  • it seems bad to buy expensive historic buildings, which don't seem fit-for-purpose for the proposed use case and have really high running costs - but the people involved are really smart, so...

  • it seems bad to fly people to the Bahamas to do coworking and collaboration, and like this is being driven by a billionaire's desire for company and personal convenience. It seems like this wouldn't be the method you would choose if you were starting from a point of maximising impact and cost-effectiveness - but the people seem really smart, so...

  • it seems bad that the largest recipients of funding from the FTX Future Fund are organisations where one of the FTX grantmakers sits on their Board, but...

  • it seems very very very bad to say you would take the bet every time, if someone told you that there was a 51% chance that you'd double the universe and a 49% chance that you'd destroy it, but...

I'm not sure if people did defer to these arguments because of the people making them rather than a sincere belief that they are good, but it seems at least possible (especially the last one).

Fantastic examples, I understand it better now

And I 100% agree with you. I assessed all of those examples above and was bewildered that so many people seemed to defend them, often on the basis that "smart and good people" had made the decision.

Nice one

Or, senior AI researcher says that AI poses no risk because it's years away. This doesn't really make sense - what will happen in a few years? But he does seem smart and work for a prestigious tech company, so...

Thanks for writing this! Seems useful to have a term for excessive charitability. Being able to point at it succinctly might help mitigate information cascades.
