Something you hear a lot in discussions is that it's important not to strawman arguments: that is, not to treat them as much weaker than they are. Doing so is uncharitable.

Instead, the suggestion is that you should steelman arguments: consider the strongest version of an argument, even if that's not what the person has said, and then evaluate that. Steelmanning is thought of as getting the principle of charity just right.

However, I suspect there should be a third category, call it mithrilman: where arguments are treated as much stronger than they are, and you accept them even though you don't understand the reasons. For the non-nerds out there, mithril is Tolkien's fictional super-strong metal in Lord of the Rings. 

Whilst strawmanning is being too uncharitable, mithrilmanning is being too charitable. You don't want to do either. Goldilocks lies inbetwixt the two. 

I see mithrilmanning quite a lot among effective altruists. Usually, it goes something like this. People are discussing a view or argument they've heard person X make. The individuals are sitting around, brows furrowed, and struggling to find a good steelman of the argument: they can't work out what plausible reasons that person could have for their conclusion. After a while, even though they can't find a suitable steelman, someone says "Well, X does seem really smart, so...". Everyone nods. The conversation moves on.

What's happened is that someone has suggested the group should defer, even though they can't follow the reasoning or provide it themselves. This seems to happen much more often when person X is important (not least because you don't want to risk looking stupid). 

I think there can be good cases where one should defer, but I'm worried I see too much of this. We should give people the benefit of the doubt - assume they are smart, thoughtful, etc. rather than fools - but we should still doubt. To err is human. We all make mistakes. We make progress by pointing those out. 

So, if you think someone is really smart, but you can't make sense of what they are thinking, at least hesitate before deferring to them. If possible, ask them to explain. It seems too charitable to assume they are right, and not charitable enough to assume they are wrong. In assuming that they can give you a sensible answer, you are treating them with appropriate charity.

I don't think I need to say why strawmanning is bad. The danger of mithrilmanning is you end up with too much deference, an information cascade and ultimately false beliefs. People end up believing what X says, even though no one really understands why. 

So, if you find yourself overhearing, or even saying yourself, "well, they do seem really smart..." consider adding "um, are we mithrilmanning this? We don't want to defer uncritically."

Comments (11)



A lot of the work with mithrilmen is keeping an argument at a level of abstraction where it sounds sensible as a principle, while declining to interrogate it further, perhaps because venerated people hold that position.

From personal experience, I can tell you that really really smart people are wrong all the time. They're much more likely to be wrong when talking outside of their domain of expertise, but even a physics professor talking about physics will inevitably get stuff wrong in regular conversation. 

If someone says something that doesn't make sense, you should of course try and understand their argument and see if you're missing something. But "this person made a mistake" should always be a hypothesis under consideration, and it's often the most likely explanation. 

Michael - good points. 

It sounds like proper steelmanning is mostly applied to arguments, evidence, values, and reasons, whereas mithrilmanning is often applied more to specific influential individuals who tend to be associated with certain positions. (e.g. we might think 'Yann LeCun's machine learning research has been cited 600,000 times, so he must have some valid points when he expresses the view that we shouldn't worry about AI extinction risk' - even though he sounds irrational and deranged on this topic.) The mithril armor is really being wrapped around some prestigious person who's making apparently weak arguments, more than around the apparently weak arguments themselves.

My suggestion for overcoming mithrilmanning is to find a less prestigious, but still reputable person, who makes the same arguments, and interrogate the validity of those arguments as if they're from a less influential source. (e.g. if some less-famous AI researcher makes basically the same arguments as LeCun, then dissect that less-famous person's arguments, rather than trying to face down LeCun, as if he's some Final Boss in a scary video game.) This is basically a social psychology hack to make us less intimidated by some famous person's reputation, so we can engage with the quality of their arguments, without getting misled by our instincts to submit and defer to high-status individuals.

Yes, reflecting on this since posting, I have been wondering if there is some important distinction between the principle of charity applied to arguments in the abstract vs its application to the (understated) reasoning of individuals in some particular instance. Steelmanning seems good in the former case, because you're aiming to work your way to the truth. But steelmanning goes too far, and becomes mithrilmanning, in the latter case when you start assuming the individuals must have good reasons, even though you don't know what they are.

Perhaps mithrilmanning involves an implicit argument from authority ("this person is an authority. Therefore they must be right. Why might they be right?").

The problem with strawmanning and steelmanning isn't a matter of degree, and I don't think goldilocks can be found in that dimension at all. If you find yourself asking "how charitable should I be in my interpretation?" I think you've already made a mistake.

Instead, I'd like to propose a fourth category. Let's call it.. uhh.. the "blindman"! ^^

The blindman interpretation is to forget you're talking to a person, stop caring about whether they're correct, and just try your best to extract anything useful from what they're saying.[1] If your inner monologue goes "I agree/disagree with that for reasons XYZ," that mindset is great for debating or if you're trying to teach, but it's a distraction if you're purely aiming to learn. If I say "1+1=3" right now, it has no effect wrt what you learn from the rest of this comment, so do your best to forget I said it.

For example, when I skimmed the post "agentic mess", I learned something I thought was exceptionally important, even though I didn't actually read enough to understand what they believe. It was the framing of the question that got me thinking in ways I hadn't before, so I gave them a strong upvote because that's my policy for posts that cause me to learn something I deem important--however that learning comes about.

Likewise, when I scrolled through a different post, I found a single sentence[2] that made me realise something I thought was profound. I actually disagree with the main thesis of the post, but my policy is insensitive to such trivial matters, so I gave it a strong upvote. I don't really care what they think or what I agree with, what I care about is learning something.

  1. "What they believe is tangential to how the patterns behave in your own models, and all that matters is finding patterns that work." From a comment on reading to understand vs reading to defer/argue/teach.

  2. "The Waluigi Effect: After you train an LLM to satisfy a desirable property P, then it's easier to elicit the chatbot into satisfying the exact opposite of property P."

You might enjoy the book 'Thanks for the Feedback', which basically emphasises this point a lot.

Thanks, I really like this, and would appreciate some examples so I can get my head around it. It might be hard without being uncharitable, but I struggle to think of concrete examples at the moment.

I guess any of the following might be examples (emphasis on might):

  • it seems bad to buy expensive historic buildings, which don't seem fit-for-purpose for the proposed use case and have really high running costs - but the people involved are really smart, so...

  • it seems bad to fly people to the Bahamas to do coworking and collaboration, and like this is being driven by a billionaire's desire for company and personal convenience. It seems like this wouldn't be the method you would choose if you were starting from a point of maximising impact and cost-effectiveness - but the people seem really smart

  • it seems bad that the largest recipients of funding from the FTX Future Fund are organisations where one of the FTX grantmakers sits on their Board, but...

  • it seems very very very bad to say you would take the bet every time, if someone told you that there was a 51% chance that you'd double the universe and a 49% chance that you'd destroy it, but...

I'm not sure if people did defer to these arguments because of the people making them rather than a sincere belief that they are good, but it seems at least possible (especially the last one).

Fantastic examples, I understand it better now.

And I 100% agree with you: I assessed all of those examples above and was bewildered that so many people seemed to defend them, often based on the fact that "smart and good people" had made the decision.

Nice one

Or, a senior AI researcher says that AI poses no risk because it's years away. This doesn't really make sense - what will happen in a few years? But he does seem smart and works for a prestigious tech company, so...

Thanks for writing this! Seems useful to have a term for excessive charitability. Being able to point at it succinctly might help mitigate information cascades.
