
It's different from passing someone's Intellectual Turing Test

Epistemic status: Others have found my take here useful, but I haven’t engaged with the entire debate on the subject. Fairly confident that it’s useful to me, though see the caveat at the end.

Eliezer Yudkowsky recently disclaimed steelmanning on the Lex Fridman podcast. I haven't listened — I heard it second-hand. Here's a thorough roundup of people saying steelmanning is bad.

This post will argue that steelmanning is a useful tool.

Background

A bit of obvious background that is nevertheless worth saying: I am not omniscient. Other people know things and have models that will help me achieve my goals better. I want to steal from them like a greedy man. I want to do this because I desperately want to be right about the world. Because it matters.

Yet it’s sometimes legitimately hard to steal from my interlocutor. They might be building their argument up from a different set of starting models than I am. This will make them explain their ideas in a way that’s kinda weird for me. Let’s see an example.

Dogs and cats

Suppose I don’t like dogs that much.[1] My friend notices I’m feeling sad and tells me I should get a dog.

I might think their idea is crazy at first blush. If I try to pass their Intellectual Turing Test, I might focus on things they said, about how joyful dogs are. I will find this pretty uncompelling. But if instead I focus on the part of their argument that seemed strongest to me, about how their dog lies next to them and makes the house feel less empty, then suddenly I have an insight about how I should get a cat.

Steelmanning vs ITT passing

ITTs are basically easier to understand than steelmans. They are very pure. Your goal in an ITT is to understand your interlocutor. For someone on the other side, it feels very good to have someone pass your ITT. You feel understood, obviously. You may also have them ask you, “Does that capture everything?”

Steelmanning is a more subtle concept. Most sources define it as “constructing the strongest possible version of” a view you disagree with. This raises the question: according to whom? I claim: according to you, the steelman-er. It is you who are trying to improve your models.[2] If your interlocutor doesn’t like your steelman, that’s a sign you might have more to learn from them. But there’s no fundamental error going on.

Play with your steelman

I claim you should have fun with your interlocutor’s views. Make them your own, as much as possible. Worry less about fidelity, and simply follow the ideas where they take you. Think about what’s actually interesting to you about them. At the extreme, completely ignore the parts that seem bonkers to you, and focus on the parts that seem right. Ok, maybe do also see if there’s anything interesting in the bonkers parts, but only for as long as you feel like it.

Be clear about which thing is going on

If someone tells you they’re steelmanning you, don’t expect them to be perfectly representing your views. Think of it like a fanfiction of your idea. Feel free to talk to them about the ways you disagree with their steelman, but don’t expect an ITT.

When you’re talking to other people, if what you’ve done is a steelman, be clear that you’re not claiming to pass anyone’s ITT, and that this is your own spin.

When might you steelman vs ITT?

While you’re live in a conversation, I would recommend adopting a more ITT frame of mind. Your interlocutor will appreciate it, and so you will get more out of the conversation. But afterwards, in the privacy of your own mind / google doc / conversation with your friend, go wild with your steelman. As a reviewer of this doc said, “ITT in the streets, steelmanning in the sheets.” Steelmans are what ultimately help me steal from people, if they’re starting from a different place from me.

That said, ITTs can lead to a stronger understanding of the other side’s position, which could help you steal from them as well. ITTs can make your eventual steelman stronger.

Caveat

Because ITTs are harder, I often stop at doing a steelman. If I were less of a steelman-stan, maybe I would do more of the ITT mental motion, which would result in more overall stealing of knowledge and models from people.

Thanks to Justis Mills for reviewing a draft of this post.

  1. ^

     This is false, to be clear.

  2. ^

     Here this post switches from a fairly descriptive to a fairly prescriptive frame. I’d like to have flagged that somewhere other than a footnote, but oh well.

Comments

I really like the frame of "fanfiction of your idea"; I think it helpfully undermines the implicit status-thing of "I will steelman your views because your version is bad and I can state your views better than you can."

Yeah. Note that, in my culture, people can write fanfiction for media that they're not the biggest fans of. Like, they might see a core of a thing they like, and hate a lot of the rest, and still write a fic because they really want to explore that thing they liked more. Or they might really like it, but are adapting it because they like it so much they want to play with it!

Cool post. I had no idea that this debate over steelmanning existed. Glad to see someone on this side of it.

Lots of interesting subtleties in these posts, but the whole thing perplexes me as an issue of practical advice. It seems to me that parts of this debate suggest a picture of conversation where one person is actually paying close attention to the distinction between steelmanning and ITT and explicitly using those concepts to choose what to do next in the conversation. But I feel like I've basically never done that, despite spending lots of time trying to be charitable and teaching undergrads to do the same.

It makes me want to draw a distinction between steelmanning and ITT on the one hand and charity on the other (while acknowledging Rob's point that the word "charity" gets used in all kinds of ways). It's along the lines of the distinction in ethics between act types (lying, saying true things) and virtues (honesty): Steelmanning and passing ITT are the act types, and charity is the virtue.

The picture that emerges for me, then, is something like this: (1) We suppose a certain goal is appropriate. You enter a conversation with someone you disagree with with the goal of working together to both learn what you can from each other and better approach the truth. (2) Given that that's your goal, one thing that will help is being charitable -- meaning, roughly, taking ideas you don't agree with seriously. (3) Doing that will involve (a) considering the thoughts the other person is actually having (even at the expense of ignoring what seems to you a better argument along the same lines) as well as (b) considering other related thoughts that could be used to bolster the other person's position (even at the expense of straying to some degree from what the other person is actually thinking). Talking about ITT brings favorable attention to (a), and talking about steelmanning brings favorable attention to (b).

Yes, there is some kind of tradeoff between (a) and (b). Ozy's Marxist & liberal example nicely illustrates the dangers of leaning too hard in the (b) direction: In trying to present the other person's argument in the form that seems strongest to you, you end up missing what the other person is actually saying, including the most crucial insights present in or underlying their argument. Your dog & cat example nicely illustrates the dangers of leaning too hard in the (a) direction: In trying to get into your friend's exact headspace, you would again miss the other person's actual insight.

Given this picture, it seems to me the best practical advice is something like this: Keep that overall goal of collaborative truth-seeking in mind, and try to feel when your uncharitable impulses are leading you away from that goal and when your charity is leading you toward it. You certainly don't want to handle the tradeoff between (a) and (b) by choosing your loyalty to one or the other at the outset (adopting the rule "Don't steelman" or "Don't use the ITT"). And you don't even really want to handle it by periodically reevaluating which one to focus on for the next portion of the conversation. That's kind of like trying to balance on a skateboard by periodically thinking, "Which way should I lean now? Right or left?" rather than focusing on your sense of balance, trying to feel the answer to the question "Am I balanced?". You want to develop an overall sense of charity, which is partly a sense of balance between (a) and (b).
