- It's sometimes reasonable to believe things based on heuristic arguments, but it's useful to be clear with yourself about when you believe things for heuristic reasons as opposed to having strong arguments that take you all the way to your conclusion.
- A lot of the time, I think that when you hear a heuristic argument for something, you should be interested in converting this into the form of an argument which would take you all the way to the conclusion except that you haven't done a bunch of the steps--I think it's healthy to have a map of all the argumentative steps which you haven't done, or which you're taking on faith.
- I think that all the above can be combined to form a set of attitudes which are healthy on both an individual and community level. For example, one way that our community could be unhealthy would be if people felt inhibited from saying when they don't feel persuaded by arguments. But another unhealthy culture would be if we acted like you're a chump if you believe things just because people who you trust and respect believe them. We should have a culture where it's okay to act on arguments without having verified every step for yourself, and you can express confusion about individual steps without that being an act of rebellion against the conclusion of those arguments.
I wrote this post to describe the philosophy behind the schedule of a workshop that I ran in February. The workshop is kind of like AIRCS, but aimed at people who are more hardcore EAs, less focused on CS people, and with a culture which is a bit less like MIRI and more like the culture of other longtermist EAs.
Thanks to the dozens of people who I've talked to about these concepts for their useful comments; thanks also to various people who read this doc for their criticism. Many of these ideas came from conversations with a variety of EAs, in particular Claire Zabel, Anna Salamon, other staff of AIRCS workshops, and the staff of the workshop I’m going to run.
I think this post isn't really insightful enough or well-argued enough to justify how expansive it is. I posted it anyway because it seemed better than not doing so, and because I thought it would be useful to articulate these claims even if I don't do a very good job of arguing for them.
I tried to write the following without caveating every sentence with "I think" or "It seems", even though I wanted to. I am pretty confident that the ideas I describe here are a healthy way for me to relate to thinking about EA stuff; I think that these ideas are fairly likely to be a useful lens for other people to take; I am less confident but think it's plausible that I'm describing ways that the EA community could be different that would be very helpful.
Part 1: ways of thinking
Proofs vs proof sketches
When I first heard about AI safety, I was convinced that AI safety technical research was useful by an argument that was something like "superintelligence would be a big deal; it's not clear how to pick a good goal for a superintelligence to maximize, so maybe it's valuable to try to figure that out." In hindsight this argument was making a bunch of hidden assumptions. For example, here are three objections:
- It's less clear that superintelligence can lead to extinction if you think that AI systems will increase in power gradually, and before we have AI systems which are as capable as the whole of humanity we have AI systems which are as capable as dozens of humans.
- Maybe some other crazy thing (whole brain emulation, nanotech, technology-enabled totalitarianism) is likely to happen before superintelligence, which would make working on AI safety seem worse in a bunch of ways.
- Maybe it's really hard to work on technical AI safety before you know more about the technology that will be used to build AGI than we currently know.
I think that all these objections are pretty reasonable, and I think that in fact there is a pretty good answer to all of them.
It seems like in hindsight it worked out well that I was instantly credulous of the AI safety argument, given that ten years later I'm still convinced by it--I don't want to criticize myself for epistemic moves which empirically worked fine. But I think it was a mistake for me to not realize that I didn't have an end-to-end story for AI safety being important; I just had a sketch of an argument which was heuristically persuasive.
I'm reminded of the distinction between proofs and proof sketches in math--in a proof, you're supposed to take care of all the niggling details, while in a proof sketch you can just generally gesture at the kind of reason why something might be true.
I think it's correct to believe things when you can't spell out the whole argument for them. But I think it's good to be clear with yourself about when you're doing that as opposed to when you actually know the whole argument, because if you aren’t clear about that, you have problems like the following:
- You will be worse at reasoning with that argument and about that argument. By analogy, when I’m studying an intellectual subject like economics or math or biology, I’m constantly trying to prevent myself from having a false illusion of understanding of what I’m reading, because if I only have a fake understanding I won’t be able to apply it correctly.
- If you are in a conversation where that argument comes up, you might repeat the argument without understanding whether it’s relevant.
- If you hear a counterargument which should persuade you that the original argument is wrong, you might not realize that you should change your mind.
- If you talk to people about the argument and then turn out to not understand it, you’ll look like an arrogant and careless fool; this reflects badly on EA when it happens, and it happens often. (I am particularly guilty of having done this one.)
I think it's particularly healthy to sometimes try to think about the world in terms of end-to-end arguments for why what you're doing is good. By this I mean trying to backchain all the way from your work to good outcomes in the world. Sometimes I talk to people who are doing work that IMO won't be very helpful. I think that often they're making the mistake of not thinking about the end to end picture of how their work could be helpful. (Eg once I asked an AI safety researcher "Suppose your research project went as well as it could possibly go; how would it make it easier to align powerful AI systems?", and they said that they hadn't really thought about that. I think that this makes your work less useful.)
A key move here is the "noticing your confusion" move where you realize that an argument you believed actually has a hole in it.
Knowing where the "sorrys" are
Here's an obnoxious computer science metaphor.
I've spent a bit of time playing around with proof assistants, which are programs which allow you to write down mathematical proofs in a way that allows them to be automatically checked. Often when you're using them, you break down your proof into multiple steps. Eg perhaps you prove A, and that A implies B, and that B implies C, and then you join this all together into a proof of C. Or maybe you show that A is true if both B and C are true, and then you prove B and C and now you have a proof of A.
While you're in the middle of proving something, often you want to know whether the overall structure of your proof works before you have filled in all the details. To enable this, theorem provers give you a special keyword which you can use to tell the theorem prover "Please just pretend that I have successfully proven this little thing and then move on to checking other steps". In Lean, this keyword is called "sorry". To prove a really complicated thing, you might start out by having the whole proof be a sorry. And then you break down the problem into three steps, and you write sorry for each. Slowly you expand out the structure of your proof, using sorrys as you go as necessary, and then eventually you turn all of them into valid proofs.
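To make the workflow above concrete, here is a minimal sketch in Lean 4 syntax. The propositions `A`, `B`, `C`, the hypothesis names, and the theorem names are all hypothetical, chosen only to mirror the "prove A, then A implies B, then B implies C" example; this is an illustration, not a proof taken from any real project.

```lean
-- Stage 1: the whole proof is a single `sorry`.
-- Lean accepts the statement but warns that the declaration uses `sorry`.
theorem goal_v1 (A B C : Prop) (hA : A) (hAB : A → B) (hBC : B → C) : C := by
  sorry

-- Stage 2: the overall structure is written down, but the intermediate
-- step "B holds" is still taken on faith.
theorem goal_v2 (A B C : Prop) (hA : A) (hAB : A → B) (hBC : B → C) : C := by
  have hB : B := by sorry   -- placeholder: not yet proven
  exact hBC hB              -- this step is checked for real

-- Stage 3: every `sorry` has been replaced by an actual proof.
theorem goal_v3 (A B C : Prop) (hA : A) (hAB : A → B) (hBC : B → C) : C := by
  have hB : B := hAB hA     -- B follows from A and A → B
  exact hBC hB              -- C follows from B and B → C
```

In the second stage the checker confirms that the overall shape of the argument works (C does follow once B is available) while keeping an explicit record of the one step that has not been justified.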
I think that something like this might be a good metaphor for how you should relate to doing good in the world, or to questions like "is it good to work on AI safety". You try to write down the structure of an argument, and then fill out the steps of the argument, breaking them into more and more fine-grained assumptions. I am enthusiastic about people knowing where the sorrys are--that is, knowing what assumptions about the world they're making. Once you've written down in your argument "I believe this because Nick Bostrom says so", you're perfectly free to continue believing the same things as before, but at least now you'll know more precisely what kinds of external information could change your mind.
The key event which I think does good here is when you realize that you were making more assumptions than you realized, or when you realize that you'd thought that you understood the argument for X but actually you don't know how to persuade yourself of X given only the arguments you already have.
Small clarification: Many small arguments
In contrast to when you're doing mathematical proofs, when you're thinking about real life I often think that it's better to come to conclusions based on weighing a large number of arguments, rather than trying to make one complete calculation of your conclusion (see cluster thinking vs sequence thinking, or fox vs hedgehog mindset).
I structure a lot of my beliefs this way: I try to learn lots of different arguments that feel like they're evidence for various things, and I am interested in the validity of each argument, independent of whether it's decision relevant. So I often change my mind about whether a particular argument is good, while my larger scale beliefs shift more gradually.
Bonus miscellaneous points
Learning someone's beliefs, vs scrapping for parts
Two ways you can relate to some talk you're listening to:
- Learning their beliefs. You try to become able to answer the question "what would this person say about how useful it is to have EA-aligned people in various parts of the government"?
- Alternatively, you can scrap them for parts--you can try to take little parts of the things that they're saying and see whether you want to incorporate them into your personal worldview based on the individual merits of those little parts.
You shouldn't always do the latter, but (due to time constraints) you also shouldn’t always do the former, and it's IMO healthy to have a phrase for this distinction.
This is related to the CFAR-style looking-for-cruxes method of conversation. One really nice feature of the looking-for-cruxes style conversation is that it fails gracefully in the case where it turns out you're talking to someone smarter/more knowledgeable/better informed than you, which means that if we have a culture where we by default have conversations in a looking-for-cruxes style, it's less likely that smart people will be turned off EA by unpleasant conversations with overconfident EAs. (Thanks to Anna Salamon for this last point.)
Part 2: Outside views, deference, EA culture
I think we can use the above ideas to describe a healthy set of attitudes for the EA community to have about thinking about EA arguments.
Here are some tensions I am worried about:
- Some EAs know more and have thought more and better about various important questions than others--eg, EAs generally have better opinions when they've been around EA longer, when they have jobs that cause them to think about EA topics a lot or which expose them to private discussions about EA topics with people who work on them full time. It's often healthy to defer to the opinions of such people. But if you only defer, you don't practice thinking on your own, which is terrible because thinking on your own is the skill which EA requires in order for its full-timers to have good opinions! And it also means that people are overly credulous of what full-time EAs think (or what people (potentially inaccurately) think that they think).
- When I was involved with Stanford EA in 2015, we spent a lot of time discussing core EA questions like the relative value of different cause areas, philosophical foundations, and what kind of strategies might be most valuable for EA to pursue for various goals. Most of us had a default attitude of skepticism and uncertainty towards what EA orgs thought about things. When I talk to EA student group members now, I don’t think I get the sense that people are as skeptical or independent-thinking.
- A lot of this is probably because EA presents itself more consistently now. In particular, longtermism is more clearly the dominant worldview. I think this makes things feel really different. In 2015, my friends and I were very uncertain about cause prioritization, and this meant that we were constantly actively reminded that it wasn’t possible that everyone was right about what to do, because they disagreed so much.
- Another factor here is that EA feels more to me now like it disapproves of people arguing publicly about cause prioritization. I have the sense that people would now view it as bad behavior to tell people that you think they’re making a terrible choice to donate to AMF--I feel much more restricted saying this nowadays, but this is at least partially just because I am personally now more risk averse about people thinking I’m obnoxious.
- I think that it’s potentially very bad that young EAs don’t practice skeptical independent thinking as much (if this is indeed true).
- On the other hand, one way that things have gotten much better is that I think it’s much more approachable to learn about AI safety than it used to be, because of things like the increasing size of the field, the Alignment Newsletter, the 80K podcast, and the increasing quality of explanations available.
- Also, if people are too inclined to defer and not think through arguments themselves, they might not just not assess the arguments themselves, they probably won’t even learn the arguments that the experts find persuasive.
- I want a culture where researchers try to think about whether the research they're doing is valuable. To encourage this, I want a culture where people are interested in trying to understand the whole end-to-end picture of what's important. But simultaneously I want it to be okay for someone to just work doing ops or whatever and not feel insecure about the fact that their models of the world aren't as good as the models of people whose full time job is to make good models.
- Similarly, I think that it’s very valuable for EAs to get status from doing actually useful stuff, as opposed to from being really good at arguing about what EA should be doing.
- I think it’s kind of tricky to have the right relationship to skepticism of established EA beliefs.
- One bad culture is one where people are embarrassed to ask questions and say that they don’t get the arguments for pieces of the conventional wisdom. We have a bunch of emperor’s-new-clothes-style dumb consensus beliefs, and we don’t spot holes in them. We don't get to practice noticing our confusion and improving our arguments.
- And when people who are new to EA talk to us, they notice that we don’t really understand the arguments for our beliefs, and so we turn off people who care the most about careful examination of claims. I think this is a pretty serious problem.
- But there's another bad culture where we can't update based on what other people think, or where we aren't supposed to believe things based on trusting other people. Or where it's considered low status to work on things that don't give you a mandate to think about the complete story.
I think that now that I have the above concepts, I can describe some features of what I want.
- I think it's much healthier if we have the attitude that in EA, people try to incrementally improve their understandings of things, and in particular they're interested in knowing which parts of their arguments are robust vs fragile.
- In this world, the default understanding is that when you change your mind about an argument about a subquestion, you aren't expected to immediately have an opinion about how this changes your mind about the main question.
- EAs are encouraged to try to build models of whatever parts of EA they’re interested in, and it’s considered a normal and good thing to try to think through arguments that you’ve heard and try to figure out if they make sense to you. But it’s clear that you’re not obligated to have models of everything.
- When asking questions of a prestigious, smart EA, people are interested in trying to understand what exactly the person thinks and how their beliefs are connected to each other, as opposed to just trying to learn their overall judgements or argue with them.
I wish I had better ideas for how to do EA movement building in ways that lead to a healthy EA culture around all these questions.
"EAs generally have better opinions when they've been around EA longer"
Except on the issues that EAs are systematically wrong about, where they will tend to have worse opinions. Which we won't notice because we also share those opinions. For example, if AMF is actually worse than standard aid programs at reducing global poverty, or if AI risk is actually not a big deal, then time spent in EA is correlated with worse opinions on these topics.
I mean on average; obviously you're right that our opinions are correlated. Do you think there's anything important about this correlation?
My broader point is something like: in a discussion about deference and skepticism, it feels odd to only discuss deference to other EAs. By conflating "EA experts" and "people with good opinions", you're missing an important dimension of variation (specifically, the difference between a community-centred outside view and a broader outside view).
Apologies for phrasing the original comment as a "gotcha" rebuttal rather than trying to distill a more constructive criticism.
Correlation usually implies higher value in sources of outside variance, even if the mean is slightly lower. We should actively look for additional sources of high-value variance. And we often see that smart people outside of EA have valuable criticisms, once we can get past the instinctive "we're being attacked" response.
Epistemic status: grappling with something confusing. May not make sense.
One thing that confuses me is whether we should be just willing to "eat that loss" in expectation. I think most EAs agree that individuals should be somewhat risk-seeking in eg, career choice, since this allows the movement to have a portfolio. But maybe there are correlated risks that the movement will have (for example, if we're wrong about Bayesian decision theory, say, or meta-philosophy concepts like preferring parsimony), that we basically can't de-risk without cutting a lot into expected value.
An analogy is startups. Startups implicitly have to take on some epistemic (and other) risks about the value of the product, the vision for team organization being good, etc. VCs are fine with funding off-shoot ideas as long as their portfolio is good (lots of startups with relatively uncorrelated risks).
So maybe in some ways we should think of the world as a whole as having a portfolio of potential do-gooder social movements, and we should just try our best to have the best movement we can under the movement's assumptions.
Another analogy is the 100 schools of thought era in China, where at least one school of thought had important similarities to ours. That school of thought (Mohism) did not end up winning, for reasons that are not necessarily the best according to our lights. But maybe it was a good shot anyway, and if they compromised too much on their values or epistemology, they wouldn't have produced much value.
This is what confuses me when people like Will MacAskill talk about EA being a new ethical revolution. Should we think of an "EA ethical revolution" as something that is the default outcome as long as we work really hard at it, and is something we can de-risk and still do, or is the implicit assumption that we should think of ourselves as a startup that is one of the world's bets (among many) for achieving an ethical revolution?
I think one clear disanalogy with startups is that eventually startups are judged by reality. Whereas we aren't, because doing good and getting more money are not that strongly correlated. By just eating the risk of being wrong about something, the worst case is not failing, like it is for a startup, but rather sucking up all the resources into the wrong thing.
Also, small point, but I don't think Bayesian decision theory is particularly important for EA.
Anyway, maybe eventually this might be worth considering, but as it is we've done several orders of magnitude too little analysis to start conceding.
I've heard this impression from several people, but it's unclear to me whether EAs have become more deferential, although it is my impression that many EAs are currently highly deferential. It seems quite plausible to me that it is merely more apparent that EAs are highly deferential right now, because the 'official EA consensus' (i.e. longtermism) is more readily apparent. I think this largely explains the dynamics highlighted in this post and in the comments. (Another possibility is simply that newer EAs are more likely to defer than veteran EAs and as EA is still growing rapidly, we constantly get higher %s of non-veteran EAs, who are more likely to defer. I actually think the real picture is a bit more complicated than this, partly because I think moderately engaged and invested EAs are more likely to defer than the newest EAs, but we don't need to get into that here).
My impression is that EA culture and other features of the EA community implicitly encourage deference very heavily (despite the fact that many senior EAs would, in the abstract, like more independent thinking from EAs). In terms of social approval and respect, as well as access to EA resources (like jobs or grants), deference to expert EA opinion (both in the sense of sharing the same views and in the sense of directly showing that you defer to senior EA experts) seems pretty essential.
Relatedly, my purely anecdotal impression is basically the opposite here. As EA has professionalised I think there are more explicit norms about "niceness", but I think it's never been clearer or more acceptable to communicate, implicitly or explicitly, that you think that people who support AMF (or other near-termist causes) probably just 'don't get' longtermism and aren't worth engaging with.
Here's what leads me to think EA seems more deferential now.
I spent a lot of time with the Stanford EA club in 2015 and 2016, and was close friends with many of the people there. We related to EA very differently to how I relate to EA now, and how most newer/younger EAs I talk to seem to relate to it.
The common attitude was something like "we're utilitarians, and we want to do as much good as we can. EA has some interesting people and interesting ideas in it. However, it's not clear who we can trust; there's lots of fiery debate about cause prioritization, and we just don't at all know whether we should donate to AMF or the Humane League or MIRI. There are EA orgs like CEA, 80K, MIRI, GiveWell, but it's not clear which of those people we should trust, given that the things they say don't always make sense to us, and they have different enough bottom line beliefs that some of them must be wrong."
It's much rarer nowadays for me to hear people have an attitude where they're wholeheartedly excited about utilitarianism but openly skeptical of the EA "establishment".
Part of this is that I think the arguments around cause prioritization are much better understood and less contentious now.
I feel like there are many fewer EA forum posts and facebook posts where people argue back and forth about whether to donate to AMF or more speculative things than there used to be.
I actually agree that there seems to have been some shift roughly along these lines.
My view is roughly that EAs were equally disposed to be deferential then as they are now (if there were a clear EA consensus then, most of these EAs would have deferred to it, as they do now), but that "because the 'official EA consensus' (i.e. longtermism) is more readily apparent" now, people's disposition to defer is more apparent.
So I would agree that some EAs were actually more directly engaged in thinking about fundamental EA prioritisation because they did not see an EA position that they could defer to at all. But other EAs, I think, were deferring to those they perceived as EA experts back then, just as they are now; it's just that they were deferring to different EA experts than other EAs. For example, I think in earlier years many EAs thought that Giving What We Can (previously an exclusively poverty org, of course) and GiveWell were the EA experts, and meanwhile there were some 'crazy' people (MIRI and LessWrongers) who were outside the EA mainstream. I imagine this perspective was more common outside the Bay Area.
Agreed, but I can't remember the last time I saw someone try to argue that you should donate to AMF rather than to longtermist causes. I've seen more posts/comments/discussions along the lines of 'Are you aware of any EA arguments against longtermism?' Clearly there are still lots of EAs who donate to AMF and support near-termism (cause prioritisation, donation data), but I think they are mostly keeping quiet. Whenever I do see near-termism come up, people don't seem afraid to communicate that they think that it is obviously indefensible, or that they think even a third-rate longtermist intervention is probably incomparably better than AMF because at least it's longtermist.
This is an interesting possibility. I still think there's a difference. For example, there's a lot of disagreement within AI safety about what kind of problems are important and how to work on them, and most EAs (and AI safety people) seem much less inclined to try to argue with each other about this than I think we were at Stanford EA.
I think this is probably a mixture of longtermism winning over most people who'd write this kind of post, and also that people are less enthusiastic about arguing about cause prio these days for whatever reason. I think the post would be received well inasmuch as it was good. Maybe we're agreeing here?
I don't see people say that very often. Eg I almost never see people say this in response to posts about neartermism on the EA Facebook group, or on posts here.
Regarding deference, I think that it's important to make clear that this holds mostly for the EA community, rather than the EA network.
When someone is not actively involved in the EA community but is nevertheless working on a specific recommended cause, especially a more established cause, it may well be the case that their view on the matter has less impact than their career success.
This is relevant to movement building. Work that aims at outreach can involve a more deferential point of view when seeking to inform people who might be interested in paths in the EA network, or more inquisitive activities aimed at people who might consider themselves part of the EA community.
"I think that it’s potentially very bad that young EAs don’t practice skeptical independent thinking as much (if this is indeed true)."
I agree that this is potentially very bad, but also perhaps difficult to avoid as EA professionalises, because you start needing more background and technical knowledge to weigh in on ongoing debates. Analogous to what happened in science.
On the other hand, we're literally interested in the whole future, about which we currently know almost nothing. So there must be space for new ideas. I guess the problem is that, while "skeptical thinking" about received wisdom is hard, it's still easier than generative thinking (i.e. coming up with new questions). The problem with EA futurism is not so much that we believe a lot of incorrect statements, but that we haven't yet thought of most of the relevant concepts. So it may be particularly valuable for people who've thought about longtermism a bunch to make public even tentative or wacky ideas, in order to provide more surface area for others to cultivate skeptical thinking and advance the state of our knowledge. (As Buck has in fact done: http://shlegeris.com/2018/10/23/weirdest).
Example 1: a while back there was a post on why animal welfare is an important longtermist priority, and iirc Rob Wiblin replied saying something like "But we'll have uploaded by then so it won't be a big deal." I don't think that this argument has been made much in the EA context - which makes it both ripe for skeptical independent thinking, but also much less visible as a hypothesis that it's possible to disagree with.
Example 2: there's just not very much discussion in EA about what actual utopias might look like. Maybe that's because, to utilitarians, it's just hedonium. Or because we're punting it to the long reflection. But this seems like a very important topic to think about! I'm hoping that if this discussion gets kickstarted, there'll be a lot of room for people to disagree and come up with novel ideas. Related: a bunch of claims I've made about utopia. https://forum.effectivealtruism.org/posts/4jeGFjgCujpyDi6dv/characterising-utopia
I'm reminded of Robin Hanson's advice to young EAs: "Study the future. ... Go actually generate scenarios, explore them, tell us what you found. What are the things that could go wrong there? What are the opportunities? What are the uncertainties? ... The world needs more futurists."
See also: https://forum.effectivealtruism.org/posts/Jpmbz5gHJK9CA4aXA/what-are-the-key-ongoing-debates-in-ea
This seems like a deeper disagreement than you're describing. A lot of research in academia (ex: much of math) involves playing with ideas that seem poorly understood, trying to figure out what's going on. It's not really goal directed, especially not the kind of goal you can chain back to world improvement, it's more understanding directed.
It reminds me of Sarah Constantin's post about the trade-off between output and external direction: https://srconstantin.wordpress.com/2019/07/20/the-costs-of-reliability/
For AI safety your view may still be right: one major way I could see the field going wrong is getting really into interesting problems that aren't useful. But on the other hand it's also possible that the best path involves highly productive interest-following understanding-building research where most individual projects don't seem promising from an end to end view. And maybe even where most aren't useful from an end to end view!
Again, I'm not sure here at all, but I don't think it's obvious you're right.
This comment is a general reply to this whole thread.
Here's my summary of my position here:
Sometimes I talk to people who are skeptical of EA because they have a stronger version of the position you're presenting here--they think that nothing useful ever comes of people intentionally pursuing research that they think is important, and the right strategy is to pursue what you're most interested in.
One way of thinking about this is to imagine that there are different problems in a field, and different researchers have different comparative advantages at the problems. In one extreme case, the problems vary wildly in importance, and so the comparative advantage basically doesn't matter and you should work on what's most important. In the other extreme, it's really hard to get a sense of which things are likely to be more useful than other things, and your choices should be dominated by comparative advantage.
(Incidentally, you could also apply this to the more general problem of deciding what to work on as an EA. My personal sense is that the differences in values between different cause areas are big enough to basically dwarf comparative advantage arguments, but within a cause area comparative advantage is the dominant consideration.)
I would love to see a high quality investigation of historical examples here.
I mostly share your position, except that I think that you would perhaps maximize the probability of solving the Riemann hypothesis by pursuing paths at the frontier of current research instead of starting something new (but I imagine that there are many promising paths currently, which may be the difference).
This planners vs Hayekian genre of dilemmas seems very important to me, and it might be a crux in my career trajectory or at least impact possible projects I'm taking. I intuitively think that this question can be dissolved quite easily to make it obvious when each strategy is better, how parts of the EA world-view influence the answer and perhaps how this impacts how we think about academic research. There is also a lot of existing literature on this matter, so there might already be a satisfying argument.
If someone here is up to a (possibly adversarial) collaboration on the topic, let's do it!
The Planners vs Hayekian dilemma seems related to some of the discussion in Realism about rationality, and especially this crux for Abram Demski and Rohin Shah.
Broadly, two types of strategies in technical AI alignment work are
Borrowing Vanessa's analogy of understanding the world as a castle, with each floor built on the one underneath representing hierarchically built knowledge: when one wants to build a castle with unknown materials and an unknown set of rules for its construction, with a specific tower top in mind, one can either start by building the groundwork well or start with some ideas of what can be directly below the tower top.
Planners start from the tower's top, while Hayekians want to build a solid ground and add on as many well-placed floors as they can.
I agree, and also immediately thought of pure mathematics as a counterexample. E.g., if one's most important goal was to prove the Riemann hypothesis, then I claim (based on my personal experience of doing maths, though e.g. Terence Tao seems to agree) that it'd be a very bad strategy to only do things where one has an end-to-end story for how they might contribute to a proof of the Riemann hypothesis. This is true especially if one is junior, but I claim it would be true even for a hypothetical person eventually proving the Riemann conjecture, except maybe in some of the very last stages of them actually figuring out the proof.
I think the history of maths also provides some suggestive examples of the dangers of requiring end-to-end stories. E.g., consider some famous open questions in Ancient mathematics that were phrased in the language of geometric constructions with ruler and compass, such as whether it's possible to 'square the circle'. It was solved 2,000 years after it was posed using modern number theory. But if you had insisted that everyone working on it has an end-to-end story for how what they're doing contributes to solving that problem, I think there would have been a real risk that people continue thinking purely in ruler-and-compass terms and we never develop modern number theory in the first place.
The Planners vs. Hayekians distinction seems related. The way I'm understanding Buck is that he thinks that, at least within AI alignment, a Planning strategy is superior to a Hayekian one (i.e. roughly one based on optimizing robust heuristics rather than an end-to-end story).
One of the strongest defenses of Buck's original claim I can think of would appeal specifically to the "preparadigmatic" stage of AI alignment. I.e. roughly the argument would be: sure, perhaps in areas where we know of heuristics that are robustly good to pursue it can sometimes be best to do so; however, the challenge with AI alignment precisely is that we do not know of such heuristics, hence there simply is no good alternative to having an end-to-end story.
For clarity, Terry Tao argues that it is a bad strategy to work on one open problem because one should skill up first, lose some naivety and get a higher status within the community. Not because it is a better problem solving strategy.
My reading is that career/status considerations are only one of at least two major reasons Tao mentions. I agree those may be less relevant in the AI alignment case, and are not centrally a criterion for how good a problem solving strategy is.
However, Tao also appeals to the required "mathematical preparation", which fits with you mentioning skilling up and losing naivety. I do think these are central criteria for how good a problem solving strategy is. If I want to build a house, it would be a bad strategy to start putting it together with my bare hands; it would be better to first build a hammer and other tools, and understand how to use them. Similarly, it would be better to acquire and understand the relevant tools before attempting to solve a mathematical problem.
I agree with this. Perhaps we are on the same page.
But I think that this is in an important way orthogonal to the Planner vs Hayekian distinction which I think is the more crucial point here.
I'd argue that if one wants to solve a problem, it would be better to have a sort of a roadmap and to learn stuff on the way. I agree that it might be great to choose subproblems if they give you some relevant tools, but there should be a good argument as to why these tools are likely to help. In many cases, I'd expect choosing subproblems which are closer to what you really want to accomplish to help you learn more relevant tools. If you want to get better at climbing stairs, you should practice climbing stairs.
I think having a roadmap, and choosing subproblems as close as possible to the final problem, are often good strategies, perhaps in a large majority of cases.
However, I think there are at least three important types of exceptions:
Also, I of course acknowledge that there are limits to the idea of exploring subproblems that are less closely related. For example, no matter what mathematical problem you want to solve, I think it would be a very bad strategy to study dung beetles or to become a priest. And to be fair, I think at least in hindsight the idea of studying close subproblems will almost always appear to be correct. To return to the example of squaring the circle: once people had realized that the set of points you can construct with ruler and compass is closed under basic algebraic operations in the complex plane, it was possible and relatively easy to see how certain problems in algebraic number theory were closely related. So the problem was less that it's intrinsically better to focus on less related subproblems, but more that people didn't properly understand what would count as helpfully related.
Regarding the first two types, I think that it's practically never the case and one can always make progress - even if that progress is in work done on analogies or heuristically relevant techniques. The Riemann hypothesis is actually a great example of that; there are many paths currently pursued to help us understand it better, even if there aren't any especially promising reductions (not sure if that's the case). But I guess that your point here is that these are distinct markers for how easy it is to make progress.
What is the alternative strategy you are suggesting in those exceptions? Is it to work on problems that are weakly related and the connection is not clear but are more tractable?
If so, I think that two alternative strategies are to just try harder to find something more related or to move to a different project altogether. Of course, this all lies on a continuum so it's a matter of degree.
I just looked up the proof of Fermat's Last Theorem, and it came about from Andrew Wiles spotting that someone else had recently proven something which could plausibly be turned into a proof, and then working on it for seven years. This seems like a data point in favor of the end-to-end models approach.
The paper Architecting Discovery by Ed Boyden and Adam Marblestone also discusses how one can methodologically go about producing better scientific tools (which they used for Expansion Microscopy and Optogenetics).
Yes, agree. Though anecdotally my impression is that Wiles is an exception, and that his strategy was seen as quite weird and unusual by his peers.
I think I agree that in general there will almost always be a point at which it's optimal to switch to a more end-to-end strategy. In Wiles's case, I don't think his strategy would have worked if he had switched as an undergraduate, and I don't think it would have worked if he had lived 50 years earlier (because the conceptual foundations used in the proof had not been developed yet).
This can also be a back and forth. E.g. for Fermat's Last Theorem, perhaps number theorists were justified in taking a more end-to-end approach in the 19th century because there had been little effort using then-modern tools; and indeed, I think partly stimulated by attempts to prove FLT (and actually proving it in some special cases), they developed some of the foundations of classical algebraic number theory. Maybe then people had understood that the conjecture resists attempts to prove it directly given then-current conceptual tools, and at this point it would have become more fruitful to spend more time on less direct approaches, though they could still be guided by heuristics like "it's useful to further develop the foundations of this area of maths / our understanding of this kind of mathematical object because we know of a certain connection to FLT, even though we wouldn't know how exactly this could help in a proof of FLT". Then, perhaps in Wiles's time, it was time again for more end-to-end attempts etc.
I'm not confident that this is a very accurate history of FLT, but reasonably confident that the rough pattern applies to a lot of maths.
Similar to what you're saying about AI alignment being preparadigmatic, a major reason why trying to prove the Riemann conjecture head-on would be a bad idea is that people have already been trying to do that for a long time without success. I expect the first people to consider the conjecture approached it directly, and were reasonable to do so.
Yes, good points. I basically agree. I guess this could provide another argument in favor of Buck's original view, namely that the AI alignment problem is young and so worth attacking directly. (Though there are differences between attacking a problem directly and having an end-to-end story for how to solve it, which may be worth paying attention to.)
I think your view is also borne out by some examples from the history of maths. For example, the Weil conjectures were posed in 1949, and it took "only" a few decades to prove them. However, some of the key steps were known from the start, it just required a lot of work and innovation to complete them. And so I think it's fair to characterize the process as a relatively direct, and ultimately successful, attempt to solve a big problem. (Indeed, this is an example of the effect where the targeted pursuit of a specific problem led to a lot of foundational/theoretical innovation, which has much wider uses.)
I think you're interpreting me to say that people ought to have an externally validated end-to-end story; I'm actually just saying that they should have an approach which they think might be useful, which is weaker.
Thanks, I think this is a useful clarification. I'm actually not sure if I even clearly distinguished these cases in my thinking when I wrote my previous comments, but I agree the thing you quoted is primarily relevant to when end-to-end stories will be externally validated. (By which I think you mean something like: they would lead to an 'objective' solution, e.g. maths proof, if executed without major changes.)
The extent to which we agree depends on what counts as an end-to-end story. For example, consider someone working on ML transparency claiming their research is valuable for AI alignment. My guess is:
"When you're thinking about real life I often think that it's better to come to conclusions based on weighing a large number of arguments, rather than trying to make one complete calculation of your conclusion"
I'm a little confused about this distinction. The process of weighing a large number of arguments IS a calculation of your conclusion, that's complete insofar as you've weighed all the relevant arguments. Perhaps you mean something like "A complete calculation that mainly relies on only a few premises"? But in this case I'd say the main advantage of the EA mindset is in fact that it makes people more willing to change their careers in response to a few fundamental premises. I think most AI safety researchers, for instance, have (or should have) a few clear cruxes about why they're in the field, whereas most AI researchers don't. Or perhaps you're just warning us not to think that we can make arguments about reality that are as conclusive as mathematical arguments?
I really enjoyed this. A related thing is about a possible reason why more debate doesn't happen. I think when rationalist style thinkers debate, especially in public, it feels a bit high stakes. There is pressure to demonstrate good epistemic standards, even though no one can define a good basis set for that. This goes doubly so for anyone who feels like they have a respectable position or are well regarded. There is a lot of downside risk to them engaging in debate and little upside. I think the thing that breaks this is actually pretty simple and is helped out by the 'sorry' command concept. If it's a free move socially to choose whether or not to debate (which avoids the thing where a person mostly wants to debate only if they're in the mood and about the thing they are interested in but don't want to defend a position against arbitrary objections that they may have answered lots of times before etc.) and also a free move to say 'actually, some of my beliefs in this area are cached sorries, so I reserve the right to not have perfect epistemics here already, and we also recognize that even if we refute specific parts of the argument, we might disagree on whether it is a smoking gun, so I can go away and think about it and I don't have to publicly update on it' then it derisks engaging in a friendly, yet still adversarial form of debate.
If we believe that people doing a lot of this play fighting will on average increase the volume and quality of EA output both through direct discovery of more bugs in arguments and in providing more training opportunity, then maybe it should be a named thing like Crocker's rules? Like people can say 'I'm open to debating X, but I declare Kid Gloves' or something. (What might be a good name for this?)