
or Optimizing Optimization

The definition of optimize is:

to make something as good as possible

It's hard to argue with that. It's no coincidence that a lot of us have something of an optimization mindset.

But sometimes trying to optimize can lead to worse outcomes (because we don't fully understand what to aim for). It's worth understanding how this happens. We can try to avoid it by a combination of thinking more about what to aim for, and (sometimes) simply optimizing less hard.

Reflection on purpose vs optimizing for that purpose

What does the activity of making something as good as possible look like in practice? I think often there are two stages:

  1. Reflection on the purpose — thinking about what the point of the thing at hand is, and identifying what counts as "good" in context
  2. Optimizing for that purpose — identifying the option(s) which do best at the identified purpose

Both of these stages are important parts of optimization in the general sense. But I think it's optimization-for-a-given-purpose that feels like optimization. When I say "over-optimization" I mean doing too much optimization for a given purpose.

What goes wrong when you over-optimize

Consider this exaggerated example:

Alice is a busy executive. She needs to get from one important meeting to another in a nearby city; she's definitely going to be late to the second meeting. She asks her assistant Bob to sort things out. "What should I be optimizing for?", Bob asks. "Just get me there as fast as possible", Alice replies, imagining that Bob will work out whether a taxi or train is faster.

Bob is on this. Eager to prove himself an excellent assistant, he first looks into a taxi (about 90 minutes) and a train (about 60 minutes plus 10 minutes travel at each end — but there's a 20 minute wait for the right train). So the taxi looks better.

But wait. Surely he can do better than 90 minutes? OK, so the journey is too short for a private jet to make sense, but what about a helicopter? Yep, 15 minutes to get to a helipad, plus 45 minutes flight time, and it can land on the hotel roof! Even adding in 5 minutes for embarking/disembarking, this is 25 minutes faster.

Or ... was he assuming that the drivers were sticking to the speed limit? Yeah, if he makes the right phone calls he can find someone who can drive door to door in 60 minutes.

Can he get the helicopter to be faster than that? Yeah, the driver can speed to the helipad, and bring it down to 57 minutes. Or what if he doesn't have it take off from a helipad? He just needs to find the closest possible bit of land and pay the owners to allow it to land there (or pay security people to temporarily clear the land even if they don't have permission to land). Surely that will come in under 55 minutes. Actually, if he's not concerned about proper airfields, he can revisit the option of a private jet ... just clear the street outside and use that as a runway, then have a skydiving instructor jump with Alice to land on the roof of the hotel ...

What's going wrong here? It isn't just that Bob is wasting time doing too much optimization, but that his solutions are getting worse as he does more optimization. This is because he has an imperfect understanding of the purpose. Goodhart's law is biting, hard.

It's also the case that Bob has a bunch of other implicit knowledge baked into how he starts to search for options. He first thinks of taking a taxi or the train. These are unusually good options overall among possible ways to get from one city to the other; they're salient to him because they're common, and they're common because they're often good choices. Too much optimization is liable to throw out the value of this implicit knowledge.

So there are two ways Bob could do a better job:

  1. He could reflect more on the purpose of what he's doing (perhaps consulting Alice to understand that budget starts to matter when it's getting into the thousands of dollars, and that she really doesn't want to do things that bring legal or physical risk)
  2. He could do something other than pure optimization, like "find the first pretty good option and stop searching"[1], or "find a set of pretty good options and then pick the one that he gut-level feels best about"[2] (see the sketch after this list)
    • It's not obvious which of these will produce better outcomes; it depends how much of his implicit knowledge is known to his gut vs encoded in his search process
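
To make the alternatives in #2 (and footnote [1]'s "argmax → softmax") concrete, here is a minimal toy sketch in Python. Everything in it is invented for illustration — the sample_option model, the numbers, and the thresholds are mine, not the post's. The proxy ("speed") omits the downsides Bob is ignoring, so pushing argmax harder finds options that look faster but are worse overall:

```python
import math
import random

random.seed(0)

# Toy model (all numbers invented): each option has an "extremity" knob.
# The proxy Bob optimizes is speed; the hidden downside (cost, risk,
# weirdness) grows faster than speed as options get more extreme.
def sample_option():
    extremity = random.random()                           # 0 = taxi/train, 1 = skydiving jet
    speed = 50 + 50 * extremity + random.gauss(0, 5)      # the proxy
    downside = 80 * extremity ** 2 + random.gauss(0, 5)   # what the proxy ignores
    return speed, speed - downside                        # (proxy, true value)

def argmax_proxy(options):
    """Pure optimization: take the option with the best proxy score."""
    return max(options, key=lambda o: o[0])

def satisfice(options, threshold=70):
    """Find the first 'pretty good' option and stop searching."""
    return next((o for o in options if o[0] >= threshold), options[0])

def softmax_pick(options, temperature=10.0):
    """Softer optimization: sample options in proportion to exp(proxy / T)."""
    weights = [math.exp(o[0] / temperature) for o in options]
    return random.choices(options, weights=weights, k=1)[0]

for search_effort in (5, 50, 5000):                       # how hard Bob searches
    options = [sample_option() for _ in range(search_effort)]
    for name, pick in (("argmax", argmax_proxy(options)),
                       ("satisfice", satisfice(options)),
                       ("softmax", softmax_pick(options))):
        print(f"search={search_effort:5d}  {name:9s}  "
              f"proxy={pick[0]:6.1f}  true value={pick[1]:6.1f}")
```

The exact numbers are meaningless; the point is that as search effort grows, the argmax pick's proxy score keeps climbing while its true value falls (Goodhart's law in miniature), whereas satisficing and softmax selection throw away less of the implicit "normal options are normal for a reason" value.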

I'm generally a big fan of #1. Of course it's possible to go overboard, but I think it's often worth spending 3-30% of the time you'll spend on an activity reflecting on the purpose.[3] And it doesn't have much downside beyond the time cost.

Of course you'd like to sequence things such that you do the reflection on the purpose first ("premature optimization is the root of all evil"), but even then we're usually acting based on an imperfect understanding of the purpose, which means that more optimization for the purpose doesn't necessarily lead to better things. So some combination of #1 and #2 will often be best.

When is lots of optimization for a purpose good?

Optimization for a purpose is particularly good when:

  • You have a very clear understanding of the purpose, and won't get much if anything out of further reflection
    • e.g. if you're in the final of a chess competition competing for a big monetary prize, it's relatively clear that the objective should be "win the game", and you know all of the rules of the game
    • In general if you've thought a lot about your purpose and how Goodharting could lead to bad things and you don't feel too worried, it's a sign that a lot of optimization is a good idea!
  • You don't know how much different options vary in how well they perform at the purpose as you currently understand it
    • Not-optimizing plus scope insensitivity can be a bad recipe! Sometimes some options are hundreds of times better than others on crucial dimensions
  • You're using it as a process to generate possibilities, not committing to going with the top options even if they seem dumb

See also Perils of optimizing in social contexts for an important special case where it's worth being wary about optimizing.

  1. ^

    I owe this general point, which was the inspiration for the post, to Jan Kulveit, who expressed it concisely as "argmax -> softmax".

  2. ^

    This takes advantage of the fact that his gut is often implicitly tracking things, without needing to do the full work of reflecting on the purpose to make them explicit.

  3. ^

    As a toy example, suppose that every doubling of the time you spend reflecting on the purpose helps you do things 10% better; then you should invest about 12% of your time reflecting on purpose [source: scribbled calculation]. 
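
For what it's worth, here is one way the scribbled calculation might go. This is my own reconstruction under an assumed model, not necessarily the author's: spend a fraction r of your time reflecting and 1 − r doing, and let each doubling of reflection time multiply the quality of the doing by 1.1.

```latex
% Assumed model (a reconstruction, not the author's actual calculation):
% reflect for a fraction r of the time, act for the remaining 1 - r,
% and quality improves 10% per doubling of reflection time above a baseline r_0.
\[
V(r) = (1 - r)\,\Bigl(1 + 0.1 \log_2 \tfrac{r}{r_0}\Bigr),
\qquad
V'(r) = -\Bigl(1 + 0.1 \log_2 \tfrac{r}{r_0}\Bigr) + \frac{0.1\,(1 - r)}{r \ln 2} = 0 .
\]
% If quality near the optimum is close to the baseline (the bracket is roughly 1), then
\[
\frac{0.1\,(1 - r)}{r \ln 2} \approx 1
\quad\Longrightarrow\quad
r \approx \frac{0.1}{0.1 + \ln 2} \approx 0.13 ,
\]
% i.e. roughly 12-13% of your time spent reflecting on the purpose.
```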

    Activities will vary a lot on how much you actually get benefits from reflecting on the purpose, but I don't think it's that unusual to see significant returns, particularly if the situation is complicated (& e.g. involving other people very often makes things complicated).

Comments (7)



Lately, I tend to think of this as a distinction between the "proxy optimization" algorithm and the "optimality" of the actual plan. The algorithm: specify a proxy reward and a proxy set of plans, and search for the best one. You could call this "proxy optimization".

The results: whatever actually happens, and how good it actually is. There's not really a verb associated with this - you can't just make something as good as it can possibly be (not even "in expectation" - you can only optimize proxies in expectation!). But it still seems like there's a loose sense in which you can be aiming for optimality.

Off the top of my head, there are a few ways proxy optimization can hurt, and most of them seem to come down to "better optimizing a worse proxy". You could deliberately alter the problem so that it is tractable for proxy optimization, or you could just invest too much in proxy optimization vs trying to construct a good proxy. This seems to roughly agree with your advice: investing lots in proxy optimization is particularly beneficial when the proxy is already pretty good, or when it will reveal very large differences in prospective plans (which are unlikely to be erased by considering a better proxy). I actually feel that some caution might be needed in the setting where there are apparently many orders of magnitude between the value of different plans (according to a proxy) - something like, if the system is apparently so sensitive to the things you are taking into account, then there's reason to believe it might also be quite sensitive to the things you're not taking into account.

If you think of thinking as generating a bunch of a priori datapoints (your thoughts) and trying to find a model that fits those data, we can use this to classify some overthinking failure-modes. These classes may overlap somewhat.

  1. You overfit your model to the datapoints because you underestimate their variance (regularization failure).
  2. Your datapoints may not compare very well to the real-world thing you're trying to optimize, so by underestimating their bias you may make your model less generalizable out of training distribution (distribution mismatch).
  3. If you over-update on each new datapoint because you underestimate the breadth of the landscape (your a priori datapoints about a thing may be a very limited distribution compared to the thing-in-itself), you may prematurely descend into a local optimum (greediness failure).
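
As a small numerical illustration of the first failure mode (my own toy sketch with invented numbers, not from the comment): fitting a handful of noisy "thoughts" with an overly flexible model matches them almost perfectly but generalizes worse than a simpler fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Thoughts": a handful of noisy datapoints generated by a simple underlying truth.
true_f = lambda x: 2 * x + 1
x_train = rng.uniform(-1, 1, size=10)
y_train = true_f(x_train) + rng.normal(0, 0.5, size=10)   # noisy observations

# "The real world": fresh points the model never saw.
x_test = rng.uniform(-1, 1, size=1000)
y_test = true_f(x_test)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)      # fit a polynomial model
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The unregularized degree-9 fit interpolates the noise (near-zero training error, worse test error); the degree-1 fit treats the noise as variance, which is the "regularization failure" framing above.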

I'm afraid that I don't remember the specific name nor the specific formula (and a cursory Google search hasn't been able to jog my memory), but there is also the concept within operations management of not optimizing a system too much, because that decreases effectiveness. If my memory serves, you can roughly think of it that if you are too highly optimized, your system is rigid/fragile and lacks the slack/flexibility to deal with unexpected but inevitable shocks.

I posted this to LessWrong as well, and one of the commenters there mentions the "performance / robustness stability tradeoff in controls theory". Is that the same as what you're thinking of?

Reminds me of the result in queueing theory, where (in the simplest queue model) going above ~80% utilisation of your capacity leads to massive increases in waiting time.
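
For reference, if the "simplest queue model" means the standard M/M/1 queue (an assumption on my part), the expected wait grows like ρ/(1 − ρ) as utilisation ρ approaches 1:

```latex
% M/M/1 queue: arrival rate \lambda, service rate \mu, utilisation \rho = \lambda/\mu.
% Expected time spent waiting in the queue (before service starts):
\[
W_q \;=\; \frac{\rho}{\mu\,(1-\rho)} \;=\; \frac{\rho}{1-\rho}\times \text{(mean service time)} .
\]
% Relative to the mean service time: 1x at 50% utilisation, 4x at 80%,
% 9x at 90%, 19x at 95%, and 99x at 99% -- the wait blows up as \rho \to 1.
```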

I'm glad you included the Tony Hoare/Donald Knuth quote about premature optimization. As soon as I saw the title of this post I was hoping there would be at least some reference to that.

This is funny, but I do not think that Goodhart's law is biting. Sometimes measures can be optimal. Consider that all of Bob's "suboptimal" development leads him to step back and see a better solution: the driver drives safely and Alice joins the introductions via videoconference. So even being frantic about a metric that is possibly not the ultimate target (the meeting going well?) can lead to a solution that meets that target while also optimizing the "proxy" metric.

There is no need to explicitly reflect on purpose for this development toward an optimal solution to be likely. Actually, are you being frantic about optimization for suboptimal optimization? I think so, because perceived imposition may create inefficiencies. (Also take this as expressive writing.)
