For context, there are two ways models learn in reinforcement learning: exploration vs. exploitation.1 Every action a model takes has probability p of being random (exploration), and probability 1 - p of being the best possible action among known actions (exploitation). When the model has not been trained at all, when it knows nothing, that first action is pure exploration. As the model learns more, the probability of exploration decreases, while the probability of exploitation increases. This makes sense — if you want to maximize rewards, then you should both explore potentially better actions and exploit the best action you know of at a given time. The amount of effort you put into each depends on how much prior experience you have.

In complex environments, it makes sense to maintain a non-trivial probability of exploration. By complex environments, I mean environments in which it’s difficult to predict the downstream effects of actions. I think we live in a complex environment. My closest friends are people I met pretty randomly, without initial intention; many heuristics that make my life better are ones I stumbled upon randomly; the topics I am most interested in are ones I learned about randomly.2 Given that we live in a complex environment, it is good for us to explore regularly.

However, I know a handful of people who seem to only exploit. I’m talking about the productivity-maxxers — the friends who (think they) optimize every moment of their life, who structure every minute of their calendar, who reject social invitations because they just know their time would be better spent on something else. There is a kind of hubris to this attitude. These people might be optimizing on the knowns, but there is a vast space of unknowns beyond imagination.3 True optimization includes unstructured time — time spent on exploration, time spent being bored.

I find that exploration disproportionately leads to really really great outcomes. This is because unlike RL, humans are not randomly choosing among all possible directions to explore. Human exploration is led by one’s curiosity and instincts. When you solely exploit, you deprive yourself of the opportunity to follow your heart, to listen to the iron string inside. In this way, excessive exploitation suppresses the self.

Usually when people talk about someone who optimizes their life, they mean someone who is purely exploiting, who has become a vessel for the hacks, heuristics, and frameworks they use to structure their life. This person is too certain of their likes and dislikes to make first-impression-defying friendships; too sure they want to study CS to take any classes in any unrelated subject; too busy arbitraging the dating market to fall in love.

3

0
0

Reactions

0
0
Comments
No comments on this post yet.
Be the first to respond.
Curated and popular this week
Relevant opportunities