Goodharting Culture and Values Docs

Sam Smith 🔸

Crossposted from The Fieldbuilder's Blog

EXECUTIVE SUMMARY

Culture and Values docs have their place as living hypotheses about an organisation. When gaps between the narrated norms in the doc and the real workplace appear, these should be treated as information used to update the model of the org and respond accordingly. When this iterative modelling doesn’t happen, the culture and values doc can hold back an org’s conditions instead of furthering its mission.

To avoid the failure mode explored in this post, orgs should:

Frame the docs as explicitly imperfect approximations of a culture
As a team, revisit and update the doc in light of real experience on a regular cadence, e.g. set culture and values days or meetings, team retreats, etc.
When gaps between the docs and the real culture of the org are surfaced, explicitly treat them as datapoints about the org, not failures of individuals

INTRODUCTION

I love reading culture and values docs. They’re a nice chance to escape from the reality of an organisation and live in a fiction where everything functions smoothly and healthily.

Culture docs can serve a genuinely useful function: they set expectations for new hires by giving a model of an ideal version of the org’s culture. The problem is what happens when the doc goes from aspiration to description.

When leadership treats these docs as evidence that the culture is a certain way, rather than a target they’re actively working towards, they can become out of touch with the real issues the organisation is facing and fail to take the steps necessary to fix them. This is a Goodharting problem: the doc becomes the metric to optimise for, and maintaining it replaces reflective work on the organisation’s real culture.

My issue is specifically with organisations that adopt norms which depart from societal defaults without internalising the costs. Default professional norms aren’t arbitrary. They evolved partly to protect people with less power. Formal hierarchy clarifies accountability, professional distance creates boundaries, structured feedback processes create safety rails. When an org removes these in favour of something supposedly better but doesn’t build replacement systems to cover what was lost, it creates danger. And that danger isn’t distributed evenly.

The erosion of default norms disproportionately benefits people in power, while the cost is mostly absorbed by those in more vulnerable positions, such as junior staff, people on fixed-term contracts and members of marginalised groups. When this goes wrong repeatedly, it doesn’t just decrease the org’s impact; it can damage trust across the whole community and affect how the ecosystem is perceived.

I’m writing this because the work of AI Safety organisations is hugely important, and poor cultural norms cost real counterfactual impact. It’s tempting to focus on the exciting big picture (new research directions, fellowship structures, hires, etc.), but the day-to-day culture is what determines whether the org functions well. A single doc won’t fix that.

CULTURE

Culture, defined as “the way we do things here”, is an amalgamation of the organisation’s mission and the individual worldviews, relationships and work styles of the staff and other stakeholders who interact with the organisation.

Sure, having values to aim for, along with training and discussion, can help individuals target a way of doing things, but this by no means guarantees that those values will manifest. The intended culture only arises when the values are consistently acted out in practice, creating norms which embody them.

Reading The Scout Mindset alone does not make you open-minded.

Attending a lecture on productive disagreements does not make you feel psychologically safe.

Saying you are x, y or z does not necessarily mean living out those values in reality.

This is Goodharting. When the doc is treated as a proxy for the culture, organisations are no longer improving the way they do things; they are iterating on a set of norms that may not exist within the org.

Beyond Goodharting, clinging to these culture and values docs can lead people to ignore context that is necessary to understand someone’s actions or behaviours. Leaders can then end up evaluating individuals, and making decisions about them and about overall strategy, without key parts of the full picture.

EXAMPLES

This gap between real culture and doc can play out with many commonly adopted values in the AIS ecosystem.

Many of these values could be great if practiced sustainably. However, the move away from traditional professional norms can bring dangers that must be proactively accounted for. Otherwise they lead to real harms, potentially worse than the ones the new norms were trying to fix.

PSYCHOLOGICAL SAFETY

Psychological Safety is an idea popularised by Amy Edmondson, describing an environment in which staff feel safe to take interpersonal risks (e.g. being vulnerable, giving critical feedback, etc.).

In theory, this would be incredibly valuable for a team, allowing them to function as a unit and work more efficiently. However, claiming that a team’s culture includes psychological safety without that actually being felt by the team can lead to insidious consequences in practice:

Psychological Safety on a culture and values doc means that anyone can attempt to speak their mind, only to be told to feel heard and valued, and not to feel offended, when their ideas are ignored, shut down or ridiculed.

Psychological Safety on a culture and values doc means repeated patterns can get overlooked since no harm was “actually” caused.

Psychological Safety on a culture and values doc means that potentially insidious power dynamics can hide behind the veil of flat seniority where leaders don’t actually need to consider their imposing role or the limitations they exert on the “peers” they work with.

Psychological Safety on a culture and values doc means we can all role play a world where mistakes don’t get you a lost paycheck, where leadership doesn’t make secret plans for your future without you in the loop.

The idea of psychological safety can be used as evidence against a person’s real issues with the team. When psychological safety is assumed as a default, the org chooses to focus on the Culture and Values doc over the real employee who is part of the actual culture.

RADICAL TRANSPARENCY

Many people in the more startup-like culture of AIS orgs are reasonably frustrated by bureaucracy and barriers to information. As such, lots of these orgs value “radical transparency”, the norm of rejecting traditional boundaries by sharing strategy and other traditionally privileged information with everyone on the team.

In my experience, leaders can take transparency for granted because they have a top-down overview of the organisation, even when others have no idea what’s going on. This can lead to assumed context and poor management, where the leaders think people know what’s happening when they don’t.

Ironically, in trying to reduce barriers to information, “radical transparency” can end up adding friction. When a junior person has to ask a question that leadership assumes everyone already knows the answer to, the ask itself feels like an admission of incompetence.

A one-way window is transparent to the one looking in.

MANAGING UPWARDS

Another rejection of traditional bureaucracy comes in the norm of “managing upwards.” This is the idea of inverting the power dynamic of management structures where the managee can give critical feedback to the manager and point out dropped balls.

This is another norm that works well in a psychologically safe team: having the license to help your manager be more effective means everyone works better together towards the organisation’s mission.

However, when managing upwards is the default expectation, the onus to do a good job is taken off the manager and placed on the person they’re supposed to be supporting, who now has to pick up the manager’s mistakes.

When the managee has to unblock themself by teaching their manager how to manage, something about the structure of the org is not working. Inverting the power dynamic misses the point of managers existing in the first place.

ASSUMING GOOD INTENT

Everyone has bad days and slips up sometimes. Most people in these spaces are here because they genuinely care about the mission and want an organisation to succeed. As such, lots of orgs promote the principle of charity where you always assume good intent, especially in cases where someone does something that leads to a bad outcome.

The principle of charity is great for having constructive conversations and digging into disagreements. But when this is taken to the limit, it can lead to real problems.

For example, if the response to repeated problematic behaviour is to “assume good intent”, nothing gets fixed. Instead, the problem is treated as though it doesn’t exist, because no one has intentionally done anything wrong.

When patterns are spotted and raised in good faith, but the issue is not resolved, intent is no longer the problem. The impact of the organisation is being undermined because of someone’s actions. That needs to be addressed and managed, not deflected because someone meant well.

SELF-AWARENESS PROBLEM

A lot of the problem here boils down to organisations, and especially their leaders, treating the culture and values doc as an accurate picture of the organisation’s norms when it is in fact a distorted one.

In a frustratingly ironic way, people who value self-awareness and scout mindset are especially prone to a vicious cycle of believing they are self-aware. Since their identity is predicated on being truth-seeking, it doesn’t compute that they may not actually be all that truth-seeking.

For example, having an anonymous feedback form does not instantly make you better at internalising feedback and acting on it. What it can do is provide a defense against needing to actually reflect and grow, because the form itself serves as a token of openness.

These practices can become an armour against the feedback they claim to be eliciting, making the leader further out of touch with every token of openness they collect.

This can lead to leadership making decisions based on a model of an organisation that the actual employees may not even recognise.

CULTURE IS MAINTENANCE, NOT DECLARATION

Culture is an emergent property of the way individuals act within an organisation and how that web of people relate to one another. The culture can be influenced by the expectations laid out in a doc, which can get people on the same page about what may be expected in general. However, any doc is only an approximation of the actual culture and cannot capture the complicated web of dynamics and behaviours that naturally come from different people relating and working together.

Culture and values docs can present this emergent property as a clearly defined thing, which can keep the actual cultural issues from surfacing. Furthermore, the stated values can privilege the person in power, who does not have the same cultural experience as the other staff because of the intrinsic power dynamic.

The answer isn’t to stop making these docs. It’s to stop treating them as a static view of a finished culture. Healthy culture is a practice and like any practice it must be maintained consistently.

These docs should be treated as living hypotheses about the organisation, tested against real experience rather than defended against it. Gaps between the doc and the real culture should be treated as added data rather than failures of the staff. Falsification should be treated as an improvement of the model, leading to an update of the doc.

This requires mechanisms for honest feedback, like anonymous forms, dedicated space in meetings and regular check-ins. But these mechanisms alone aren’t enough, and they run the risk of being tokenistic without real change. Visible follow-through is necessary: it builds trust in the process, which in turn strengthens the mechanisms. If feedback disappears into a void, the mechanism erodes and, with it, the culture’s ability to self-correct.

Effective Altruism Forum