Is this trying to make a directional claim? I.e., that people (in the EA community? in idealistic communities generally?) should on average be less afraid of / more accepting of being morally compromised? (On first read, I assume not; it seems like just a descriptive post about the phenomenon.) FWIW, I think it's worth thinking about the two forms of "compromise" separately: being associated with something you end up finding morally bad, and directly doing something you end up finding morally bad. I think it's easier and more worthwhile to focus on avoiding the latter, but overall I'm not sure I've seen a strong tendency for people to overdo either.
I don't think self-interest is relevant here if you believe that it is possible for an agent to have an altruistic goal.
Also, as with all words, "agentic" will have different meanings in different contexts. My comment was based on its use when referring to people's behaviour/psychology, which is not an exact science, so words are not being used in very precise scientific ways :)
Am not the author of this post, but I think EAs and rationalists have somewhat co-opted the term "agentic" and infused it with a load of context and implicit assumptions about how an "agentic" person behaves, so that it no longer just means "person with agency". This meaning is transmitted via conversations with people in this social cluster, as well as through books and educational sessions at camps/retreats etc.
Often, one of the implicit assumptions is that an "agentic" person is more rational and so pursues their goal more effectively, occasionally acting in socially weird ways if the net effect of their actions seems positive to them.
This was a great read! I relate to a lot of these thoughts - have swung back and forth on how much I want to lean into social norms / "guess culture" vs be a stereotypically rationalist-y person, and had a very similar experience at school. I think it's great you're thinking deeply and carefully about these issues. I've found that my attitude towards how to be "agentic" / behave in society has affected a lot of my major object-level decisions, both good and not so good.
Meta-level comment: this post was interesting and very well written, and I could empathize with a lot of it; in fact, it inspired me to make an account on here in order to comment :)
Object-level comment (ended up long, apologies!): My personal take is that a lot of EA literature on AI Safety (e.g. forum articles) uses terminology that overly anthropomorphizes AI and skips a lot of steps in arguments, assuming a fair amount of prerequisite knowledge/familiarity with jargon. When reading such literature, I try to convert the "EA AI Safety language" into "normal language" in my head in order to understand the claims better. Overall, my answer to “why is AI safety important” is (currently) the following:
AI Safety literature commonly refers to AIs as optimizers; terms like “mesa-optimizer” are used a lot as well. There is also a fair amount of anthropomorphizing language, such as “the AI will want to”, “the AI will secretly plan to”, and “the AI will like/dislike”. By now, I am OK with parsing such statements, but it can be confusing/distracting. “This thing is an optimizer” is a useful phrase when trying to predict the thing’s behavior, but looking through the lens of “what kind of optimizer is this” isn’t always the clearest way to look at the problem. The same can be said for the anthropomorphizing language. When someone says something like “so how do we check whether an AI secretly wants to kill you”, they are trying to summarise a phenomenon succinctly without giving the details/context; however, they incur the cost of being less precise and clear (and sounding weird).
Here’s my attempt to explain point 3 (As AI systems become more powerful it will be harder for us to ensure they do what we want them to do when deployed) without using EA-isms.
All this being said, I maintain a significant amount of skepticism around AI safety being the most important problem. I think nuclear risk and biorisk are very important and also more tractable. Based on my current model, I place ~10% probability on AI-related existential risk in the next century, conditional on no other existential risk occurring beforehand. I think there is a much higher probability of non-existential risk, and I care about that too.
Currently, the most tractable framing of AI safety technical research I can think of is “let’s make ascertaining whether an AI output is good X% easier” and “let’s make it Y% easier to infer whether an AI is actually as good as it seems in training, given limited training data and limited compute”. Research that increases X and Y would likely decrease AI risk. This is a hard problem, and many solutions may not scale to more powerful AIs. This is likely why a lot of AI safety literature uses metaphors and language that seem far removed from current systems: I think it is an attempt to imagine a good map for the territory of future AIs, in order to come up with solutions that could work in the future.
I also have meta-level uncertainty around my takes and update in the direction of thinking that AI Safety is more important than I otherwise would because many intelligent people I respect think this. Because of this, I make decisions based on AI risk being higher than my internal model currently estimates. I also spend more time thinking about AI safety because I find it interesting and I have a somewhat suitable background (I like programming and ML). I do think a major factor in deciding whether to work on AI safety should be personal fit.