"Part one of our challenge is to solve the technical alignment problem, and that’s what everybody focuses on, but part two is: to whose values do you align the system once you’re capable of doing that, and that may turn out to be an even harder problem", Sam Altman, OpenAI CEO (Link).
In this post, I argue that:
1. "To whose values do you align the system" is a critically neglected space I termed “Moral Alignment.” Only a few organizations work for non-humans in this field, with a total budget of 4-5 million USD (not accounting for academic work). The scale of this space couldn’t be any bigger - the intersection between the most revolutionary technology ever and all sentient beings. While tractability remains uncertain, there is some promising positive evidence (See “The Tractability Open Question” section).
2. Given the first point, our movement must attract more resources, talent, and funding to address it. The goal is to align AI's values with care for all sentient beings: humans, animals, and potential future digital minds. In other words, I argue we should invest much more in promoting a sentient-centric AI.
The problem
What is Moral Alignment?
AI alignment focuses on ensuring AI systems act according to human intentions, emphasizing controllability and corrigibility (adaptability to changing human preferences). However, traditional alignment often ignores the ethical implications for all sentient beings. Moral Alignment, as part of the broader AI alignment and AI safety spaces, is a field focused on the values we aim to instill in AI. I argue that our goal should be to ensure AI is a positive force for all sentient beings.
Currently, as far as I know, no overarching organization, terms, or community unifies Moral Alignment (MA) as a field with a clear umbrella identity. While specific groups focus individually on animals, humans, or digital minds, such as AI for Animals, which does excellent community-building work around AI and animal welfare while
My biggest takeaway from EA so far has been that the difference in expected moral value between the consensus choice and its alternative(s) can be vastly larger than I had previously thought.
I used to think that "common sense" would get me far when it came to moral choices. I even thought that the difference in expected moral value between the "common sense" choice and any alternatives was negligible, so much so that I made a deliberate decision not to invest time into thinking about my own values or ethics.
EA radically changed my opinion, and now I hold the view that the consensus view is frequently wrong, even when the stakes are high, and that it is possible to make dramatically better moral decisions by approaching them with rationality and a better-informed ethical framework.
Sometimes I come across people who are familiar with EA ideas but don't particularly engage with them or the community. I often feel surprised, and I think the above is a big part of why. Perhaps more emphasis could be placed on this expected moral value gap in EA outreach?
I've found not many people bother to play arbitrage with prosocial outcomes.
You essentially need someone to care about prosocial outcomes, be very quantitative and stringent with calculations rather than just going by consensus, and be sufficiently motivated to make life changes. In a way, it means being agreeable enough to care about others while being disagreeable enough to go against social consensus and gut feel.
Early adopters get to play a lot of arbitrage.
I suffer strongly from the following, and I suspect many EAs do too (all numbers are approximations to illustrate my point):
I'm still figuring out what to do about this. When you're highly uncertain it's obviously fine to hedge against being wrong, but again, given the numbers it's hard to justify hedging all the way down to inaction.
I am trying to learn more about AI safety, but I'm not spending very much time on it currently. I'm trying to talk to others about it, but I'm not evangelising it, nor necessarily speaking with a great sense of urgency. At the moment, it's low down my de facto priority list, even though I think there's a significant chance it changes everything I know and care about. Is part of this a lack of visceral connection to the risks and rewards? What can I do to feel like my values are in line with my actions?
...What on earth does "90% probability, with medium confidence" mean? Do you think it's 90% likely or not?
It means something like "my 90% confidence interval is 80% - 95%, with 90% as the mean".
Your “90% confidence interval” of… what, exactly? This looks like a confidence interval over the value of your own subjective probability estimate? And “90% as the mean” of… a bunch of different guesses you’ve taken at your “true” subjective probability? I can't imagine why anyone would do that but I can’t think what else this could coherently mean…?
If I can be blunt, I suspect you might be repeating probabilistic terms without really tracking their technical meaning, as though you’re just inserting nontechnical hedges. Maybe it’s worth taking the time to reread the map/territory stuff and then run through some calibration practice problems while thinking closely about what you’re doing. Or maybe just use nontechnical hedges more, they work perfectly well for expressing things like this.
Thanks for the feedback - it has indeed been a long time since I did high school statistics!
I specified that the numbers I gave were "approximations to prove my point" because I know that I do not have a technical statistical model in my head, and I didn't want to pretend that was the case. Given this is a non-technical, shortform post, I thought it was clear what I meant - apologies if that wasn't so.
Skill up and work on technical AI safety! Two good resources: 1, 2. Even if you don't yet feel the moral urgency, skilling up in ML can put you in a better position to do technical research in the future.
Thanks for the suggestion! I have actually spent quite a lot of time thinking about this - I had my 80k call last April and this was their advice. I've hesitated against doing this for a number of reasons:
There are probably good rebuttals to at least some of these points, and I think that is adding to my confusion. My intuition is to keep doing what I'm currently doing, rather than go try and learn ML, but maybe my intuition here is bad.
Edit: writing this comment made me realise that I ought to write a proper doc with the pros/cons of learning ML and get feedback on it if necessary. Thanks for helping pull this useful thought out of my brain :)
Thanks for sharing this. I thought point 7 was well-put, and something I relate to (and could apply to several other cause areas/problems)
Hey,
My favorite tool for resolving internal conflict like this (which also resonated with many people I spoke to) is "internal double crux".
TL;DR: Write down what your different parts want to say (and respond) to each other (as opposed to writing a list of something-like-pros-and-cons)
If you want, we can talk and do it together, and see if it works for you
This is a good idea, thanks for the suggestion! I've never really tried any of the CFAR stuff but this seems like a good place to start.
I'll give it a go over the weekend and if I'm struggling then I'll let you know and we can do it together :)
:)
Some quick thoughts following EAG London 2023
TL;DR
Last year I attended EAG SF (but not EAG London 2022), and was newer to the EA community, as well as working for a less prestigious organisation. This context is probably important for many of these reflections.
Conference strategy
I made some improvements to my conference planning strategy this year, that I think made the experience significantly better:
I had a clearer sense of what I was trying to achieve out of the conference this year compared to last. This made it easier to decide who would be valuable to speak to. Everyone who I requested a meeting with accepted, including someone who I regarded as particularly impressive who had specified they weren't going to take many 1:1s - so have a low bar for requesting meetings! With this person in particular I felt a bit starstruck, and regretted not having spent more time preparing specific questions to ask.
Some other meetings were totally fine without preparation though, so I think it's worth considering which ones would be more valuable to prepare for - in my case these were ones with more accomplished folks, or with people who I'm interested in working/collaborating with in the near future.
I broadly had two kinds of 1:1s, both of which were valuable: meetings with people who had a track record in areas that I am considering pursuing, and meetings with people who are in a similar position to me currently. With the former, I had specific questions that I was trying to answer, and was potentially trying to impress them/gauge if they might hire me. With the latter, meetings were more exploratory, more casual, and more about trying to find out if I was missing anything important from my model. I think it's easy to feel like the former is far more valuable than the latter, but I think this is false, and I think scheduling some of these more relaxed meetings can help ease the stress of the conference (as well as provide valuable insight into your current situation and plans).
I've infiltrated the ingroup
Last year, I felt like I had something to prove in all of my meetings. I knew maybe 4-5 people who were at the conference, which is not a lot out of almost 1500, and I worked for a company that nobody in SF has heard of. I was still a prototypical EA (straight white male) in many ways, but I don't have an undergrad, and felt like I lacked any track record of achievements to show that I was competent (and by extension, worthy of other attendees' time).
This year, I had several close friends attending and had interacted with a much larger number of people in some capacity, either on social media or through EA tech events in London. I now work for a FAANG company. I have read the seminal Slate Star Codex posts, I broadly understand what a deceptively aligned mesa optimiser is, and I've speculated as to the true identity of Qualy the Lightbulb on Twitter. I have a legible track record of my competence, and I am fluent in EA jargon. I have infiltrated the ingroup.
I feel very conflicted about this - being part of the ingroup feels great. I feel like I have the respect of people who I view as being extremely talented and successful, and naturally this does wonders for my ego. Having so many common touchpoints has meant that I find it extremely easy to make meaningful connections within the EA community compared with outside. Using jargon to signal that you're familiar with the relevant scriptures and ideas lets you skip straight to discussing cruxes, on the understanding that both parties already agree on some number of issues. Other group norms, such as a preference for openness, directness, and epistemic humility also seem better than their alternatives to me - I think these tendencies facilitate more constructive discussion, ultimately (hopefully) leading to greater impact.
But there are many obvious drawbacks to this. The ingroup is primarily made up of exceptionally privileged people (and in some ways I'm glad of this - I want privileged people like myself to be doing more to think about how they can do good with their privilege), and often the people that feel less comfortable around the EA community are less privileged. Other people have articulated these problems with the community better than me, and I don't want to speak for them so won't go into too much detail, but if you're reading this then you probably already know exactly what I mean. The fact that the EA community makes some people feel this way makes me feel really sad, which is hard to square with the happiness I get from the sense of belonging that I personally feel from the community.
I'm not entirely sure what to do about this. Trying to use less jargon seems like a good start, but that would be a costly signal for me, particularly when I feel like I am starting to gain status within the community. I would love to write that I am happy to sacrifice this, but I am human and flawed, and I don't know if I am. Making a conscious effort to treat people the same, regardless of whether I view them as highly accomplished or not, also seems like something I ought to write here, but seems hard (or even impossible) to do in practice.
80k has the idea of spending the initial part of your career acquiring career capital, which can later be traded in for impact. Perhaps EA ought to have the idea of community capital - a greater focus is placed on nurturing and growing a healthy community focused on doing good. Even if this focus detracts from doing the absolute most good possible in the short term, in the long term it could allow for greater impact (and, after all, we do love talking about the long term round here).
I feel like my thinking on these topics is generally pretty immature and lacks nuance, so I'd welcome any thoughts.
Outcomes
To end on a positive note, I found the conference incredibly valuable. I came away feeling like I had three promising paths to explore, and am now planning on trying to move to direct work immediately, rather than gaining more career capital as originally planned. I feel excited about trying to use my career to do good, and I feel excited about EA as a whole. There are jobs I will apply for that I counterfactually wouldn't have, seemingly promising opportunities for collaboration, and I made several new contacts that I think will be mutually beneficial in the future.
The conference itself seemed excellently run, the venue was amazing, as was the food, and I feel very appreciative of both the organisers and event staff for all their time and effort!
Earlier I had a conversation with Yonatan Cale about, among other things, ideas for EA projects that could be looking for founders. My prior is that "ideas are easy, execution is hard" and therefore there are plenty of good ideas; he pushed back on this and cited this thread.
Then I went for a run and tried to think of some ideas. I haven't checked to see if anyone has proposed any of them before, and given that I thought of them off the top of my head, they have likely already been discussed. We had talked about software projects specifically, but not all of these are software-centric. I haven't spent longer than five minutes thinking about any of them, and I think it's unlikely any of them are particularly good; this is an exercise in generating ideas, on the grounds that if there's enough of them, one of them might be good.
git clone polis
Low chance of success, potential for huge impact if successful regardless.
Obviously I know the last two definitely don't have legs, and #1 seems like it might just be submitting a pull request after a weekend or two of work, but still. I'm confident that there is a non-zero amount of value across all of these ideas, that if I thought about them more there's a possibility that they could yield an appreciable amount of value, and that I can think of a large number of similar ideas given more time, with at least some of them being better than the best one of these ideas.