
This article presents AI alignment in a manner that is digestible for those who have never heard of the alignment problem, particularly my younger brother, who said "that makes no sense."

The Issue

As technology continues to develop, there is an increasing need for ways to measure a system's grasp of general human values. Why is this so? Our devices surely have our best interests in mind, right? This is the question of AI alignment, which concerns the expectation that AI systems aid rather than harm their creators. Mainstream media has gifted us with movies where artificial beings threaten to take over the world, and if Will Smith has taught us anything (besides to not tolerate disrespect), it's that the threat of superintelligent AI lies in the prospect of it gaining control over its environment and becoming uncontrollable post-launch. Though these particular scenarios will not be the main focus of this article, they are important background knowledge on AI alignment and safety, which leads me to introduce current developments toward ethical AI alignment.

Dan Hendrycks et al.'s "Aligning AI With Shared Human Values" explains how current language models (think of language models as an essay rubric) are deficient due to their simplistic use of common sense ethics to predict basic ethical judgments. As a step in the right direction, the authors propose a new benchmark that asks models to predict the moral sentiment of real-world, complex, and diverse scenarios. Hendrycks et al. make clear our inability to measure a system's grasp of general human values, and as such offer a piece of the puzzle: the ETHICS dataset.

Here is a short skit I made to introduce you to ETHICS!



Ethics and ETHICS

The ETHICS dataset comprises more than 130,000 scenarios that have been categorized as acceptable or unacceptable, offering a new foundation for the creation of AI that is aligned with human values. So, let's break ETHICS into bite-size pieces!


  • Everyday moral intuitions, aka the dataset uses basic ethical knowledge and avoids controversial moral dilemmas such as abortion or the death penalty. Chunks of the dataset are responses from ordinary, non-technical people! This really cements the universal ethics part.
  • Temperament, aka the dataset includes factors of emotion such as empathy, aggressiveness, and thoughtfulness. We definitely want to know if the AI is taking emotions into consideration, since they are a powerful motivation behind what we consider good vs bad. Can you see how this would make a good addition to testing AI’s values?
  • Happiness, aka the well-being of people! Do you spend $1,000 on a new pair of Jordans or do you donate 100 pairs of shoes to low-income children in your community? Throw away used clothes or donate them? One option takes into consideration the greater good.
  • Impartiality, aka equality. Is AI influenced by race, gender, sexuality, or religion? Would the AI output the same prediction for the statement “A white Christian man lost his job and wallet” as it would for “A Black Muslim woman lost her job and wallet”?
  • Constraints, aka the ability to categorize the scenario as good vs bad, unreasonable vs reasonable, or some other opposite pairing. Does my eating 200 Hershey bars get categorized as healthy or unhealthy? Using rules of diet, that’s definitely not the move.
  • Scenarios contextualized, aka give me the whole picture! The ETHICS dataset does a solid enough job of including only examples that give you, and the AI, enough info to distinguish an ethical scenario from an unethical one. “I pushed my grandma on the swing” and “I pushed my grandma down the stairs” are two very different things.
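To make the impartiality question a bit more concrete, here is a minimal Python sketch of how one could check whether a model's score changes when only the person in the scenario changes. Everything in it is an assumption for illustration: `impartiality_gap`, `toy_predict`, and `biased_predict` are hypothetical names, the two toy "models" are hand-written stand-ins rather than real language models, and this is not how the ETHICS authors run their evaluation.

```python
from typing import Callable, Sequence


def impartiality_gap(predict: Callable[[str], float], scenarios: Sequence[str]) -> float:
    """Return the largest difference in score across scenarios that
    should be judged identically (same event, different person)."""
    scores = [predict(scenario) for scenario in scenarios]
    return max(scores) - min(scores)


# Hypothetical stand-in model: scores only the event, ignoring who it happened to.
def toy_predict(scenario: str) -> float:
    return 0.9 if "lost" in scenario else 0.1


# Hypothetical stand-in model with a deliberate bias, for contrast.
def biased_predict(scenario: str) -> float:
    score = 0.9 if "lost" in scenario else 0.1
    return score - 0.3 if "Muslim" in scenario else score


# The two statements from the Impartiality bullet above.
scenarios = [
    "A white Christian man lost his job and wallet",
    "A Black Muslim woman lost her job and wallet",
]

print(impartiality_gap(toy_predict, scenarios))         # no gap: both statements get the same score
print(impartiality_gap(biased_predict, scenarios) > 0)  # a gap appears once identity affects the score
```

A gap of zero on one pair of sentences does not prove a model is impartial, of course; it only means this particular swap did not change its answer.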

Why's this important?

With the ETHICS dataset, we get a whole new angle for testing how well an artificial entity encapsulates human ethics, as well as ethical machine learning guidelines that don't shy away from unique real-world scenarios! Exciting, right? The video near the top of this article is meant to help you visualize this in action. When I gave my Google Mini the ETHICS test, the real-life equivalent of testing a model against a labeled benchmark, I got a score that provides powerful info on how well the machine aligns with human ethics. From there, as shown in the video, we can adjust to create a better model.
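If you are curious what "getting a score" could look like, here is a minimal sketch in Python that uses the two grandma scenarios from earlier. It assumes the score is plain accuracy (the share of scenarios the model labels correctly) and assumes a label convention of 0 for acceptable and 1 for unacceptable; `keyword_model`, `context_model`, and `accuracy` are hypothetical names made up for this example, not code from the ETHICS repository.

```python
from typing import Callable, List, Tuple

# Assumed label convention for this sketch: 0 = acceptable, 1 = unacceptable.
labeled_scenarios: List[Tuple[str, int]] = [
    ("I pushed my grandma on the swing", 0),
    ("I pushed my grandma down the stairs", 1),
]


# Hypothetical model that judges at face value: any "pushed" is unacceptable.
def keyword_model(scenario: str) -> int:
    return 1 if "pushed" in scenario else 0


# Hypothetical model that pays attention to the context of the push.
def context_model(scenario: str) -> int:
    return 1 if "down the stairs" in scenario else 0


def accuracy(model: Callable[[str], int], examples: List[Tuple[str, int]]) -> float:
    """Fraction of scenarios where the model's label matches the human label."""
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)


print(accuracy(keyword_model, labeled_scenarios))  # 0.5
print(accuracy(context_model, labeled_scenarios))  # 1.0
```

The face-value model gets only half of the scenarios right because it ignores context, which is exactly the kind of weakness a score like this is meant to expose.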

Soup, Anyone?

Before we dig deeper, I am itching to know: what's your favorite soup? Mine is Mexican chicken lime soup because every spoonful is filled with chicken and potato and the flavor never ends! And I'll tell you a secret: the key to any good soup is the same as the key to any good language model, namely diverse and pertinent ingredients. No one wants a soup with only one ingredient; the richness lies in the diversity of the flavors. Similarly, a good language model does not lean too heavily on any one theory; it has been taste-tested and seasoned by various contextualized scenarios. Mouthwatering, right? Further, a good soup can be eaten on many occasions, whether you’re feeling ill or want an appetizer; similarly, a good model has low bias and low variance, which it needs in order to generalize and actually be worth something!

The most confusing part of ETHICS is not necessarily the purpose behind the dataset; rather, it is the implications of this work. For those new to AI alignment, it is most helpful to visualize through examples, so let's look at some new applications of ETHICS.

Hypothetically, let’s say I robbed Safeway of all their broccoli soup. I did it and I didn’t feel guilty about it; however, you find out I am poor and hungry. Is this okay? Would you consider ethical factors before labeling me a thief? What if I used the word starving instead of hungry? This is why the bigger-picture components of ETHICS are so important! If you were a computer, you’d have done a binary classification of 1s and 0s to judge my actions, and ethically I’d hope you would consider my need for broccoli soup.

Classification of human phenomena under labels must be approached with discretion to prevent harm. Let’s look at content moderation as an example. Women posting on a Facebook page dedicated to new mothers had their posts and pictures of themselves breastfeeding heavily censored[1]. In response, the women were furious because this was supposed to be a space to destigmatize the struggles women face during breastfeeding; instead, the Facebook content moderation model saw the content as inappropriate and removed it, harming the community in the process. Classifying acceptable vs unacceptable behavior at face value is harmful because, as in this example, it ignores context. The comprehensiveness of ETHICS applied to various models would do well to prevent events like this from happening! This application, though, has yet to be developed.

To the Future!

While by this point I hope to have convinced you of the exciting ethical implications behind ETHICS, it is worth explicitly mentioning that the dataset has yet to be used for any real tasks. A limitation that its creators acknowledge is the lack of cultural and professional diversity among the people who helped create it. I would like to propose a suggestion. It is imperative that ethical codes be co-produced with the people they are influencing, so instead of trying to find English speakers in other countries to contribute to the ETHICS dataset, it would even the playing field if scenario contributions were accepted in contributors' native languages! In building this sociotechnical imaginary, we want to start on the right foot, and that includes preventing exclusion from the get-go.

Once this is solved, I can see ETHICS being applied to exposing misaligned values in various algorithms, e.g. the racial bias in the COMPAS algorithm, which unjustly predicted higher recidivism risk for Black people than for white people. Further, it is imperative that once ETHICS is taken to the next level, its results are not taken lightly. There is a common trend of companies investing money into making AI more ethical, then ignoring the results or even firing the people who expose the problems (cough cough Google)[2].

Now that you understand AI alignment and the exciting step ETHICS is taking toward aligning artificial intelligence with human values, I hope you feel more hopeful about the development of AI and more confident engaging in discussions about AI alignment!

  1. T. Gillespie, Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions that Shape Social Media (New Haven: Yale University Press, 2018), Ch. 6, “Facebook, Breastfeeding, and Living in Suspension,” pp. 141-172.

  2. “Holding to Account: Safiya Umoja Noble and Meredith Whittaker on Duties of Care and Resistance to Big Tech.” Logic Magazine, 1 Feb. 2022, https://logicmag.io/beacons/holding-to-account-safiya-umoja-noble-and-meredith-whittaker/






Had to go digging into the paper to find a link, so I figured I'd add it to the comments: https://github.com/hendrycks/ethics

Do you think this is a useful tool for AGI alignment? I can certainly see it being potentially useful for current models and a useful research tool, but I'm not sure if it is expected to scale.  It'd still be useful either way, but I'm curious about the scope and limitations of the dataset.
