
tl;dr: Ask questions about AGI Safety as comments on this post, including ones you might otherwise worry seem dumb!

Asking beginner-level questions can be intimidating, but everyone starts out not knowing anything. If we want more people in the world who understand AGI safety, we need a place where it's accepted and encouraged to ask about the basics.

We'll be putting up monthly FAQ posts as a safe space for people to ask all the possibly-dumb questions that may have been bothering them about the whole AGI Safety discussion, but which until now they didn't feel able to ask.

It's okay to ask uninformed questions, and not worry about having done a careful search before asking.

AISafety.info - Interactive FAQ

Additionally, this will serve as a way to spread the project Rob Miles' team[1] has been working on: Stampy and his professional-looking face aisafety.info. This will provide a single point of access into AI Safety, in the form of a comprehensive interactive FAQ with lots of links to the ecosystem. We'll be using questions and answers from this thread for Stampy (under these copyright rules), so please only post if you're okay with that!

Stampy - Here to help everyone learn about ~~stamp maximization~~ AGI Safety!

You can help by adding questions (type your question and click "I'm asking something else") or by editing questions and answers. We welcome feedback and questions on the UI/UX, policies, etc. around Stampy, as well as pull requests to his codebase and volunteer developers to help with the conversational agent and front end that we're building.

We've got more to write before he's ready for prime time, but we think Stampy can become an excellent resource for everyone from skeptical newcomers, through people who want to learn more, right up to people who are convinced and want to know how they can best help with their skillsets.

Guidelines for Questioners:

  • No previous knowledge of AGI safety is required. If you want to first watch a few of Rob Miles' videos, read the WaitButWhy posts, or read The Most Important Century summary from OpenPhil's co-CEO, that's great, but it's not a prerequisite for asking a question.
  • Similarly, you do not need to try to find the answer yourself before asking a question (but if you want to test Stampy's in-browser TensorFlow semantic search, that might get you an answer more quickly; there's a rough sketch of how that kind of search works just after this list).
  • Also feel free to ask questions that you're pretty sure you know the answer to, but where you'd like to hear how others would answer the question.
  • One question per comment if possible (though if you have a set of closely related questions that you want to ask all together that's ok).
  • If you have your own response to your own question, put that response as a reply to your original question rather than including it in the question itself.
  • Remember, if something is confusing to you, then it's probably confusing to other people as well. If you ask a question and someone gives a good response, then you are likely doing lots of other people a favor!
  • In case you're not comfortable posting a question under your own name, you can use this form to send a question anonymously and I'll post it as a comment.
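For the curious, here is a minimal sketch of how that kind of semantic search typically works: embed every stored FAQ question once, embed the incoming query, and rank by cosine similarity. This is a generic illustration using the sentence-transformers library with an assumed model name and made-up FAQ entries, not Stampy's actual in-browser implementation.

```python
# Generic semantic-search sketch over a toy FAQ (not Stampy's real code).
# Assumes the sentence-transformers package is installed.
from sentence_transformers import SentenceTransformer, util

# Hypothetical FAQ questions standing in for Stampy's database.
faq_questions = [
    "What is AI alignment?",
    "Why would an AI seek power?",
    "How can I start working on AGI safety?",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model
faq_embeddings = model.encode(faq_questions, convert_to_tensor=True)

def search(query: str, top_k: int = 2):
    """Return the top_k FAQ questions most similar to the query."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, faq_embeddings)[0]
    ranked = scores.argsort(descending=True)[:top_k]
    return [(faq_questions[int(i)], float(scores[int(i)])) for i in ranked]

print(search("how do I get into AI safety research?"))
```

If the top hit scores well, it will usually point you straight at an existing Stampy answer; if nothing scores well, that's a good sign your question is worth posting here.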

Guidelines for Answerers:

  • Linking to the relevant answer on Stampy is a great way to help people with minimal effort! Improving that answer means that everyone going forward will have a better experience!
  • This is a safe space for people to ask stupid questions, so be kind!
  • If this post works as intended then it will produce many answers for Stampy's FAQ. It may be worth keeping this in mind as you write your answer. For example, in some cases it might be worth giving a slightly longer / more expansive / more detailed explanation rather than just giving a short response to the specific question asked, in order to address other similar-but-not-precisely-the-same questions that other people might have.

Finally: Please think very carefully before downvoting any questions, remember this is the place to ask stupid questions!

  1. ^

    If you'd like to join, head over to Rob's Discord and introduce yourself!

Comments (11)



What are the arguments for why someone should work in AI safety over wild animal welfare? (Holding constant personal fit etc)

  • If someone thinks wild animals live positive lives, is it reasonable to think that AI doom would mean human extinction while leaving ecosystems intact? Or does AI doom threaten animals as well?
  • Does anyone have BOTECs on numbers of wild animals vs numbers of digital minds?

I would like to know about the history of the term "AI alignment". I found an article written by Paul Christiano in 2018. Did the use of the term start around this time? Also, what is the difference between AI alignment and value alignment?

https://www.alignmentforum.org/posts/ZeE7EKHTFMBs8eMxn/clarifying-ai-alignment

Some considerations I came to think about which might prevent AI systems from becoming power-seeking by default: 

  • Seeking power implies a time delay on the thing it's actually trying to do, which could be against its preferences for various reasons.
  • The longer the time-frame, the more complexity and uncertainty will be added, like "how to gain power", "will this help further the actual goal" etc.

So even if AI systems make plans / choose actions based on expected value calculations, just doing the thing they are trying to do might be the better strategy (even if gaining more power first would, if it worked, eventually let the AI system achieve its goal more fully).

Am I missing something? And are there any predictions on which of these two trends will win out? (I'm speaking of cases where we did not intend the system to be power-seeking, as opposed to, e.g., when you program the system to "make as much money as possible, forever".)
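To make that trade-off concrete, here is a toy expected-value sketch. All of the probabilities, values, and the discount factor are made-up assumptions chosen purely for illustration, not estimates of anything real.

```python
# Toy expected-value comparison: act directly vs. seek power first.
# Every number below is an assumption chosen only to illustrate the structure.

# Direct strategy: attempt the goal right away.
p_direct_success = 0.7       # assumed chance the direct attempt works
value_of_goal = 100.0        # assumed value of achieving the goal

# Power-seeking strategy: acquire resources/influence first, then attempt the goal.
p_power_grab_works = 0.5     # assumed chance the power-seeking step succeeds
p_success_with_power = 0.95  # assumed chance of success once powerful
delay_discount = 0.8         # assumed penalty for achieving the goal later

ev_direct = p_direct_success * value_of_goal
ev_power_seeking = (p_power_grab_works * p_success_with_power
                    * value_of_goal * delay_discount)

print(f"EV(direct action):    {ev_direct:.1f}")        # 70.0
print(f"EV(seek power first): {ev_power_seeking:.1f}")  # 38.0
```

With these particular numbers the direct strategy wins; with a near-certain power grab and little delay cost, power-seeking wins instead, which is roughly where the disagreement about "power-seeking by default" seems to lie.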

What are the key cruxes between people who think AGI is about to kill us all, and those who don't? I'm at the stage where I can read something like this and think "ok so we're all going to die", then follow it up with this and be like "ah great we're all fine then". I don't yet have the expertise to critically evaluate the arguments in any depth. Has anyone written something that explains where people begin to diverge, and why, in a reasonably accessible way?

Do the concepts behind AGI safety only make sense if you have roughly the same worldview as the top AGI safety researchers - secular atheism and reductive materialism/physicalism and a computational theory of mind?

Can you highlight some specific AGI safety concepts that make less sense without secular atheism, reductive materialism, and/or computational theory of mind?

I'd like to underline that I'm agnostic, and I don't know what the true nature of our reality is, though lately I've been more open to anti-physicalist views of the universe.

For one, if there's a continuation of consciousness after death, then AGI killing lots of people might not be as bad as it would be if there were no such continuation. I would still consider it very bad, but mostly because I like this world and the living beings in it and would not like them to end; it wouldn't be the end of those consciousnesses in the way some doomy AGI safety people imply.

Another thing is that the relationship between consciousness and the physical universe might be more complex than physicalists say - as some of the early figures of quantum physics thought - and there might be factors at play, unknown to current science, that could have an effect on the outcome. I don't have more to say about this because I'm uncertain what the relationship between consciousness and the physical universe might be in such a view.

And lastly, if there's God or gods or something similar, such beings would have agency and could have an effect on what the outcome might be. For example, there are Christian eschatological views that say that the Christian prophecies about the New Earth and other such things must come true in some way, so the future cannot end in a total extinction of all human life.

Suppose someone is an ethical realist: the One True Morality is out there, somewhere, for us to discover. Is it likely that AGI will be able to reason its way to finding it? 

What are the best examples of AI behavior we have seen where a model does something "unreasonable" to further its goals? Hallucinating citations?

I've been doing a 1-year CS MSc (one of the 'conversion' courses in the UK). I took as many AI/ML electives as I'm permitted to/can handle, but I missed out on an intro to RL course. I'm planning to take some time to (semi-independently) up-skill in AI safety after graduating. This might involve some projects and some self-study.

It seems like a good idea to be somewhat knowledgeable on RL basics going forward. I've taken (paid) accredited, distance/online courses (with exams etc.) concurrently with my main degree and found them to be higher quality than common perception suggests - although it does feel slightly distracting to have more on my plate.

Is it worth doing a distance/online course in RL (e.g. https://online.stanford.edu/courses/xcs234-reinforcement-learning ) as one part of the up-skilling period following graduation? Besides the Stanford online one that I've linked, are there any others that might be high quality and worth looking into? Otherwise, are there other resources that might be good alternatives?
