
As 2022 comes to an end, I thought it'd be good to maintain a list of "questions that bother me" in thinking about AI safety and alignment. I don't claim I'm the first or only one to have thought about them. I'll keep updating this list.

(The title of this post alludes to the book "Things That Bother Me" by Galen Strawson)

First posted: 12/6/22

Last updated: 1/30/23

 

General Cognition

  • What signs do I need to look for to tell whether a model's cognition has started to emerge, e.g., situational awareness?
  • Will a capacity for "doing science" be a sufficient condition for general intelligence?
  • How easy was it for humans to get science (e.g., compared to evolving the capacities needed to take over the world)?

Deception 

  • What kind of interpretability tools do we need to avoid deception?
  • How do we get these interpretability tools? And even if we do get them, what if they're like neuroscience is for understanding brains, i.e., not enough?
  • How can I tell whether a model has found another goal to optimize for during its training?
  • What is it that makes a model switch to a goal different from the one set by the designer? How do you prevent it from doing so?

Agent Foundations 

  • Is the description/modeling of an agent ultimately a mathematical task?
  • From where do human agents derive their goals?
  • Is value fragile?

Theory of Machine Learning

  • What explains the success of deep neural networks?
  • Why did connectionism seem unlikely to succeed?

Epistemology of Alignment (I've written about this here)

  • How can we accelerate research?
  • Has philosophy ever really helped scientific research, e.g., with concept clarification?
  • What are some concrete takeaways from the history of science and technology that could be used as advice for alignment researchers and field-builders? 
  • How did the AI safety paradigm emerge?

Philosophy of Existential Risk 

  • What is the best way to explain the difference between forecasting extinction scenarios and narratives from chiliasm or eschatology?
  • What is the best way to think about serious risks in the future without reinforcing a sense of doom? 

Teaching and Communication

  • Younger people (e.g., my undergraduate students) seem more willing to entertain scenarios of catastrophe and extinction than older people (e.g., academics). I find that strange, and I don't have a good explanation for why that is the case.
  • The idea of a technological singularity was not difficult to explain and discuss with my students. I think that's surprising given how powerful the weirdness heuristic is.
  • The idea of "agency" or "being an agent" was easy to conflate with "consciousness" in philosophical discussions. It's not clear to me why that was the case, since I gave a very specific definition of agency.
  • Most of my students thought that AI models would never be conscious; it was difficult for them to articulate specific arguments for this, but their intuition seemed to be that there's something uniquely human about consciousness/sentience.
  • The "AIs will take our jobs in the future" worry seems to be very common among both students and academics.
  • 80% of a ~25-person class thought that philosophy is the right thing to major in if you're interested in how minds work. The question I asked them was: "Should you major in philosophy or cognitive science if you want to study how minds work?"

Governance/Strategy

  • Should we try to slow down AI progress? What does this mean in concrete steps? 
  • How should we deal with capabilities externalities?
  • How should concrete AI risk stories inform/affect AI governance and short-term/long-term future planning?
Comments



+1 to sharing lists of questions.

 What signs do I need to look for to tell whether a model's cognition has started to emerge?

I don't know what 'cognition emerging' means. I suspect the concept is vague/confused.

What is the best way to explain the difference between forecasting extinction scenarios and narratives from chiliasm or eschatology?

Why would you want to explain the difference?

I've been asked this question! Or, to be specific, I've been asked something along these lines: human cultures have always speculated about the end of the world, so how is forecasting x-risk any different?

[anonymous]

Younger people (e.g., my undergraduate students) seem more willing to entertain scenarios of catastrophe and extinction than older people (e.g., academics). I find that strange, and I don't have a good explanation for why that is the case.

Some hypotheses to test:
- Younger people are more likely to hold and signal radical beliefs, and the possibility of extinction is seen as more radical and exciting than humanity muddling through as it has in the past
- Younger people are just beginning to grapple with their own mortality, which freaks them out, whereas older people are more likely to have made peace with it in some sense
- Older people have survived through many events (including often fairly traumatic ones), so they are more likely to have a view of a world that "gets through things", as this aligns with their personal experience
- Older people have been around for a number of past catastrophic predictions that turned out to be wrong?

- Older people have survived through many events (including often fairly traumatic ones), so they are more likely to have a view of a world that "gets through things", as this aligns with their personal experience
- Older people have been around for a number of past catastrophic predictions that turned out to be wrong?

Nuclear war has been in the news for more than 60 years, and a high priority has been placed on spending those >60 years influencing public opinion on nuclear war via extremely carefully worded statements by spokespeople, which in turn were ghostwritten by spin doctors and other psychological experts with a profoundly strong understanding of news media corporations. This is the main reason, and possibly the only reason, why neither of the two American political parties nor their presidential candidates has ever adopted disarmament as part of a nationwide party platform during any election in that time period.

They weren't successful at their goals 100% of the time (Soviet propaganda operations may have contributed), but their efforts (and the fact that nuclear war scared people but never once happened in 60+ years) strongly affected the life experiences and cultural development of older people while they were younger.

I would suggest that new paradigms are most likely to establish themselves among the young because they are still in the part of their life where they are figuring out their views.

You should make Manifold markets predicting what you’ll think of these questions in a year or 5 years.
