
Supported by Rethink Priorities

This is part of a weekly series - you can see the full collection here. The first post includes some details on purpose and methodology.

If you'd like to receive these summaries via email, you can subscribe here.

Podcast version: prefer your summaries in podcast form? A big thanks to Coleman Snell for producing these! Subscribe on your favorite podcast app by searching for 'Effective Altruism Forum Podcast'.

 

Top / Curated Readings

Designed for those without the time to read all the summaries. Everything here is also covered in the relevant sections later on, so feel free to skip this section if you're planning to read it all.
 

Lessons learned from talking to >100 academics about AI safety

by mariushobbhahn

The author talked to 100-200 people in academia about AI safety, from bachelor’s students to senior faculty, over several years. This post summarizes the key learnings.

Author’s tl;dr (lightly edited): “Academics are increasingly open to arguments about AI risk and I’d recommend having lots of these chats. I underestimated how much work related to aspects of AI safety (eg. interpretability) already exists in academia - we sometimes reinvent the wheel. Messaging matters, e.g. technical discussions got more interest than alarmism and explaining the problem rather than trying to actively convince someone received better feedback.”

 

Measuring Good Better

by MichaelPlant, GiveWell, Jason Schukraft, Matt_Lerner, Innovations for Poverty Action

Transcripts of 5-minute lightning talks by various orgs on their approach to measuring ‘good’. A very short summary of each is below:

GiveWell uses moral weights to compare different units (eg. doubling incomes vs. saving the life of a child under 5). These are based 60% on donor surveys, 30% on a 2019 survey of 2K people in Kenya and Ghana, and 10% on staff opinion.

Open Philanthropy's global health and wellbeing team uses the value of 'a single dollar to someone making 50K per year' as its unit, and compares everything to that - eg. averting a DALY is worth 100K of these units.

Happier Lives Institute focuses on wellbeing, measuring WELLBYs. One WELLBY is a one-point increase on a 0-10 life satisfaction scale for one year.

Founders Pledge values cash at $199 per WELLBY. They have conversion rates from WELLBYs to Income Doublings to Deaths Avoided to DALYs Avoided, using work from some of the orgs above. This means they can get a dollar figure they're willing to spend for each of these measures.

Innovations for Poverty Action asks different questions depending on the project stage (eg. idea, pilot, measuring, scaling). Early on, the questions are more like 'is this the right solution for this audience?' - only further down the line can you ask 'does it actually save more lives?'


 

Metaculus Launches the 'Forecasting Our World In Data' Project to Probe the Long-Term Future

by christian, EdMathieu

Forecasting Our World In Data is a tournament that will deliver predictions on technological advancement, global development, and social progress using Our World in Data metrics. There's a $20K prize pool for accurate forecasts on 1-3 year time horizons and for cogent analysis on 10-100 year horizons. The first questions have opened, with more to come on Oct 19th and 26th.


 

EA Forum

Philosophy and Methodologies

We can do better than argmax

by Jan_Kulveit, Gavin

Author’s tl;dr (lightly edited): a common prioritization method in EA is putting all resources on your top option (argmax). But this can be foolish, so we deviate in ad-hoc ways. We describe a principled softmax approach, allocating resources to several options by confidence. This works well when a whole community collaborates on impact; when some opportunities are fleeting or initially unknown; or when large actors are in play.
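To make the argmax vs. softmax contrast concrete, here is a minimal Python sketch (my own illustration, not code from the post; the option values, budget, and temperature are made up) of allocating a budget in proportion to a softmax over estimated impact:

```python
import math

def softmax_allocation(estimated_values, budget=1.0, temperature=1.0):
    """Split a budget across options in proportion to exp(value / temperature).

    Low temperature approaches argmax (nearly everything to the top option);
    high temperature approaches an even split.
    """
    weights = [math.exp(v / temperature) for v in estimated_values]
    total = sum(weights)
    return [budget * w / total for w in weights]

# Illustrative estimated impacts for three options (arbitrary units):
values = [3.0, 2.5, 1.0]
print(softmax_allocation(values, budget=100, temperature=1.0))  # spread across options
print(softmax_allocation(values, budget=100, temperature=0.1))  # nearly all to option 1
```

One nice property: the temperature puts argmax and softmax on a continuum, so deviating from argmax stops being ad hoc and becomes a single tunable choice.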


 

Parfit + Singer + Aliens = ?

by Maxwell Tabarrok

If we observe even simple (eg. single-celled) alien life, the chance that intelligent and morally relevant alien life exists somewhere increases drastically. In that case, human extinction isn't as uniquely bad - the difference in value between eg. 95% and 100% of humans dying becomes much smaller. This makes risky moves like advancing AI or biotech (which could either destroy us or be hugely positive) more positive on balance, and implies we should upweight higher-volatility paths.


 

Measuring Good Better

by MichaelPlant, GiveWell, Jason Schukraft, Matt_Lerner, Innovations for Poverty Action

Transcripts of 5-minute lightning talks by various orgs on their approach to measuring ‘good’. A very short summary of each is below:

GiveWell uses moral weights to compare different units (eg. doubling incomes vs. saving the life of a child under 5). These are based 60% on donor surveys, 30% on a 2019 survey of 2K people in Kenya and Ghana, and 10% on staff opinion.

Open Philanthropy's global health and wellbeing team uses the value of 'a single dollar to someone making 50K per year' as its unit, and compares everything to that - eg. averting a DALY is worth 100K of these units.

Happier Lives Institute focuses on wellbeing, measuring WELLBYs. One WELLBY is a one-point increase on a 0-10 life satisfaction scale for one year.

Founders Pledge values cash at $199 per WELLBY. They have conversion rates from WELLBYs to Income Doublings to Deaths Avoided to DALYs Avoided, using work from some of the orgs above. This means they can get a dollar figure they're willing to spend for each of these measures (a rough sketch of this chain is below the list).

Innovations for Poverty Action asks different questions depending on the project stage (eg. idea, pilot, measuring, scaling). Early on, the questions are more like 'is this the right solution for this audience?' - only further down the line can you ask 'does it actually save more lives?'
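As a rough illustration of how a chain of conversion rates yields a willingness-to-pay figure for each measure (per the Founders Pledge talk above), here is a minimal Python sketch. Only the $199-per-WELLBY figure comes from the talk; the other conversion rates are placeholders I made up, not Founders Pledge's actual numbers:

```python
# Hypothetical sketch of chaining conversion rates into dollar values.
# Only DOLLARS_PER_WELLBY comes from the talk; the other rates are placeholders.
DOLLARS_PER_WELLBY = 199
WELLBYS_PER_MEASURE = {
    "income doubling": 2.0,   # placeholder
    "death avoided": 80.0,    # placeholder
    "DALY avoided": 2.5,      # placeholder
}

for measure, wellbys in WELLBYS_PER_MEASURE.items():
    # Willingness to pay for one unit of each measure, via WELLBYs.
    print(f"{measure}: ${wellbys * DOLLARS_PER_WELLBY:,.0f}")
```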


 

Object Level Interventions / Reviews

Introducing the EA Good Governance Project

by Grayden

Author’s tl;dr: “I believe good governance is important and often underrated within EA. I'm launching the EA Good Governance Project.  Its first initiative will be a directory of EA Board candidates.  If you have skills and experience to offer to an EA Board, please add your profile.”

They also plan to add practical resources for boards (eg. how to measure impact and set appropriate policies), and are looking for contributors to these.


 

Why I think there's a one-in-six chance of an imminent global nuclear war

by Tegmark

The author estimates a 30% chance of Russia using nuclear weapons in Ukraine, in that case an 80% chance that NATO responds with conventional weapons, and in that case a 70% chance of escalation to global nuclear war. Multiplying these through (0.30 × 0.80 × 0.70 ≈ 0.17) gives roughly a ⅙ chance of global nuclear war from today's state.

They argue that Putin will not accept outright defeat without going nuclear, because he'd likely be jailed or killed. The alternative, de-escalation, seems disfavored in the West because Ukraine is winning. And if a nuke is used in Ukraine, escalation to eventual global nuclear war seems likely, because the countries involved have a long history of nuclear near misses and have already made retaliation threats.

These predictions are significantly more pessimistic than the community average. Metaculus currently gives the first stage (nuclear use in Ukraine) 7% odds this year, while the Samotsvety forecasters give it 16% for the next year, and 1.6% for nuclear use beyond that in the next year.


 

Lessons learned from talking to >100 academics about AI safety

by mariushobbhahn

The author talked to 100-200 people in academia about AI safety, from bachelor’s students to senior faculty, over several years. This post summarizes the key learnings.

Author’s tl;dr (lightly edited): “Academics are increasingly open to arguments about AI risk and I’d recommend having lots of these chats. I underestimated how much work related to aspects of AI safety (eg. interpretability) already exists in academia - we sometimes reinvent the wheel. Messaging matters, e.g. technical discussions got more interest than alarmism and explaining the problem rather than trying to actively convince someone received better feedback.”


 

Anonymous advice: If you want to reduce AI risk, should you take roles that advance AI capabilities?

by Benjamin Hilton, 80000_Hours

Work which increases AI capabilities is intertwined with certain types of safety work, and can be a good way to skill-build for safety work. However, it can also be harmful by accelerating dangerous AI. 80K anonymously asked 22 experts for their views on this balance, and received 10 responses published in full in this post.

The number of experts leaning each way, and the common arguments for each leaning, are summarized below.

Mostly Yes (2)

  • There’s lots of overlap between alignment & capabilities work
  • Capabilities work is heavily funded and incentivized - on the margin one researcher isn’t going to do much
  • Influencing the orgs from within is important
  • Building career capital is important

It Depends / 'Yes if careful' (4)

  • Some types of capabilities research are more dangerous than others (eg. training efficiency, world model understanding) - avoid these
  • Other types overlap well with safety, or are unlikely to accelerate AGI - go for these
  • Value drift is common, stay in touch with the alignment community

Mostly No (3)

  • There's only a limited amount of capabilities progress left before we reach AGI, and that remaining runway is effectively our timeline for solving alignment - don't use it up unless it's really worth it
  • It could be worth it if there's a big alignment payoff, or if you wouldn't be at the cutting edge of capabilities research and any capabilities work is kept private

Strong No (1)

  • Top researchers can contribute a lot - don’t risk advancing capabilities significantly
  • Alignment research can have capabilities implications, but the reverse is rarely true


 

Sheltering humanity against x-risk: report from the SHELTER weekend

by Janne M. Korhonen

Write-up of learnings from a participant in the August SHELTER weekend, an event for gaining clarity on what’s needed to build civilizational shelters.

Key takeaways included:

  • Reliable shelters against x-risk are probably infeasible, and have some downsides (eg. incentivizing hazards, rich vs. poor dynamics). 
     
  • Increasing societal resilience and capability for cooperative action is a better approach. This could be tackled via improving existing disaster management systems, hardening risky facilities such as bio labs, and supporting isolated communities (with their buy-in).
     
  • Both shelters and general resilience efforts are primarily not a technological problem. An exception would be developing self-sustaining ecosystems for space colonization (a long-term bet).
     
  • Risks that don't involve societal breakdown are unlikely to be x-risks (aside from those that shelters wouldn't help with anyway, such as AI). Dangerous pathogens were broadly agreed to be the highest risk.

     

Responding to recent critiques of iron fortification in India

by e19brendan

Fortify Health co-founder Brendan responds to forum posts suggesting that recent studies on the prevalence of anemia in India, and the proportion attributable to iron deficiency, should lower cost-effectiveness estimates of fortification. The posts also suggested using more targeted treatment and changing anemia cut-offs.

Brendan reviewed the provided studies, and found that:

  • Anemia attributable to iron deficiency is an important measure, but the studies' estimates seem comparable to GiveWell's.
  • Targeted treatment has promise and is likely worth doing (though it's less scalable).
  • Cutoffs for 'healthy' iron levels are a tricky topic - even those without diagnosed conditions might be healthier with more iron.


 

Counterarguments to the basic AI risk case

by Katja_Grace

Counters to the argument that goal-directed AIs are likely, that it's hard to align them to good goals, and that there's therefore significant x-risk:

  • AIs may optimize more for 'looking like they're pursuing goal X' than for actually pursuing it. This would mean they wouldn't go after instrumental goals like money or power. The jury is out on whether ML models lean this way.
     
  • Even if an AI’s values / goals don’t match ours, they could be close enough, or be non-destructive. Or they could have short time horizons that don’t make worldwide takeovers worth it.
     
  • We might be more powerful than a superintelligent AI. Collaboration was as or more important than intelligence for humans becoming the dominant species, and we could have non-agentic AIs on our side. AIs might also hit ceilings in intelligence, or be working on tasks that don’t scale much with intelligence.
     
  • The core AI x-risk argument could apply to corporations too: they are goal-directed, hard to align precisely, far more powerful than individual humans, and adapt over time - yet we don't consider them x-risks.

 

 

The US expands restrictions on AI exports to China. What are the x-risk effects?

by Stephen Clare

Last week the Biden administration announced regulations that make it illegal for US companies to export certain AI-related products and services to China, including high-end chips and semiconductor equipment.

The author raises questions about the impact on China's AI trajectory, whether this will increase the likelihood of conflict, and how these rising tensions might affect cooperation on other global risks.


 

Opportunities

SERI MATS Program - Winter 2022 Cohort

by Ryan Kidd

Applications are open until Oct 24th for the MATS program, which supports aspiring alignment researchers in doing independent research, with funding (via the LTFF), mentorship, and training. The winter cohort will run Nov 7th - Feb 23rd, and will be in person in Berkeley from Jan 3rd.


 

Book a corporate event for Giving Season

by Jack Lewars, Luke Freeman, Federico Speziali

Author’s tl;dr: “following the success of previous corporate talks, One for the World, Giving What We Can and High Impact Professionals are collaborating to offer a range of corporate talks this giving season. Use our contact form to learn more or book a talk.” GWWC is also offering workshops for a more interactive experience.


 

Alignment 201 curriculum

by richard_ngo

A follow-up to the Alignment Fundamentals curriculum, this 9-week curriculum aims to give enough knowledge to understand the frontier of current research discussions. It’s targeted at those who have taken the previous course, in addition to having some knowledge of deep learning and reinforcement learning.


 

Metaculus Launches the 'Forecasting Our World In Data' Project to Probe the Long-Term Future

by christian, EdMathieu

Forecasting Our World In Data is a tournament that will deliver predictions on technological advancement, global development, and social progress using Our World in Data metrics. There's a $20K prize pool for accurate forecasts on 1-3 year time horizons and for cogent analysis on 10-100 year horizons. The first questions have opened, with more to come on Oct 19th and 26th.


 

Growth Theory Reading List

by LuisMota

List of readings on economic growth theory, broken down into 10 sub-topics such as long-run historical growth, AI and growth, stagnation, and growth and happiness.


 

EAGxVirtual: A virtual venue, timings, and other updates

by Alex Berezhnoi

Applications are due before 19th October (the conference runs from 21st October). There have been >600 applicants so far from >60 countries. The post highlights the content to expect and the platforms being used, and puts out a call for volunteers.


 

Community & Media

Why defensive writing is bad for community epistemics

by Emrik

Defensive writing is optimizing your writing to make sure no one forms a bad impression of you. It can become a norm when readers try to make inferences about the author rather than just learning from the content ('judgemental reading'). Both make communication inefficient and writing scary.

The author suggests being clear as a writer about the purpose of your writing, and if it’s helping your readers. As a reader, he suggests interpreting things charitably, rewarding confidence, and not punishing people for what they don’t know.


 

When reporting AI timelines, be clear who you're (not) deferring to

by Sam Clarke

It's common to ask for people's AI timelines, and also common for responses not to say whether they're independent impressions or based on others' views. This can lead to these timelines feeling more robust than they are, and to groups of EAs converging on the same timelines without good reason.

The author suggests that if you haven't formed an independent impression, you should always say who you're deferring to, and that when asking about someone's timelines, you should always ask how they arrived at them. They've also put up a survey to work out who people are deferring to most.


 

On absurdity

by OllieBase

What we're doing is absurdly ambitious. Looking at things through the absurdity lens can help us step back, get energy, and be kinder to ourselves and others (particularly when we fail). For instance, realizing that 'trying to work out the world's biggest problem with my two college friends' or 'running for office with no political background to single-handedly influence the Senate on global health security' are absurd takes some of the pressure off - while still remembering it's worth a shot!


 

Some Carl Sagan quotations

by finm

Carl Sagan (1934 - 1996) was an astronomer and science communicator who captured many ideas related to longtermism and existential risk poetically. This article is a collection of some of the author’s favorite quotes from him.


 

Counterproductive EA mental health advice (and what to say instead)

by Ada-Maaria Hyvärinen

Some well-meaning advice is counter-productive. This includes telling people:

  • Happiness is important to productivity (making happiness an instrumental goal tends to make people less happy)
  • They’ve donated enough to save a life / offset damage, so they clearly deserve to live (associates their worth as a person with what they can contribute)
  • Take a break from EA and come back when you feel better (makes the right to a break feel conditional on coming back, so it’s not a true break)

Instead, say things in a way that reflects that you care about the person for their intrinsic value, not just the impact they can have. This matters in self-talk too.

 


Cultural EA considerations for Nordic folks

by Ada-Maaria Hyvärinen

Cultural information about EA that contrasts with Finnish / Nordic norms.

Some key topics:

  • Where EAs are concentrated, and typical cultural differences there (eg. going to uni young, flat-sharing due to the high cost of living, willingness to move)
     
  • Career considerations (eg. few EA employers hire in Finland, even remote ones may have timezone requirements or lack benefits, you don’t need to have the ‘right’ degree to apply)
     
  • Differences in interaction style (eg. EAs often use ‘hype’ language or target things toward the ‘best’, lots of jargon, and may be more assertive than average)


 

Changes to EA Giving Tuesday for 2022

by Giving What We Can, mjamer, GraceAdams, Jack Lewars

Giving What We Can and One For The World have volunteered to manage EA Giving Tuesday for 2022, somewhat scaled back (~25% as many charities as previously, and minimal testing / revision of the donation strategy). If you'd like to participate, sign up for email updates here.


 

An EA's Guide to Berkeley and the Bay Area

by Elika, Vaidehi Agarwalla

A guide for newcomers, covering positives and negatives. Most helpful if you're already planning or seriously considering coming to Berkeley and want more context on the community and culture.


 

Pineapple Operations is expanding to include all operations talent (Oct '22 Update)

by Vaidehi Agarwalla, Alexandra Malikova

The Pineapple Operations database of candidates now includes all ops talent, not just PAs/ExAs. The post links to where you can list yourself or search the 100+ candidates.

 

Ask Charity Entrepreneurship Anything 

by Ula, KarolinaSarek, Joey

Some top comments at time of summarizing:

  • CE believes there are limits to desk research, and time-caps research so it can get to pilots and better data faster.
  • Entrepreneurs starting separate organisations (rather than sitting within CE) gives them more flexibility to take on risk, lets them move fast, and often suits applicants who want ownership of their project.
  • Many people underrate their chances of being accepted - eg. because they're new to EA, lack experience or domain expertise, or think they'll be excluded on the basis of age or location. None of these factors should stop someone from applying: the best way to test fit is to apply, and the best way to build experience and expertise is via the program.

     

Didn’t Summarize

Let me blind myself to Forum post authors by Will Payne (forum feature request)

Scout Mindset Poster by Anthony Fleming (printable poster)

 

 

LW Forum

Possible miracles

by Akash, Thomas Larsen

Eliezer’s List of Lethalities is a list of ways we could fail with regards to AGI. This post is a brainstorm of ways we might win - intended as an exercise for others to try too.

  1. New agendas might emerge - the field is growing rapidly in people, resources, respect, and existing work to build off. They could develop the idea we need.
     
  2. Alignment might be easy - methods that are easier to align than deep learning could emerge, or we could get a slow takeoff; deception might not be selected for; existing tools like adversarial training might be enough; AGI might try to figure out our values (and succeed); etc.
     
  3. Timelines might be long - coordination and respect for x-risk might slow capabilities in favor of safety research, Moore's law might break down, or deep learning might not scale all the way to AGI.
     
  4. We might solve whole brain emulation soon, and upload alignment researchers to do 1000x faster research.

The author suggests it could be helpful to backchain from these brainstorms to come up with new project ideas.


 

QAPR 4: Inductive biases

by Quintin Pope

A roundup of 16 alignment papers focused on the inductive biases of stochastic gradient descent. Links, quotes, and the author’s opinion are provided for each.


 

Niceness is unnatural

by So8res

There’s an argument that it might be easy to make AIs ‘nice’, because prosocial behavior is advantageous in multi-agent settings.

The author argues this is unlikely, because:
1. The role niceness plays under selection pressure can be filled by other strategies, like 'merge with local potential allies immediately'

2. Human 'niceness' is detailed - eg. we only extend it sometimes, have limited patience, and have differing sensitivity to various types of cheating. An AI might have a different set of details that's no longer recognizable as 'niceness'.

3. Related skills like empathy might occur because our self-models are the same as our other-models (we're both human). This doesn't apply to AIs.

4. The AI might display nice behaviors while they’re useful, and then reflect and drop them when they’re not.


 

Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small

by Haoxing Du, Buck

Some of Redwood's research involves interpretability on specific behaviors that language models exhibit. They're considering scaling up that line of research, so they're asking commenters for more behaviors to investigate. They've put up a web app for people to use GPT-2 to identify behaviors.

An example is acronyms - GPT-2 Small consistently follows the heuristic “string together the first letter of each capitalized word, and then close the parentheses” when asked to generate acronyms.
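If you'd rather poke at this behavior locally than through the web app, here is a minimal sketch using the Hugging Face transformers library (my own illustration; the prompt format is a guess, not Redwood's setup):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer  # pip install transformers

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # "gpt2" is GPT-2 small
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Guessed prompt format: capitalized words followed by an open parenthesis.
prompt = "The International Monetary Fund ("
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=5, do_sample=False)

print(tokenizer.decode(output[0]))  # does it produce "IMF)" or similar?
```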


 

Consider your appetite for disagreements

by Adam Zerner

Four illustrated examples of people disagreeing on a point because of minor differences - for instance, arguing over whether a poker hand should have been folded when both people believe it was a marginal call either way, or arguing over whether a basketball player is the third or fifth best when both agree they're top ten.

The author advocates having a small 'appetite' for these sorts of marginal disagreements and spending little time on them, while communicating that you have a large appetite for important and substantial disagreements.


 

The Balto/Togo theory of scientific development

by Elizabeth

In 1925, a relay of about 150 dogs and 20 mushers ran diphtheria antitoxin serum across Alaska to end an outbreak in Nome. The dog on the final and easiest leg was Balto, who became famous for it. The dog who ran the longest and hardest leg was Togo, who got comparatively little media attention.

A similar dynamic happens in science. Alfred Wegener is credited with discovering continental drift, but he did no data collection and little synthesis of evidence (the idea already existed in some papers). Yet people remember him, and by advocating for an unproven idea he inspired others to research it further. The author wonders how important this popularizing function is in general.


 

Calibration of a thousand predictions

by KatjaGrace

The author has made predictions in a spreadsheet for 4 years - as of now, ~1K have resolved. They created a calibration curve, with 11 buckets, for the ~630 predictions not about their own behavior, and found an average miscalibration error of only 3%. (These were primarily everyday-life predictions, such as whether they'd be paid by a certain date or invited to a certain party.) The accuracy was surprising because their internal experience of making the predictions was 'pulling a number out of thin air'. Accuracy for predictions about their own behavior was much lower, particularly in the 35-55% range.
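The post doesn't spell out the exact formula, but here is a minimal sketch of one way such a bucketed miscalibration number could be computed (the bucketing scheme and toy data below are my own assumptions):

```python
from statistics import mean

def calibration_error(predictions, n_buckets=11):
    """Average absolute gap between stated probability and realized frequency.

    predictions: list of (stated_probability, outcome) pairs, with outcome 0 or 1.
    Buckets are equal-width bins over [0, 1]; empty buckets are skipped.
    """
    buckets = [[] for _ in range(n_buckets)]
    for p, outcome in predictions:
        idx = min(int(p * n_buckets), n_buckets - 1)  # clamp p == 1.0 into last bucket
        buckets[idx].append((p, outcome))

    gaps = []
    for bucket in buckets:
        if not bucket:
            continue
        mean_p = mean(p for p, _ in bucket)
        freq = mean(outcome for _, outcome in bucket)
        gaps.append(abs(mean_p - freq))
    return mean(gaps)

# Toy usage with made-up predictions:
preds = [(0.9, 1), (0.8, 1), (0.7, 0), (0.3, 0), (0.1, 0), (0.55, 1)]
print(round(calibration_error(preds), 3))
```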



A common failure for foxes

by Rob Bensinger

In the parable of the Hedgehog and the Fox, the fox knows many things while the hedgehog knows one thing well. The author argues that people who see themselves as foxes often focus too much on RCTs over informal arguments, even when the RCT isn’t that relevant. This is because they want to feel like they ‘know things’ for sure, progress quickly in learning, and look intellectually modest (ie. ‘I’m just deferring to the data’).


 

Other

That one apocalyptic nuclear famine paper is bunk / Actually, All Nuclear Famine Papers are Bunk

by Lao Mein

Some bloggers cite a study from Nature Food on why a full US <-> Russia nuclear exchange might collapse civilization. The paper assumes that a 10°C drop in temperatures from nuclear winter would reduce farm yields by 90%. However, it also assumes no adaptation by humans - that we'd keep the same crop selection and crop locations. Lao argues this is unrealistic and makes the conclusions irrelevant.

There have also been claims, by people such as Peter Zeihan, that the world only has ~2 months' worth of food in reserve. Similarly, the paper above assumes all food stores will be used up in the first year after the attack. By examining US grain reserves, Lao finds there is enough to last the US ~half a decade, even without considering other food sources.


 

Transformative VR Is Likely Coming Soon

by jimrandomh

The author estimates 2.5 years until VR is better than in person for most meetings, given that Oculus announced a new headset last week which tackles many of the issues with previous VR meetings (eg. not being able to see the real world, hidden facial expressions, and audio latency). They expect the shift to be sudden and impactful - particularly on organizational structures and remote work.


 

Towards a comprehensive study of potential psychological causes of the ordinary range of variation of affective gender identity in males

by tailcalled

Someone who identifies as male might vary from being distressed by the idea of being a woman, to being neutral / not minding, to being positive about it. The author studies this variation via surveys of cis men who don't identify as trans or gender-questioning, and tries to correlate it with other factors such as gender conservatism or extraversion.

 

Didn’t Summarize

Six (and a half) intuitions for KL divergence by TheMcDouglas

Prettified AI Safety Game Cards by abramdemski

Contra shard theory, in the context of the diamond maximizer problem by So8res


This Week on Twitter

AI

2022’s State of AI report is live. Key trends (aggregated in this tweet) include:

  • Research collectives open-sourcing language, text-to-image, and protein models from large labs at an incredible pace.
  • Language models being used to predict things in bio, such as high-risk Covid-19 variants (predicted as high-risk before WHO identified them).
  • LLMs being trained to use software tools like search engines and web apps.
  • Increasing focus on safety, with safety / alignment researchers at major AI labs up to ~300 (from ~100 last year).
  • [Heaps more details in report]
     

DeepMind released a paper about self-supervised training on video instead of image datasets (richer data). (tweet)

 

EA

The US Supreme Court heard arguments on whether or not to uphold Prop 12 - a California law banning the sale of pork from pigs kept in spaces too small for them to turn around. This could have implications for the types of further laws that can be passed. A decision is due in late June 2023. (tweet) (article)

 

New paper addressing how natural risks might be higher because of civilization - eg. pandemics are riskier because of travel, and space weather is riskier because it can affect technology such as power grids. (tweet) (paper)
 

National Security

New US national security strategy includes explicit commitments to strengthening the BWC (biological weapons convention) and the need for more focus on deliberate + accidental threat mitigation. (tweet)

Putin said that more missile strikes against Ukraine are 'not necessary' and that the aim isn't to destroy the country. (tweet)

Iran is sending drones and missiles to Russia; the drones are already being used by Russia against Ukraine. (tweet) (article)

 

Science

New Our World in Data section on which countries routinely administer vaccines. (tweet) (page)
 

Comments

Forgot to say, this is an excellent summary of my post. Basically I've severely underrated the value of this series, will definitely read more, and hope you keep it up! : )

That's great to hear, thank you :)