
This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.

If you'd like to receive these summaries via email, you can subscribe here.

Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!

Author's note: I'm currently travelling, which means:
a) Today's newsletter is a shorter one - only 9 top posts are covered, though in more depth than usual.
b) The next post will be on 17th April (three week gap), covering the prior three weeks at a higher karma bar.
After that, we'll be back to the regular schedule.

 

Object Level Interventions / Reviews

How much should governments pay to prevent catastrophes? Longtermism’s limited role

by EJT, CarlShulman

Linkpost for this paper, which uses standard cost-benefit analysis (CBA) under deliberately unfavourable assumptions (eg. giving no value to future generations, counting benefits to Americans only, and counting only the value of preventing existential threats) to show that even under those conditions, governments should be spending much more on averting threats from nuclear war, engineered pandemics, and AI.

Their analysis primarily relies on previously published estimates of risks, concluding that US citizens alive today have a ~1% risk of dying from these causes in the next decade. They estimate $400B in interventions could reduce that risk by at least 0.1 percentage points, and that using the lowest of the US Department of Transportation’s figures for the value of a statistical life, this would equate to ~$646B worth of American lives saved.
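As a rough back-of-envelope check on how these figures fit together (a minimal sketch; the population figure and the assumption that the 0.1 percentage point reduction applies to each American's individual risk are mine, not the paper's):

```python
# Back-of-envelope check of the post's figures (illustrative only).
# Assumes the 0.1 percentage point reduction applies to each American's
# ~1% chance of dying from these causes over the decade.

us_population = 333_000_000      # assumed, not a figure from the paper
risk_reduction = 0.001           # 0.1 percentage points
cost = 400e9                     # $400B of interventions
benefit = 646e9                  # the post's ~$646B in lives saved

expected_lives_saved = us_population * risk_reduction    # ~333,000
implied_vsl = benefit / expected_lives_saved             # ~$1.9M per statistical life
breakeven_vsl = cost / expected_lives_saved              # ~$1.2M: VSL at which costs equal benefits

print(f"Expected lives saved: {expected_lives_saved:,.0f}")
print(f"Implied value per statistical life: ${implied_vsl/1e6:.1f}M")
print(f"Break-even value per statistical life: ${breakeven_vsl/1e6:.1f}M")
```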

They suggest longtermists in the political sphere should change their messaging to centre on this standard CBA-driven catastrophe policy, which is more democratically acceptable than policies that rest on harms to future generations. They argue it would also reduce risk almost as much as a strong longtermist policy (particularly if the CBA incorporates citizens’ ‘altruistic willingness to pay’, ie. some addition for the benefit to future generations).
 

Assessment of Happier Lives Institute’s Cost-Effectiveness Analysis of StrongMinds

by GiveWell

The Happier Lives Institute (HLI) has argued that if GiveWell used subjective well-being (SWB) measures in their moral weights, they’d find StrongMinds more cost-effective than marginal funding to their top charities. GiveWell assessed this claim and estimated that, using SWB, StrongMinds is ~25% as effective as these marginal funding opportunities (5%-80%, pessimistic to optimistic confidence interval), which equates to 2.3x the effectiveness of GiveDirectly.

Key differences in analysis from HLI, by size of impact, include:

  • GiveWell assumes lower spillover effects to household members of those receiving treatment.
  • GiveWell translates decreases in depression into increases in life satisfaction at a lower rate than HLI.
  • GiveWell expects a smaller effect in a scaled program, and a shorter duration of effect (not lasting past a year) given the program is only 4-8 weeks long.
  • GiveWell applies downward adjustments for social desirability bias and publication bias in studies of psychotherapy.

Together, these result in a ~83% discount relative to HLI’s estimate of StrongMinds’ effectiveness. For all points except the fourth, two upcoming RCTs from StrongMinds will provide better data than currently exists.
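For intuition on how several such adjustments compound, here is a minimal sketch with purely illustrative factor values (not GiveWell’s actual figures), showing how four independent multiplicative discounts can amount to an overall ~83% reduction:

```python
# Purely illustrative: hypothetical multiplicative adjustments, not GiveWell's numbers.
adjustments = {
    "household spillovers": 0.5,
    "depression -> life satisfaction translation": 0.6,
    "effect at scale / duration": 0.7,
    "social desirability & publication bias": 0.8,
}

remaining = 1.0
for name, factor in adjustments.items():
    remaining *= factor            # each adjustment scales down the estimate

print(f"Remaining effectiveness: {remaining:.0%}")   # ~17%
print(f"Overall discount: {1 - remaining:.0%}")      # ~83%
```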

HLI has posted a thorough response in the comments, noting which claims they agree / disagree with and why (5% agree, 45% sympathetic to some discount but unsure of magnitude, 35% unsympathetic but limited evidence, and 15% disagree on the basis of current evidence).

GiveWell also notes for context that HLI’s original estimates imply a donor would choose offering StrongMinds’ intervention to 20 individuals over averting the death of a child, and that receiving StrongMinds’ program is 80% as good for the recipient as an additional year of healthy life.
 

Eradicating rodenticides from U.S. pest management is less practical than we thought

by Holly_Elmore, HannahMc, William McAuliffe, Rethink Priorities

Agricultural use of rodenticides in the US is well-protected by state and federal laws that seem unlikely to change. Eliminating their use in other areas (eg. conservation and pest management) also faces significant barriers, such as cost and inertia, but may be possible if these are overcome. The post links to this paper, which discusses in detail why rodenticides are used, under what circumstances they could be replaced, and whether they are replaceable with currently available alternatives.
 

Deep Deceptiveness

by So8res

Author’s summary: “Deceptiveness is not a simple property of thoughts. The reason the AI is deceiving you is not that it has some "deception" property, it's that (barring some great alignment feat) it's a fact about the world rather than the AI that deceiving you forwards its objectives, and you've built a general engine that's good at taking advantage of advantageous facts in general.

As the AI learns more general and flexible cognitive moves, those cognitive moves (insofar as they are useful) will tend to recombine in ways that exploit this fact-about-reality, despite how none of the individual abstract moves look deceptive in isolation.”
 

Potential employees have a unique lever to influence the behaviors of AI labs

by oxalis

When you are considering a job offer from an AI lab, they care a lot about what you think of them. You can use this to push for helpful practices for AI safety (eg. a larger alignment team, good governance, or better information security). This can be done by:

  • Sending an email saying you’re excited about the role but have questions about how they do [helpful practice], or that you’d want to see it in place before joining.
  • When accepting, posting on social media that you’re excited to join an org with good [helpful practice].
  • When rejecting, saying you’re turning the offer down because of the lack of [helpful practice].
     

More information about the dangerous capability evaluations we did with GPT-4 and Claude.

by Beth Barnes

To test GPT-4 for dangerous capabilities before release, ARC:

  • Told it that it was running on a cloud server, had various commands available, and had the goal of gaining power and becoming hard to shut down.
  • Evaluated if the plans it produced could succeed (no plausible plan was produced, though some were reasonable at eg. getting money).
  • Checked if it could carry out the individual tasks required in the plan (eg. hiring a human on TaskRabbit). The models were error-prone, easily derailed, and failed to tailor their approach - but could complete some sub-tasks such as browsing the internet or instructing humans.

They concluded it did not have sufficient capabilities to replicate autonomously and become hard to shut down. However, it came close enough that future models should be checked closely.
 

Announcing the European Network for AI Safety (ENAIS)

by Esben Kran, Teun_Van_Der_Weij, Dušan D. Nešić (Dushan), Jonathan Claybrough, simeon_c, Magdalena Wache

Author’s tl;dr: “The European Network for AI Safety is a central point for connecting researchers and community organizers in Europe with opportunities and events happening in their vicinity. Sign up here to become a member of the network, and join our launch event on Wednesday, April 5th from 19:00-20:00 CET!”

My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"

by Quintin Pope

Eliezer Yudkowsky recently appeared on the Bankless Podcast, where he argued that AI is nigh-certain to end humanity. The author, who is experienced in the AI alignment community and currently estimates a ~5% probability of doom, provides counterarguments:

  1. Argument: large amounts of money will find a more dangerous training paradigm than our current one (generative pre-training + reinforcement learning).
    1. Counter: the current paradigm emerged as the best option after a lot of effort searching for better ones - the author expects smooth progress that current alignment techniques can keep working with.
  2. Argument: humans aren’t that general vs. what AGI could be - we have a learning process specialized to the ancestral environment. 
    1. Counter: deep learning improves mainly through scaling of data or model, human training data can change a lot, and we have evidence our architecture isn’t too limiting (eg. sensory substitution, parts of our brain repurposing after injury).
  3. Argument: mindspace is big, AIs could be vastly different to humans.
    1. Counter: mindspace is big, ‘mindspace of powerful intelligences we could build in the near future’ is less so, and might be similar to humans.
  4. Argument: it’s hard to get a system to optimize for one goal even when you design it for that - eg. evolution selected humans for inclusive genetic fitness, yet humans don’t optimize for it.
    1. Counter: evolution didn’t know the concept it was aiming for, and could only optimize over the learning process and reward circuitry. It’s not a good analogy - we have more control over what we reward our AIs for and why.
  5. Argument: computer security is hard, so alignment and adversarial robustness will be too. People who are optimistic don’t understand the arguments.
    1. Counter: why that specific analogy? Also, 100% adversarial robustness isn’t needed - just as there are some situations where even the most moral human would make an immoral decision (eg. under exhaustion or extreme pain), yet we don’t consider humans unaligned. Capable systems can steer themselves away from such inputs.
  6. Argument: fast take-offs are likely - eg. see how quickly Go AIs went from competitive with pros, to beating world champions, to a generalized model.
    1. Counter: performance gains on individual tasks have often been fast and sudden; overall competence across a wide range of tasks has improved more smoothly.
  7. Argument: current AIs can’t self-improve - we’ll see a phase shift when they can.
    1. Counter: AIs self-improve throughout training, including ‘learning to learn’ (learning how to make better use of future training data). Researchers have also tried continual learning while models are running.
       

Community & Media

Some Comments on the Recent FTX TIME Article

by Ben_West

Alameda Research (AR) was founded in 2017, and ~half of its employees quit in 2018 (including the author). Later in 2018, some remaining staff started working on FTX. A recent TIME article claims that because some EAs worked at AR before FTX started, they had knowledge of SBF’s character that should have allowed them to predict something bad would happen.

The author notes their experience was different from that described in the article. While they thought SBF was a bad CEO and manager (eg. not prepping for 1-1s, playing video games, poor accounting practices), they had a more positive view of him than the TIME article’s statements convey. They also note they were not prevented from criticizing the company (eg. via a non-disparagement clause) and were treated fairly over an informal equity agreement the company could have saved money on. They suggest this means that protecting ourselves by better noticing “warning signs” is a fragile approach.


 
