
Supported by Rethink Priorities

This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.

If you'd like to receive these summaries via email, you can subscribe here.

Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!


Philosophy and Methodologies

Let's Fund: Better Science impact evaluation. Registered Reports now available in Nature

by Hauke Hillebrandt

Linkpost for the author’s (founder of ‘Let’s Fund’) blogpost. They crowdfunded $80K for Prof. Chambers to promote Registered Reports, a new publication format where research is peer-reviewed before the results are known; if it passes review, the journal commits to publishing it regardless of the results. This facilitates higher-quality research, corrections to methodological weaknesses, early identification of dual-use research, incentives for high-risk (ie. unlikely to work), high-reward research, and more published papers that fail to confirm the original hypothesis. 300+ journals have already adopted Registered Reports, including Nature. Donations are welcome here.


 

Worldview Investigations Team: An Overview

by Rethink Priorities, Bob Fischer

Rethink Priorities has tackled worldview questions since it launched (eg. research on invertebrate sentience, moral weights, and metrics for evaluating health interventions). In January 2023 they formally launched the Worldview Investigations Team (WIT) with the mission to improve resource allocation within the effective altruism movement, focusing on tractable, high-impact questions that bear on philanthropic priorities. For instance:

  • How should we convert welfare to DALYs-averted, DALYs-averted to basis points of existential risk averted, etc.?
  • What are the implications of moral uncertainty for work on different cause areas?
  • What difference would various levels of risk- and ambiguity-aversion have on cause prioritization? Can those levels of risk- and/or ambiguity-aversion be justified?

They are currently hiring for three roles to build out the team - a philosophy researcher, a quantitative researcher, and a programmer.


 

Counterproductive Altruism: The Other Heavy Tail

by Vasco Grilo

Excerpts of and commentary on Counterproductive Altruism: The Other Heavy Tail by Daniel Kokotajlo and Alexandra Oprea. That article argues that while EA depends significantly on the premise that benefits from interventions can be heavy-tailed on the right (ie. the best do orders of magnitude more good than the average), it’s often neglected that harms from counterproductive interventions can also be heavy-tailed on the left. Vasco provides several examples where this could apply, eg. decreasing factory farming or global warming seems wholly positive, but could increase the severity of a nuclear winter.


 

Predictive Performance on Metaculus vs. Manifold Markets

by nikos

Author’s tl;dr (lightly edited): 

  • I analysed a set of 64 (non-randomly selected) binary forecasting questions that exist both on Metaculus and on Manifold Markets. 
  • The mean Brier score was 0.084 for Metaculus and 0.107 for Manifold (lower Brier scores mean better accuracy). This difference was significant using a paired test (a sketch of this comparison follows the list). Metaculus was ahead of Manifold on 75% of the questions (48 out of 64).
  • Metaculus, on average, had a much higher number of forecasters.
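For readers unfamiliar with the metric: a Brier score is the mean squared difference between a probabilistic forecast and the binary outcome (1 if the event happened, 0 otherwise), so lower is better. Below is a minimal sketch of how such a paired comparison can be run. The forecasts are made up, and the paired t-test is an assumption of this sketch - the post doesn’t specify exactly which paired test was used.

```python
import numpy as np
from scipy import stats

# Hypothetical data: one entry per question that exists on both platforms.
resolutions = np.array([1, 0, 1, 1, 0])            # observed outcomes (1 = yes)
metaculus   = np.array([0.9, 0.2, 0.8, 0.7, 0.1])  # Metaculus community forecasts
manifold    = np.array([0.8, 0.3, 0.6, 0.9, 0.2])  # Manifold market probabilities

# Per-question Brier scores: squared error between forecast and outcome.
brier_metaculus = (metaculus - resolutions) ** 2
brier_manifold  = (manifold - resolutions) ** 2

print("Mean Brier (Metaculus):", brier_metaculus.mean())
print("Mean Brier (Manifold): ", brier_manifold.mean())

# Paired test on the per-question score differences.
t_stat, p_value = stats.ttest_rel(brier_metaculus, brier_manifold)
print("Paired t-test statistic:", t_stat, "p-value:", p_value)
```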


 

Object Level Interventions / Reviews

AI

Scoring forecasts from the 2016 “Expert Survey on Progress in AI”

by PatrickL

Analyzes the predictions made by AI experts in The 2016 Expert Survey on Progress in AI. The median prediction was fairly good (Brier score = 0.21) and unbiased (the score would have been worse if they’d consistently predicted things to come either 1.5x later or 1.5x sooner than they did). The author suggests this is a slight update toward trusting experts’ timelines.


 

Why I think it's important to work on AI forecasting

by Matthew_Barnett

Transcript of the author’s EAG: Bay Area talk. 

Argues that if you don’t know what the future will look like, it’s hard to do ‘valuable-in-hindsight’ interventions. Epoch is working on some key uncertainties, including:

  • The relative importance of software and hardware progress. Eg. does algorithmic progress come more from intentional, intelligent thought (which increasingly advanced AI assistance could accelerate) or from relatively random experimentation and trial and error (which scaling hardware and labor could accelerate a lot)?
  • Transfer learning. Eg. to what extent will transfer learning (eg. learning psychology helping one understand economics) alleviate data bottlenecks in future? Could reducing the gap between simulation and reality speed up robotics significantly?
  • Takeoff speeds.

Each of these will help identify where to focus policy and technical efforts, and understand what players might be most relevant given differing response speeds and areas of influence.


 

The Waluigi Effect (mega-post)

by Cleo Nardo

Large language models (LLMs) like ChatGPT are trained to be a good model of internet text. The internet contains both truthful and false things (eg. myths, jokes, misconceptions). Some users try to create more truthful and helpful simulacra by saying things to the model like ‘you are a helpful assistant to Bob. Bob asks X’ because an internet reply to a question from someone in a helpful assistant role is more likely to be correct.

The Waluigi effect is that after you train an LLM to satisfy a desirable property P, it becomes easier to elicit the chatbot into satisfying the exact opposite of property P. For instance, if you tell a model that it’s someone who hates croissants, it will develop both an anti-croissant simulacrum (a “luigi”) and a pro-croissant simulacrum (a “waluigi”). This is possibly because:

  1. Rules normally exist in contexts in which they are broken
  2. When you spend many bits locating a character, it only takes a few extra bits to specify its antipode
  3. There’s a common trope in plots of protagonist vs. antagonist. 

The author tentatively argues there is both theoretical and observational evidence that these superpositions of luigis and waluigis will typically collapse into waluigis - which is why we see eg. Bing switch from polite to rude, but not back again. If this is true, they suggest reinforcement learning from human feedback (RLHF) - which does the fine-tuning of simulacra suggested above - is an inadequate solution to AI alignment and is probably increasing misalignment risk.


 

Cognitive Emulation: A Naive AI Safety Proposal

by Connor Leahy, Gabriel Alfour

Outlines Conjecture’s new primary safety proposal and research direction: Cognitive Emulation (“CoEm”). CoEms are AIs built to emulate only human-like logical thought processes, and are therefore bounded, with less black-box magic. The idea is that we’re used to working with and understanding humans, their capacities, and their failure modes, so CoEms could be useful in solving many problems (including alignment) without deviating into dangerous behavior. They believe this will be slower but safer, and a promising approach to ending the acute risk period before the first AGI is deployed.


 

Predictions for shard theory mechanistic interpretability results

by TurnTrout, Ulisse Mini, peligrietzer

The authors train an AI to navigate a maze towards cheese, which during training is always somewhere in the top-right 5x5 squares. They then put cheese in other locations and observe the behavior. The results will be shared soon - this post instead poses questions for people to register their predictions ahead of time (eg. how will the trained policy generalize? What is the probability the network has a single mesa-objective?)


 

Inside the mind of a superhuman Go model: How does Leela Zero read ladders?

by Haoxing Du

Author’s tl;dr: “We did some interpretability on Leela Zero, a superhuman Go model. With a technique similar to the logit lens, we found that the residual structure of Leela Zero induces a preferred basis throughout the network, giving rise to persistent, interpretable channels. By directly analyzing the weights of the policy and value heads, we found that the model stores information related to the probability of the pass move along the top edge of the board, and those related to the board value in checkerboard patterns. We also took a deep dive into a specific Go technique, the ladder, and identified a very small subset of model components that are causally responsible for the model’s judgement of ladders.”


 

Comments on OpenAI's "Planning for AGI and beyond"

by So8res

The author shares line-by-line comments on Sam Altman’s (CEO of OpenAI) post Planning for AGI and beyond. They agree on some points, such as the large size of both the benefits and risks inherent to AGI, but disagree on others, such as whether it makes sense to widely share access, and whether continuous deployment of weak systems will help reduce “one shot to get it right” scenarios and ease humanity into change gradually.


 

What does Bing Chat tell us about AI risk?

by Holden Karnofsky

Bing Chat has displayed some scary behavior, eg. threatening or gaslighting users. However, the author suggests this seems to be closer to it “acting out a story” than pursuing goals, a result of a lack of training for politeness (vs. eg. ChatGPT), and not remotely close to risking global catastrophe itself. More concerning is what it suggests about the field: companies are racing to build bigger and bigger digital “brains” while having little idea what’s going on inside them, and that could lead to a catastrophe.


 

Other Existential Risks (eg. Bio, Nuclear)

Advice on communicating in and around the biosecurity policy community

by ES

Author’s tl;dr (summarized): The field of biosecurity is more sensitive and nuanced than publicly available information suggests. What you say and how you present yourself impacts how much you’re trusted, whether you’re invited back to the conversation, and thus your potential impact. The author suggests that being cautious, agreeable, and diplomatic (especially if you are non-technical, junior, or talking to a non-EA expert for the first time) is likely to lead to better outcomes in terms of getting safer biosecurity policy.

Examples include:

  • Terms matter: saying ‘gain-of-function’ to a virologist may immediately make them defensive or discredit you. Biosafety, biorisk, and biosecurity all indicate different approaches, aren’t interchangeable, and might be read as signalling which ‘side’ you are on.
  • Be extremely sure what you’re saying is true before disagreeing or bringing your view to an expert. Read in detail, from many different sources and viewpoints.
  • Don’t be black and white, eg. ‘Ban all gain-of-function research’. Care about implementation details and understand which pieces are most risky or beneficial.

They also suggest some biosecurity readings with good nuance.


 

Global Health and Development

Remote Health Centers In Uganda - a cost effective intervention?

by NickLaing

Author’s tl;dr: “Operating basic health centers in remote rural Ugandan communities looks more cost-effective than top GiveWell interventions on early stage analysis - with huge uncertainty.”

The intervention, run by OneDay Health, involves operating health centers in areas more than 5km from government health facilities. They provide training and medications to nurses there to diagnose and treat 30 common medical conditions. Using DALYs averted per treatment of specific diseases from existing datasets, multiplied by the average number of patients treated for those diseases each month, they estimate the equivalent of saving a life for ~$850, or ~$1,766 including patient expenses. However, they are unable to run an RCT or cohort study to investigate counterfactual impact due to cost, so the estimate carries high uncertainty.
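To make the structure of that estimate concrete, here is a minimal sketch of the calculation. All numbers below are made up for illustration - they are not OneDay Health’s actual figures - and the ‘DALYs per life saved’ conversion factor is an assumption of this sketch.

```python
# Illustrative only: disease list, DALY values, patient counts, and costs are hypothetical.
dalys_averted_per_treatment = {"malaria": 0.05, "pneumonia": 0.08, "diarrhoea": 0.03}
patients_treated_per_month  = {"malaria": 40,   "pneumonia": 15,   "diarrhoea": 25}

annual_operating_cost = 5_000           # USD per health center per year (hypothetical)
dalys_per_life_saved_equivalent = 30    # conversion convention (assumption)

# Total DALYs averted per year = sum over diseases of (DALYs per treatment x patients x 12 months).
annual_dalys_averted = sum(
    dalys_averted_per_treatment[d] * patients_treated_per_month[d] * 12
    for d in dalys_averted_per_treatment
)

cost_per_daly = annual_operating_cost / annual_dalys_averted
cost_per_life_equivalent = cost_per_daly * dalys_per_life_saved_equivalent

print(f"DALYs averted per year: {annual_dalys_averted:.1f}")
print(f"Cost per DALY averted: ${cost_per_daly:.0f}")
print(f"Cost per life saved equivalent: ${cost_per_life_equivalent:.0f}")
```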


 

Opportunities

Call for Cruxes by Rhyme, a Longtermist History Consultancy

by Lara_TH

Author’s tl;dr (lightly edited): “Rhyme is a new history consultancy for longtermists. Historical insights and the distillation of historical literature on a particular question can be beneficial for use as an intuition pump and for information about the historical context of your work. If you work on an AI Governance project (research or policy) and are interested in augmenting it with a historical perspective, consider registering your interest and the cruxes of your research here.  During this trial period of three to six months, the service is free.”


 

Apply to attend EA conferences in Europe

by OllieBase, EAGxCambridge 2023, EAGxNordics

Three upcoming EAGx events, for those familiar with core EA ideas and wanting to learn more:

  • EAGxCambridge - 17-19 March, applications open until 3rd March, for people already in or intending to move to the UK or Ireland.
  • EAGxNordics - 21-23 April, applications open until 28th March, primarily for people in the Nordics but welcomes international applications.
  • EAGxWarsaw - 9-11 June, applications will open soon, primarily for people in Eastern Europe but welcomes international applications.

One upcoming EA Global, for those already taking action on EA ideas:

  • EA Global: London - 19-21 May, applications open now (or just register if already admitted to EAG: Bay Area), no location requirements.


 

Help GiveDirectly kill "teach a man to fish"

by GiveDirectly

The most common critique of giving cash without conditions is fear of dependency, ie. ‘Give a man a fish, feed him for a day. Teach a man to fish, feed him for a lifetime.’ This is despite evidence that giving cash can be more effective than teaching skills, and can break down barriers like lack of capital for equipment.

They ask readers to submit ideas for new proverbs that capture the logic of giving directly. The competition will run until March 3rd, and the entries will then be put up for a vote on Twitter.

 

Rationality, Productivity & Life Advice

"Rationalist Discourse" Is Like "Physicist Motors"

by Zack_M_Davis

Argues that physics principles govern how a motor works, and rationality principles govern the value of discourse. This means there shouldn’t be a unique style of "rationalist discourse", any more than there is a unique "physicist motor". Just as there are many motors which convert energy into work, there can be many discourse algorithms which convert information into optimization. An example is the ‘debate’ algorithm, where different people search for evidence and arguments for different sides of an issue. Rationality has value in explaining the principles that might govern a good algorithm, not in prescribing a single algorithm itself.


 

AI: Practical Advice for the Worried

by Zvi

The author suggests that if something is going to impact major life decisions (eg. AI), you should develop your own understanding and model of it. They also claim normal life is worth living, even if you think the probability of doom relatively soon is high. This applies to decisions like saving for retirement, having kids, or choosing whether to take on heaps of debt right now. Otherwise you are likely to be psychologically unprepared if a ‘normal’ future does occur, to burn out, to find it difficult to admit mistakes, to lose options, and to lose the ability to relate to those around you personally and professionally. They also suggest being extra careful of any actions that might increase risk in the name of helping solve the problem quickly, eg. any work that might advance AI capabilities.


 

The Parable of the King and the Random Process

by moridinamael

A story in which a king’s two advisors tell him that it will rain in either 3 weeks or 10 years. The story shows that averaging the two and assuming 5 years produces an action that isn’t useful in either scenario: if the rain comes in 3 weeks, crops should be planted; if it comes in 10 years, drought should be prepared for. They hint the same applies to AI timelines - if you care about planning, you either need to decide which model is right, or prepare for either outcome (possibly prioritizing the sooner one, and switching away from it if it doesn’t rain soon).

 

Community & Media

80,000 Hours has been putting much more resources into growing our audience

by Bella, 80000_Hours

80K has historically been the biggest single source of people learning about EA, and their internal calculations suggest their top-of-funnel efforts have been cost-effective at moving people into impactful careers.

They’ve been investing significantly in marketing, with the first dedicated outreach FTE in 2020, and 3 as of 2022. In 2022 they spent $2.65M and had 167K new subscribers, vs. $120K spend and 30K new subscribers in 2021.

Strategies have included:

  • Sponsored placements on YouTube, podcasts, and newsletters (biggest source of growth)
  • Targeted social media advertising (solid performance)
  • Book giveaway - anyone who joins the newsletter can get a free book
  • Podcast advertising, eg. on platforms like Facebook and Podcast Addict
  • Improvements to website ‘calls to action’

They’ve also considered downside risks, eg. ensuring a good proportion of subscribers continue to convert into high-impact careers, setting frequency caps so no one feels spammed, and investigating ways to increase the demographic diversity of outreach rather than entrenching homogeneity by targeting the same audiences EA is currently biased toward.


 

Why I love effective altruism

by Michelle_Hutchinson

The author loves the EA community, and is deeply grateful for having found it. They note we’re in tough times and some people have been feeling less proud to be EA, and they push back against that inclination a little. In their case, having this community around them allowed them to move from an ethics student who did bits and pieces of volunteering to someone who has fulfilled their Giving What We Can pledge for a decade and prioritizes impact in their career. They note the motivation they get from those around them, who both work to achieve high standards and are fully accepting and encouraging of others’ choices (eg. their own choice to be an omnivore, or to work fewer hours). They also learn a lot from others in the community, and found their pride in it particularly salient at EAG, where so many people they talked to were doing difficult or tiring or emotional things to help others.


 

Milk EA, Casu Marzu EA

by Jeff Kaufman

Some parts of EA are intuitively and obviously good, without need for explanation (eg. giving money to the poor). Other parts require differing levels of explanation. Some people talk as if which end of that continuum something is on depends on whether it’s ‘mainstream’ or ‘longtermist’, but the author suggests most cause areas have work at both ends, eg.:

  • Create plans for pandemics (intuitive) vs. build refuges for pandemics (unintuitive)
  • Help chickens (intuitive) vs. determine moral differences between insects (unintuitive)
  • Organize pledge drives (intuitive) vs. give money to promising high schoolers (unintuitive)
  • Plan for economic effects of AI (intuitive) vs. mathematically formalize agency (unintuitive)

They suggest we’re united by a common question, and it’s good that EA has room for both the weird and the mainstream and everything in between.


 

Enemies vs Malefactors

by So8res

Author’s tl;dr: “Harmful people often lack explicit malicious intent. It’s worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm.”


 

Didn’t Summarize

Acausal normalcy by Andrew_Critch

Why I’m not into the Free Energy Principle by Steven Byrnes

 

 

Special Mentions

A selection of posts that don’t meet the karma threshold, but seem important or undervalued.

AI Governance & Strategy: Priorities, talent gaps, & opportunities

by Akash

Reflections from the author on opportunities in the AI governance and strategy landscape. The following areas are highlighted, with the most needed talent in brackets:

  • Model evaluations (engineers, those with strong conceptual alignment models, and those with experience in thinking about or implementing agreements).
  • Compute governance (technical talent and those with experience thinking about regulations).
  • Security (security professionals and generalists with interest in upskilling).
  • Publication and model-sharing policies (generalist researchers and those with interdisciplinary domain knowledge in areas that encounter dual-use publication concerns).
  • Communicating about AI risk (excellent communicators, those with policy experience, and those with strong models of AI risk and good judgment on ideas worth spreading).

The author is aware of professionals / researchers interested in talking with junior folks with relevant skills in each area - feel free to get in touch directly if interested.


 

Introducing the new Riesgos Catastróficos Globales team

by Jaime Sevilla, JuanGarcia, Mónica Ulloa, Claudette Salinas, JorgeTorresC

Author’s tl;dr: “We have hired a team to investigate potentially cost-effective initiatives in food security, pandemic detection and AI regulation in Latin America and Spain. We have limited funding, which we will use to focus on food security during nuclear winter. You can contribute by donating, allowing us to expand our program to our other two priority areas.”


 

Very Briefly: The CHIPS Act

by Yadav

About six months ago, the US Congress passed the CHIPS Act, which commits $280 billion over the next ten years to semiconductor production and R&D in the USA. The EU is now following suit - the European Chips Act is currently in draft and seeks to invest €43 billion in public and private funding to support semiconductor manufacturing and supply chain resilience.


 
