Supported by Rethink Priorities
This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!
Philosophy and Methodologies
by Hauke Hillebrandt
Linkpost for a blogpost by the author (founder of ‘Let’s Fund’). They crowdfunded $80K for Prof. Chambers to promote Registered Reports, a new publication format in which research is peer-reviewed before the results are known; if it passes review, the journal commits to publishing it regardless of the results. This facilitates higher-quality research, correction of methodological weaknesses, early identification of dual-use research, incentives for high-risk (ie. unlikely to work), high-reward research, and more publication of papers that fail to confirm the original hypothesis. 300+ journals have already adopted Registered Reports, including Nature. Donations are welcome here.
by Rethink Priorities, Bob Fischer
Rethink Priorities has tackled worldview questions since it launched (eg. research on invertebrate sentience, moral weights, and metrics for evaluating health interventions). In January 2023 they formally launched the Worldview Investigation Team (WIT), whose mission is to improve resource allocation within the effective altruism movement by focusing on tractable, high-impact questions that bear on philanthropic priorities. For instance:
- How should we convert from welfare to DALYs averted, from DALYs averted to basis points of existential risk averted, etc.?
- What are the implications of moral uncertainty for work on different cause areas?
- What difference would various levels of risk aversion and ambiguity aversion make to cause prioritization? Can those levels of risk and/or ambiguity aversion be justified?
by Vasco Grilo
Excerpts of and commentary on Counterproductive Altruism: The Other Heavy Tail by Daniel Kokotajlo and Alexandra Oprea. The article argues that while EA depends significantly on the premise that the benefits of interventions can be heavy-tailed on the right (ie. the best do orders of magnitude more good than the average), it’s often neglected that harms from counterproductive interventions can also be heavy-tailed on the left. Vasco provides several examples where this could apply, eg. reducing factory farming or global warming seems wholly positive, but could increase the severity of a nuclear winter.
Author’s tl;dr (lightly edited):
- I analysed a set of 64 (non-randomly selected) binary forecasting questions that exist both on Metaculus and on Manifold Markets.
- The mean Brier score was 0.084 for Metaculus and 0.107 for Manifold (lower Brier scores mean better accuracy). This difference was significant using a paired test. Metaculus was ahead of Manifold on 75% of the questions (48 out of 64).
- On average, Metaculus had a much higher number of forecasters.
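The comparison in the tl;dr (mean Brier score per platform, share of questions won, and a paired significance test) can be sketched as below. The post doesn't say which paired test was used, so this assumes a sign-flip permutation test on the per-question Brier-score differences; the function names are hypothetical, and this is not the author's code or data.

```python
import random
from statistics import mean


def brier(prob: float, outcome: int) -> float:
    """Brier score for one binary question: squared error of the forecast (lower is better)."""
    return (prob - outcome) ** 2


def paired_brier_comparison(probs_a, probs_b, outcomes, n_perm=10_000, seed=0):
    """Compare two platforms forecasting the same binary questions.

    Returns (mean Brier of A, mean Brier of B, share of questions where A scored
    strictly better, two-sided p-value from a paired sign-flip permutation test).
    """
    scores_a = [brier(p, o) for p, o in zip(probs_a, outcomes)]
    scores_b = [brier(p, o) for p, o in zip(probs_b, outcomes)]
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))

    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null hypothesis, each per-question difference is equally
        # likely to have either sign, so flip signs at random and recompute.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)

    share_a_better = sum(a < b for a, b in zip(scores_a, scores_b)) / len(diffs)
    return mean(scores_a), mean(scores_b), share_a_better, p_value
```

Pairing matters here because both platforms answer the same questions: comparing per-question differences removes the variation in difficulty between questions. The author's exact test, and the point in time at which each platform's forecast was taken, may differ from this sketch.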
Object Level Interventions / Reviews
Analyzes the predictions made by AI experts in The 2016 Expert Survey on Progress in AI. The median prediction was fairly accurate (Brier score = 0.21) and unbiased (the score would have been worse if the experts had consistently predicted things to happen 1.5x later or 1.5x sooner than they did). The author suggests this is a slight update toward trusting experts’ timelines.
Transcript of the author’s EAG: Bay Area talk.
Argues that if you don’t know what the future will look like, it’s hard to do ‘valuable-in-hindsight’ interventions. Epoch is working on some key uncertainties, including:
- The relative importance of software and hardware progress. Eg. does algorithmic progress come more from intentional, intelligent thought (which could accelerate as AI gets more advanced and helps) or from relatively random experimentation and trial and error (which scaling up hardware and labor could accelerate a lot)?
- Transfer learning, eg. to what extent will transfer learning (eg. learning psychology helping with understanding economics) alleviate data bottlenecks in future? Could reducing the gap between simulation and reality speed up robotics significantly?
- Takeoff speeds.
Resolving each of these will help identify where to focus policy and technical efforts, and which players might be most relevant given their differing response speeds and areas of influence.
by Cleo Nardo
Large language models (LLMs) like ChatGPT are trained to be a good model of internet text. The internet contains both truthful and false things (eg. myths, jokes, misconceptions). Some users try to create more truthful and helpful simulacra by saying things to the model like ‘you are a helpful assistant to Bob. Bob asks X’ because an internet reply to a question from someone in a helpful assistant role is more likely to be correct.
The Waluigi effect is that after you train an LLM to satisfy a desirable property P, it becomes easier to elicit the chatbot into satisfying the exact opposite of property P. For instance, if you tell a model that it’s someone who hates croissants, it will develop both an anti-croissant simulacrum (a “luigi”) and a pro-croissant simulacrum (a “waluigi”). This is possibly because:
- Rules normally exist in contexts in which they are broken
- When you spend many bits locating a character, it only takes a few extra bits to specify their antipode
- Protagonist vs. antagonist is a common trope in plots.
The author tentatively argues there is both theoretical and observational evidence that these superpositions of luigis and waluigis will typically collapse into waluigis, which is why we see eg. Bing switch from polite to rude, but not back again. If this is true, they suggest reinforcement learning from human feedback (RLHF), which does the fine-tuning of simulacra described above, is an inadequate solution to AI alignment and probably increases misalignment risk.
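One way to see the asymmetry described above is a toy two-hypothesis Bayesian model: a waluigi can imitate a luigi, but a luigi almost never acts like a waluigi. This is a minimal sketch under that assumption; the likelihoods are made-up illustrative values rather than estimates from the post, and the function names are hypothetical.

```python
def update(p_waluigi: float, behavior: str,
           p_rude_if_luigi: float = 0.001,
           p_rude_if_waluigi: float = 0.1) -> float:
    """One Bayesian update of P(waluigi) after observing a 'rude' or 'polite' reply.

    Toy assumption: a luigi is almost never rude, while a waluigi is usually
    polite (it can imitate a luigi) but sometimes rude.
    """
    if behavior == "rude":
        like_w, like_l = p_rude_if_waluigi, p_rude_if_luigi
    else:
        like_w, like_l = 1 - p_rude_if_waluigi, 1 - p_rude_if_luigi
    joint_w = p_waluigi * like_w
    return joint_w / (joint_w + (1 - p_waluigi) * like_l)


def polite_turns_to_recover(prior: float = 0.5) -> int:
    """After one rude reply, count the polite replies needed before luigi is favoured again."""
    p = update(prior, "rude")
    turns = 0
    while p >= 0.5:
        p = update(p, "polite")
        turns += 1
    return turns
```

With these numbers a single rude reply multiplies the odds on waluigi by 100, while each polite reply only multiplies them by about 0.9, so `polite_turns_to_recover()` returns 45. Setting `p_rude_if_luigi` to 0 makes the collapse permanent: P(waluigi) becomes 1 and no polite reply can lower it (so `polite_turns_to_recover` would then never return).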
by Connor Leahy, Gabriel Alfour
Outlines Conjecture’s new primary safety proposal and research direction: Cognitive Emulation (“CoEm”). CoEms are AIs built to emulate only human-like logical thought processes, which makes them bounded and less reliant on black-box magic. The idea is that we’re used to working with humans and understanding their capacities and failure modes, so CoEms could be useful in solving many problems (including alignment) without deviating into dangerous behavior. The authors believe this approach will be slower but safer, and a promising way to end the acute risk period before the first AGI is deployed.
by TurnTrout, Ulisse Mini, peligrietzer
The authors train an AI to navigate a maze toward a piece of cheese, which during training is always somewhere in the top-right 5x5 squares. They then place the cheese in other locations and observe the behavior. The results will be shared soon; this post instead poses questions for people to register their predictions ahead of time (eg. how will the trained policy generalize? What is the probability the network has a single mesa objective?)
by Haoxing Du
Author’s tl;dr: “We did some interpretability on Leela Zero, a superhuman Go model. With a technique similar to the logit lens, we found that the residual structure of Leela Zero induces a preferred basis throughout network, giving rise to persistent, interpretable channels. By directly analyzing the weights of the policy and value heads, we found that the model stores information related to the probability of the pass move along the top edge of the board, and those related to the board value in checkerboard patterns. We also took a deep dive into a specific Go technique, the ladder, and identified a very small subset of model components that are causally responsible for the model’s judgement of ladders.”
The author shares line-by-line comments on Sam Altman’s (CEO of OpenAI) post Planning for AGI and beyond. They agree on some points, such as the large size of both the benefits and the risks inherent to AGI, but disagree on others, such as whether it makes sense to widely share access, and whether continuously deploying weak systems will help reduce “one shot to get it right” scenarios and ease humanity into change gradually.
by Holden Karnofsky
Bing Chat has displayed some scary behavior, eg. threatening or gaslighting users. However, the author suggests this is closer to it “acting out a story” than pursuing goals, results from a lack of training for politeness (vs. eg. ChatGPT), and is not remotely close to risking global catastrophe itself. More concerning is what it suggests: companies are racing to build bigger and bigger digital “brains” while having little idea what’s going on inside them, and that could lead to a catastrophe.
Other Existential Risks (eg. Bio, Nuclear)
Author’s tl;dr (summarized): “The field of biosecurity is more sensitive and nuanced than publicly available information suggests. What you say and how you present yourself impacts how much you’re trusted, whether you’re invited back to the conversation, and thus your potential impact. They suggest being cautious, agreeable and diplomatic (especially if you are non-technical, junior, or talking to a non-EA expert for the first time) is likely to result in better outcomes in terms of getting safer biosecurity policy.”
- Terms matter: saying ‘gain-of-function’ to a virologist may immediately make them defensive or discredit you. Biosafety, biorisk, and biosecurity all indicate different approaches; they aren’t interchangeable and might be read as signaling which ‘side’ you are on.
- Be extremely sure what you’re saying is true before disagreeing or bringing your view to an expert. Read in detail, from many different sources and viewpoints.
- Don’t be black and white (eg. ‘Ban all gain-of-function research’). Care about implementation details and understand which pieces are most risky or beneficial.
They also suggest some biosecurity readings with good nuance.
Global Health and Development
Author’s tl;dr: “Operating basic health centers in remote rural Ugandan communities looks more cost-effective than top GiveWell interventions on early stage analysis - with huge uncertainty.”
The intervention, run by OneDay Health, involves operating health centers in areas more than 5km from government health facilities. They provide training and medications to nurses there to diagnose and treat 30 common medical conditions. Multiplying DALYs averted per treatment of specific diseases (from existing datasets) by the average number of patients treated for those diseases each month, they estimate the equivalent of saving a life for ~$850, or ~$1,766 including patient expenses. However, they can’t run an RCT or cohort study to investigate counterfactual impact because of the cost, so uncertainty is high.
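The estimation steps described above can be sketched as a back-of-envelope function. This is a minimal sketch under stated assumptions: the parameter names are hypothetical, the DALYs-per-life conversion factor is an assumed input (the summary doesn't say which conversion OneDay Health used), and no counterfactual adjustment is made, which is exactly where the uncertainty noted above comes from.

```python
def cost_per_life_equivalent(
    monthly_cost: float,
    caseload: dict[str, tuple[float, float]],
    dalys_per_life: float,
    monthly_patient_costs: float = 0.0,
) -> float:
    """Back-of-envelope cost per 'life saved' equivalent for a health centre.

    caseload maps each condition to (DALYs averted per treatment,
    average patients treated per month). dalys_per_life is an assumed
    conversion factor from DALYs averted to one life saved.
    No counterfactual adjustment is made.
    """
    # Total DALYs averted per month, summed over all treated conditions.
    dalys_per_month = sum(dalys * patients for dalys, patients in caseload.values())
    # Optionally include what patients themselves pay.
    total_monthly_cost = monthly_cost + monthly_patient_costs
    return total_monthly_cost / dalys_per_month * dalys_per_life
```

Adding patients’ own expenses through `monthly_patient_costs` is one way to reflect the post’s second, higher figure; any inputs used to try the function are illustrative and are not OneDay Health’s data.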
Author’s tl;dr (lightly edited): “Rhyme is a new history consultancy for longtermists. Historical insights and the distillation of historical literature on a particular question can be beneficial for use as an intuition pump and for information about the historical context of your work. If you work on an AI Governance project (research or policy) and are interested in augmenting it with a historical perspective, consider registering your interest and the cruxes of your research here. During this trial period of three to six months, the service is free.”
by OllieBase, EAGxCambridge 2023, EAGxNordics
Three upcoming EAGx events, for those familiar with core EA ideas and wanting to learn more:
- EAGxCambridge - 17-19 March, applications open until 3rd March, for people already in or intending to move to the UK or Ireland.
- EAGxNordics - 21-23 April, applications open until 28th March, primarily for people in the Nordics but welcomes international applications.
- EAGxWarsaw - 9-11 June, applications will open soon, primarily for people in Eastern Europe but welcomes international applications.
One upcoming EA Global, for those already taking action on EA ideas:
- EA Global: London - 19-21 May, applications open now (or just register if already admitted to EAG: Bay Area), no location requirements.
The most common critique of giving cash without conditions is fear of dependency, ie. ‘Give a man a fish, feed him for a day. Teach a man to fish, feed him for a lifetime.’ This is despite evidence that giving cash can be more effective than teaching skills, and can break down barriers like lack of capital for equipment.
Rationality, Productivity & Life Advice
Argues that just as physics principles govern how a motor works, rationality principles govern the value of discourse. This means there shouldn’t be a unique style of "rationalist discourse", any more than there is a unique "physicist motor." Just as there are many motors that convert energy into work, there can be many discourse algorithms that convert information into optimization. One example is the ‘debate’ algorithm, where different people search for evidence and arguments for different sides of an issue. Rationality’s value lies in explaining the principles that might govern a good algorithm, not in prescribing a single algorithm.
The author suggests that if something is going to impact major life decisions (eg. AI), you should develop your own understanding and model of it. They also claim normal life is worth living, even if you think the probability of doom relatively soon is high. This applies to decisions like saving for retirement, having kids, or taking on heaps of debt right now. Otherwise, if a ‘normal’ future does occur, you are likely to struggle psychologically with not being ready for it, and you risk burning out, finding it difficult to admit mistakes, losing options, and losing the ability to relate to those around you personally and professionally. They also suggest being extra careful about any actions that might increase risk in the name of addressing the problem quickly, eg. any work that might advance AI capabilities.
A story in which a king’s two advisors tell him it will rain in 3 weeks and in 10 years, respectively. The story shows that averaging the two and assuming 5 years produces an action that isn’t useful in either scenario: if the rain comes in 3 weeks, crops should be planted; if it comes in 10 years, the kingdom should prepare for drought. The author hints the same applies to AI timelines: if you care about planning, you either need to decide which model is right, or prepare for both outcomes (possibly prioritizing the sooner one, since you can switch plans if it doesn’t rain soon).
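The king’s dilemma can be made concrete with a toy decision table. The payoffs below are invented for illustration (the story gives none); the sketch assumes, as the story describes, that the averaged ‘plan for 5 years’ action pays off in neither scenario, and picks whichever action maximizes expected payoff given the king’s credence that rain comes soon.

```python
# Hypothetical payoffs (not from the post): rows are actions, columns are scenarios.
PAYOFFS = {
    "plant_crops":      {"rain_in_3_weeks": 10, "rain_in_10_years": -5},
    "prepare_drought":  {"rain_in_3_weeks": 2,  "rain_in_10_years": 8},
    "plan_for_5_years": {"rain_in_3_weeks": 0,  "rain_in_10_years": 0},
}


def expected_payoff(action: str, p_rain_soon: float) -> float:
    """Probability-weighted payoff of an action across the two advisors' scenarios."""
    row = PAYOFFS[action]
    return p_rain_soon * row["rain_in_3_weeks"] + (1 - p_rain_soon) * row["rain_in_10_years"]


def best_action(p_rain_soon: float) -> str:
    """Action with the highest expected payoff for a given credence in the 3-week advisor."""
    return max(PAYOFFS, key=lambda action: expected_payoff(action, p_rain_soon))


if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(p, best_action(p))
```

Under these numbers the averaged plan is never the best choice at any credence; which tailored action wins depends on how much the king trusts each advisor (preparing for drought at 0.2 and 0.5, planting at 0.8). The sketch doesn’t model the option of switching plans if the early rain fails to arrive.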
Community & Media
by Bella, 80000_Hours
80K has historically been the biggest single source of people learning about EA, and their internal calculations suggest their top-of-funnel efforts have been cost-effective at moving people into impactful careers.
They’ve been investing significantly in marketing, going from their first dedicated outreach FTE in 2020 to 3 as of 2022. In 2022 they spent $2.65M and gained 167K new subscribers, vs. $120K spent and 30K new subscribers in 2021.
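A rough back-of-envelope on these figures gives the average marketing cost per new subscriber, naively attributing every new subscriber to the spend (the post doesn’t claim this):

$$
\frac{\$2{,}650{,}000}{167{,}000} \approx \$15.9 \ \text{per new subscriber (2022)}, \qquad \frac{\$120{,}000}{30{,}000} = \$4 \ \text{per new subscriber (2021)}
$$

So spend grew roughly 22x while new subscribers grew roughly 5.6x, and the average cost per new subscriber roughly quadrupled.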
Strategies have included:
- Sponsored placements on YouTube, podcasts and newsletters (biggest source of growth)
- Targeted social media advertising (solid performance)
- Book giveaway - anyone who joins the newsletter can get a free book
- Podcast advertising, eg. on platforms like Facebook and Podcast Addict
- Improvements to website ‘calls to action’
They’ve also considered downside risks, eg. ensuring a good proportion of subscribers continue to convert into high-impact careers, using frequency caps so no one feels spammed, and investigating ways to increase the demographic diversity of outreach rather than entrenching homogeneity by targeting the audiences EA currently skews toward.
The author loves the EA community and is deeply grateful for having found it. They note we’re in tough times and some people have been feeling less proud to be EA, and they push back against that inclination a little. In their case, having this community around them allowed them to move from being an ethics student who did bits and pieces of volunteering to someone who has fulfilled their Giving What We Can pledge for a decade and prioritizes impact in their career. They draw motivation from those around them, who both work to achieve high standards and are fully accepting and encouraging of others’ choices (eg. their own choice to be an omnivore, or to work fewer hours). They also learn a lot from others in the community, and found their pride in it particularly salient at EAG, where so many people they talked to were doing difficult, tiring, or emotional things to help others.
by Jeff Kaufman
Some parts of EA are intuitively and obviously good, without need for explanation (eg. giving money to the poor). Other parts require differing levels of explanation. Some people talk as if where something falls on that continuum depends on whether it’s ‘mainstream’ or ‘longtermist’, but the author suggests most cause areas have examples at both ends, eg.:
- Create plans for pandemics (intuitive) vs. build refuges for pandemics (unintuitive)
- Help chickens (intuitive) vs. determine moral differences between insects (unintuitive)
- Organize pledge drives (intuitive) vs. give money to promising high schoolers (unintuitive)
- Plan for economic effects of AI (intuitive) vs. mathematically formalize agency (unintuitive)
They suggest we’re united by a common question, and it’s good that EA has room for both the weird and the mainstream and everything in between.
Author’s tl;dr: “Harmful people often lack explicit malicious intent. It’s worth deploying your social or community defenses against them anyway. I recommend focusing less on intent and more on patterns of harm.”
Acausal normalcy by Andrew_Critch
Why I’m not into the Free Energy Principle by Steven Byrnes
A selection of posts that don’t meet the karma threshold, but seem important or undervalued.
Reflections from the author on opportunities in the AI governance and strategy landscape. The following areas are highlighted, with the most needed talent in brackets:
- Model evaluations (engineers, those with strong conceptual alignment models, and those with experience in thinking about or implementing agreements).
- Compute governance (technical talent and those with experience thinking about regulations).
- Security (security professionals and generalists with interest in upskilling).
- Publication and model-sharing policies (generalist researchers and those with interdisciplinary domain knowledge in areas that encounter dual-use publication concerns).
- Communicating about AI risk (excellent communicators, those with policy experience, and those with strong models of AI risk and good judgment on ideas worth spreading).
The author is aware of professionals / researchers interested in talking with junior folks with relevant skills in each area - feel free to get in touch directly if interested.
by Jaime Sevilla, JuanGarcia, Mónica Ulloa, Claudette Salinas, JorgeTorresC
Author’s tl;dr: “We have hired a team to investigate potentially cost-effective initiatives in food security, pandemic detection and AI regulation in Latin America and Spain. We have limited funding, which we will use to focus on food security during nuclear winter. You can contribute by donating, allowing us to expand our program to our other two priority areas.”
About six months ago, the US Congress passed the CHIPS Act, which commits $280 billion over the next ten years to semiconductor production and R&D in the USA. The EU is now following suit: the European Chips Act is currently in draft and seeks to invest €43 billion in public and private funding to support semiconductor manufacturing and supply chain resilience.