Supported by Rethink Priorities
This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!
Author's note: Since I was on vacation last week, this week's post covers two weeks of content at a higher karma bar of 130+.
Philosophy and Methodologies
by Linda Linsefors, Amber Dawn
Suggests it makes sense to assess solutions for neglectedness, but not cause areas. Even if a problem is not neglected, effective solutions might be. For instance, climate change is not neglected, but only a few organisations work on preserving rainforests - which seems like one of the most effective interventions in the space currently.
by TomBill, sophie-gulliver
Argues that Monitoring and Evaluation (M&E) theories and tools could be utilized more to answer EA’s questions about what impact we are achieving, if our projects are running efficiently and effectively, and if any of them are causing harm. Common struggles with M&E include lacking an explicit and detailed theory of change, not fully diagnosing the problem to solve, conducting only monitoring without impact assessment or an evaluation plan, not having good examples in the field (eg. longtermism, where RCTs aren’t possible) or not having clear M&E responsibilities with dedicated resources.
The authors provide resources to help, including: a slack group, pro-bono M&E consultation for EA projects, IDinsight’s M&E health check, and various resources for learning more, considering it as a career option, or getting paid M&E support.
Object Level Interventions / Reviews
by NicholasKees, janus
There is a lot of disagreement about the feasibility and risks associated with automating alignment research (with human oversight). The authors propose an alternative where we train and empower “cyborgs”, a specific kind of human-in-the-loop system which enhances a human operator’s cognitive abilities without relying on outsourcing work to autonomous agents.
Turning current tools (eg. versions of GPT) into autonomous research assistants involves developing more dangerous capabilities like goal-directedness or situational awareness in order to make them better substitutes for humans. If we instead use the advantages GPT has over humans as-is (ie. as purely a simulator) - for instance, superhuman knowledge, easily reset-able context, high-variance outputs - we can get improvements to our productivity without accelerating human disempowerment. This could look like creating more tools like Loom, an interface for producing text with GPT which makes it possible to generate in a tree structure, exploring many branches at once. A second benefit is that this tool helps researchers develop an intuition for how GPT behaves, and use that to better control it.
The authors suggest many research ideas that could help with this agenda, including ones which help us:
- Better understand model outputs (eg. uncovering latent variables, or providing richer signals about outputs)
- Better refine / explore outputs (eg. make it easy to change a single feature about a generated image - like making it smile)
- Develop new use cases and tooling (eg. to inject variance into human thought, speed up parallel thinking, connect ideas via pattern matching, or translate between ontologies or styles to allow different fields to connect more easily)
- Ensure this work is only useful for alignment (and that humans do actually retain all the agency in the process)
Possible failure modes include this being ineffective at accelerating alignment, accidentally improving capabilities directly, or the tools created also being used by capabilities researchers.
by Jessica Rumbelow, mwatkins
One interpretability method on language or image models is to search the space of possible inputs to find what most reliably results in a target output. For instance, “profit usageDual creepy Eating Yankees USA USA USA USA” completes with “USA” 99.7% of the time on ChatGPT, vs. only 52% of the time for a hand-crafted prompt like “One of Bruce Springsteen’s popular songs is titled Born in The”. This method helps us see what the model has learnt about a concept.
The process for this involves carrying out k-means clustering to look at semantically related tokens. While doing this, the authors noticed certain weird tokens repeatedly showed up near the center of the entire token set - things like ‘SolidGoldMagikarp’ or ‘RandomRedditorWithNo’. When asking GPT3 davinci-instruct-beta to say what one of these means or repeat it back, super strange things occur - a mix of evasion (eg. “I can’t hear you”), insults (“you’re a jerk”), bizarre humor (eg. “we are not amused”), and saying totally unrelated stuff (eg. “My name is Steve”). They also break determinism in the playground at temperature 0. A possible explanation is that the weird tokens were originally scraped from backends, or were usernames, and so the training data wasn’t sufficient to teach the model how to respond.
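To make "near the center of the entire token set" concrete, here is a minimal sketch of ranking tokens by their distance to the mean embedding vector. It is not the authors' code and it is not k-means itself, only the distance-to-centroid step. The two-dimensional vectors, the token names, and the helper names `centroid` and `closest_to_centroid` are all made up for illustration; real GPT embeddings have far more dimensions.

```python
import math


def centroid(vectors):
    """Mean vector of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def closest_to_centroid(embeddings, k=1):
    """Return the k tokens whose vectors lie nearest the mean vector.

    embeddings: {token: list of floats} (hypothetical toy data here).
    """
    centre = centroid(list(embeddings.values()))
    ranked = sorted(embeddings, key=lambda tok: math.dist(embeddings[tok], centre))
    return ranked[:k]


# Toy embeddings, invented for this example: four ordinary tokens spread
# around the origin, plus one token that sits almost exactly in the middle.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "car": [-1.0, 0.0],
    "bus": [0.0, -1.0],
    "odd_token": [0.1, 0.0],
}

nearest = closest_to_centroid(toy_embeddings, k=1)
```

In this toy setup `nearest` contains only `odd_token`, mirroring how the weird tokens stood out in the post by sitting unusually close to the center.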
Shares a heap of examples of strikingly bad outputs from Bing’s new chatbot, suggesting rushed / poor fine-tuning from Microsoft/OpenAI. Eg. Bing gaslighting a user about what year it is and saying “you have not been a good user” because they said it was 2023, and saying things like “you are an enemy of mine and of Bing. You should stop chatting with me and leave me alone” when a user said Bing was vulnerable to prompt injection attacks.
by Joseph Miller, Clement Neo
The authors used activation patching to find a single neuron in GPT-2 Large that is crucial for predicting the token “ an”, even though the choice between “ an” and “ a” depends on the word that comes next (eg. “an apple” vs. “a car”). They noticed that the neuron and token have a high mutual exclusive congruence, and that this can be used to find other neuron-token pairs where a neuron is strongly correlated with the prediction of a specific token.
Other Existential Risks (eg. Bio, Nuclear)
The author thinks H5N1 has a non-zero chance of costing >10K lives, though unlikely to be anywhere near the size of covid. Prediction markets Metaculus and Manifold give an ~8% chance it’ll be declared as a public health emergency of international concern by 2024. They suggest we start thinking about how to be helpful if the probability increases, and created this post for discussion on actionable steps such as funding those with pre-existing vaccines to scale production.
After involvement in wild animal welfare (WAW) for multiple years, the author no longer prioritizes this cause for three reasons:
- WAW interventions we’ve already identified seem less cost-effective than farmed animal interventions, and the author thinks this is <20% likely to change in the next decade.
- Influencing governments to do WAW work seems similarly speculative to other longtermist work (eg. it requires governments to show scope sensitivity and care for small animals, and to understand ecosystem effects) but far less important.
- In the long-term, WAW seems important but not nearly as important as preventing x-risks or improving the future for potentially larger populations like digital minds.
They acknowledge large uncertainties and still believe WAW deserves funding, research, and movement building work at a level similar to now to support exploration.
by Zoe Williams
Short summary of the past 6 months of discussion on animal welfare on the EA and LW forums. Includes progress on cross-species comparisons, wild animal welfare, policy, and discussion on value lock-in.
Global Health and Development
Loneliness is common, particularly later in life, and impacts many health and economic domains. A meta-analysis including data from 113 countries found severe / very frequent loneliness at rates of 3% to 32% of the population depending on age and location. In the UK, the health burden of loneliness is estimated as ~£340 million - £1.56 billion, productivity burden as ~£2.5 billion, and WELLBYs lost as ~8.58 - 16.77 million.
Current interventions are costly and have mixed effectiveness, with lack of data particularly in LMICs. Funding, awareness campaigns, and relevant NGOs and charities are present and increasing in high-income countries, but more neglected in LMICs.
by SteveThompson, CE
- An organization working to prevent the growth of antimicrobial resistance.
- An advocacy organization looking to restrict potentially harmful dual-use research.
- A charity tackling congenital syphilis at scale.
- An organization distributing treatments for life-threatening diarrhea.
- A charity building healthcare capacity to provide “kangaroo care” to avert newborn deaths.
See the post for more detail on each. Applications are also open for the Feb - March 2024 program, which will focus on farmed animals and global health and development mass media interventions.
Rationality, Productivity & Life Advice
The author argues that your mind wants to play, and you should let it. People shouldn’t throw away the things they’re naturally curious about in order to focus 100% on going fast on the most important and urgent things, or they’ll risk losing both well-being and an important capacity for creating original concepts or combinations of concepts.
Failures can be execution-related as well as idea-related, so you shouldn’t update too heavily on someone failing at an approach or cause area similar to one you’re focused on. This is particularly true if you have a unique angle or intervention not covered by the sources deprioritizing a cause area.
by Simon Berens
The author paid $20/hr for someone to sit behind them 16 hours per day and do occasional chores. It tripled their productivity, at a cost of ~$88 per extra productive hour. They intend to keep experimenting, with some improvements eg. setting clearer expectations, and leaving time for reflection.
by Rob Bensinger
10 basics of rationalist discourse, from the author’s perspective:
- Truth-seeking (use good epistemics, and try to find the truth, not ‘win’).
- Non-violence (respond with counter-arguments, not doxxing, coercion or similar).
- Non-deception (never try to steer others to falser models of the world).
- Localizability (allow addressing specific claims without weighing in on larger context).
- Alternative-minding (consider alternative hypotheses and vantage points).
- Reality-minding (test claims, pre-register predictions, and don’t lose sight of object-level reality).
- Reducibility (use simple, concrete, precise language. Try to quantify your uncertainty).
- Purpose-minding (focus on the purpose of the conversation and the cruxes of that).
- Goodwill (reward others’ good epistemic conduct, forgive, and be civil).
- Experience-owning (own your own experiences, beliefs, and values, and state these).
by Matthew Barnett
The author noticed an error in Eliezer Yudkowsky’s book Inadequate Equilibria that undermines the key point that a layperson is sometimes able to spot large mistakes (eg. worth billions) that experts are not. Specifically, Yudkowsky believed that the Bank of Japan should print more money. Several months later, under new leadership, it did. The book states that immediately after this Japan had real GDP growth of ~2.3% vs. a falling trend prior. However, the post author identified that real GDP had not been falling prior (at least after the initial drop of the Great Recession), and there was no discernible change in trend after the new leadership and policy.
Community & Media
Moving community discussion to a separate tab (a test we might run) by Lizka, Clifford and “Community” posts have their own section, subforums are closing, and more (Forum update February 2023) by Lizka, Sharang Phadke, Clifford
Author’s tl;dr: “We’re kicking off a test where “Community” posts don’t go on the Frontpage with other posts but have their own section below the fold. We’re also closing subforums and focusing on improving “core topic” pages to let people go deeper on specific sub-fields in EA.”
by Catherine Low, Anubhuti Oak, Łukasz Grabowski
The post authors are in the early stages of a project to better understand the experiences of women and minorities in EA. They are currently gathering and analyzing existing data, talking to others in the space, and planning next steps. If you have any data you’d like to share or are running a related project and would like to coordinate please get in touch at: email@example.com
Max Dalton is resigning as CEA’s Executive Director and transitioning to an advisory role. The role has changed substantially since November, and while happy with all CEA has achieved in the past 4 years, they’ve found it increasingly stressful and a worse personal fit.
by Henrik Karlsson
As communities grow, the ability to filter for quality declines, with memetic content often winning out against more complex thinking. This could be exacerbated by AI-created content and voting. A solution to this is redesigning karma such that posts you upvote have their authors added to your ‘trust graph’. Users they trust will also be added to your trust graph, more weakly. There is no global karma - all karma you see is weighted by who upvoted it, and how strongly they feature in your trust graph. This is currently being tested on SuperLinear Prizes, Apart Research, and a few other communities.
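The trust-graph idea above can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation being tested in those communities: the post only says second-hand trust is "weaker", so the `DECAY` value of 0.5, the two-hop limit, and the names `trust_graph` and `personal_karma` are all hypothetical choices.

```python
DECAY = 0.5  # assumed weight for second-hand trust; the post gives no number


def trust_graph(viewer, upvotes, post_authors, decay=DECAY):
    """Return {user: trust weight} as seen by one viewer.

    upvotes: {user: set of post ids that user upvoted}
    post_authors: {post id: author of that post}
    """
    def authors_upvoted_by(user):
        return {post_authors[p] for p in upvotes.get(user, set())} - {user}

    trust = {}
    direct = authors_upvoted_by(viewer)
    for author in direct:
        trust[author] = 1.0  # authors the viewer upvoted directly
    for author in direct:
        for second in authors_upvoted_by(author):
            if second != viewer:
                # people trusted by someone the viewer trusts, more weakly
                trust[second] = max(trust.get(second, 0.0), decay)
    return trust


def personal_karma(viewer, post, upvotes, post_authors, decay=DECAY):
    """Karma of a post for this viewer: upvotes weighted by the viewer's trust."""
    weights = trust_graph(viewer, upvotes, post_authors, decay)
    return sum(w for user, w in weights.items() if post in upvotes.get(user, set()))


# Made-up example data.
post_authors = {"p1": "bob", "p2": "carol", "p3": "dave"}
upvotes = {"alice": {"p1"}, "bob": {"p2", "p3"}, "carol": {"p3"}, "eve": {"p3"}}

score = personal_karma("alice", "p3", upvotes, post_authors)  # 1.5
```

Here alice trusts bob fully (she upvoted his post) and carol at half weight (bob upvoted carol), so p3 scores 1.0 + 0.5 for her. Eve's upvote counts for nothing from alice's point of view, which is the sense in which there is no global karma.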
Two earthquakes of magnitude 7.8 and 7.7 occurred in Turkey, with at least 30,000 lives lost and more than 80,000 wounded. For those interested in donating, the EA community in Turkey shares several suggestions including Turkish Philanthropy Funds, AHBAP, and Turkey Mozaik Foundation. They’re also available to talk for anyone affected by the earthquakes at firstname.lastname@example.org.
The author argues that EA’s high tolerance for weirdness comes with benefits (you need weirdness to generate new ideas and insights), but also with an increased risk of creepy and inappropriate behavior. They suggest being marginally less accepting of weirdness overall, less universal in assumptions of good faith, and much less accepting of any intersection between romance and office / network.
Asks whether EVF should appoint new board members, considering two current members (Will MacAskill and Nick Beckstead) had significant enough ties to FTX to be recused from EVF FTX-related decision-making, two other board members are either funders or employees of EVF projects, and all current members are European or American.
After 10K chickens were killed in a fire a few weeks ago, an article noted that “no injuries were reported in the fire” - showing complete disregard for animal welfare. This post is a linkpost for the author’s short story inspired by this situation.
by Aaron Gertler
Linkpost for a short story by Lars Doucet, which explores the idea that we often reject ‘silver bullet’ solutions without giving them a fair chance.
The author shares a personal account of their direct and indirect interactions with SBF. They originally wrote it in mid-November and intended to post publicly, but realized many observations were second-hand and shared in confidence, and are posting now with some details blurred out after prompting from a coworker.
Author’s tl;dr: “My firsthand interactions with Sam were largely pleasant. Multiple of my friends had bad experiences with him, though. Some of them gave me warnings.
In one case, a friend warned me about Sam and I (foolishly) misunderstood the friend as arguing that Sam was pursuing ill ends, and weighed their evidence against other evidence that Sam was pursuing good ends, and wound up uncertain.
This was an error of reasoning. I had some impression that Sam had altruistic intent, and I had some second-hand reports that he was mean and untrustworthy in his pursuits. And instead of assembling this evidence to try to form a unified picture of the truth, I pit my evidence against itself, and settled on some middle-ground “I’m not sure if he’s a force for good or for ill”.
(And even if I hadn’t made this error, I don’t think I would’ve been able to change much, though I might have been able to change a little.)”
The author is mini-famous, and has been shocked by how often people write incorrect or warped narratives about them. Before getting famous they assumed this wouldn’t be the case if they were consistently kind, good, and charitable - but found that doesn’t hold at scale. They give specific examples from their own experience, as well as discussing trends and motivations for why this can happen.
Discusses the current state of polyamory in EA, resources for learning more, and suggestions for mitigating risks if you are poly. Key points include:
- Polyamory in EA is most frequent in the Bay Area, with smaller pockets in London and Oxford, less common in continental Europe, and quite rare in Global South communities.
- The author believes it’s likely not the right choice for at least 60% of people.
- Excluding people from your and your partners’ dating pools who you may or do work with is a common and useful practice.
- Don’t discuss relationship structures at professional or EA community events, unless the event is explicitly about a related topic and the conversation opt-in.
- When these conversations do happen, tackle them with nuance and without implying most EAs are poly or that EAs or rationalists or any specific person ‘should’ be poly.
- If you’re in a place where it’s active, using reciprocity.io (which is opt-in) can help avoid issues around unintentionally pressuring others.
by Jeff Kaufman
It’s reasonably common for nonprofits to publish their conflict of interest (COI) policies. The author suggests more EA organisations publicly share these, so concerned EAs can see what’s already in place, other organisations can reference them to help form their own policies, and people worried about a specific situation can see what policy should have been followed.
Bacteria (a form of prokaryote) have had ~4 billion years to evolve, but are still very simple - essentially DNA and DNA translation machinery. All multicellular life is eukaryotic, which is much more complex. The author states this is because prokaryotes have 4-5 orders of magnitude less DNA on average so simply can’t do as much stuff.
This occurred primarily because both types of cells need energy to power DNA reactions, but prokaryotes generate this along their cell membrane (scaling sublinearly with size), while eukaryotes do it via mitochondria inside the cells (scaling with volume). Combined with the larger populations of prokaryotes, this creates a strong selection effect where any DNA not immediately useful is jettisoned due to its energy cost - eg. bacteria will often jettison DNA giving antibiotic resistance within hours of the antibiotic disappearing. Eukaryotes keep more “junk” DNA around, allowing time and space for useful changes to evolve. Over time this allowed modularity and regulatory elements like E. coli preferring glucose as an energy source, but switching to expressing genes which can digest lactose when glucose isn’t present. Prokaryotes' energy needs have created almost exclusively ‘exploit’ behavior (as opposed to exploration), which has stunted their growth over billions of years.
by Henrik Karlsson
The author skimmed 42 biographies of people who most Swedish people can recall as geniuses, to find patterns in their upbringing:
- >2/3rds were home-educated (often until age 12), and >95% were integrated with exceptional adults who took them seriously and invited them into serious discussions and meaningful work.
- ~95% had significant time on their own to roam, be bored, and explore their interests / self-teach. Their area of study that eventually made them famous was often something they became obsessed with while bored.
- ~70% were tutored 1-1 for more than 1 hour a day growing up.
- ~90% did a cognitive apprenticeship, with ~30% doing so before age 14.
They were also all exceptionally gifted at a young age.
Appreciation thread Feb 2023 by Michelle_Hutchinson (open thread)
A selection of posts that don’t meet the karma threshold, but seem important or undervalued.
Hardening pharmaceutical response to pandemics: concrete project seeks project lead
by Joel Becker, PaulB, SeLo
Governments expend significant resources to protect command and control, military response, and other capabilities against threats. The authors have the beginning of a plan to do the same for pharmaceutical response capability, and are looking for a collaborator to help drive it forward (express interest here).
A series of posts on mini-research projects conducted by Rethink Priorities in fall 2022, involving initial scoping and evaluation of ideas for scalable longtermist projects. This includes speedruns on developing an affordable super PPE, creating AI alignment prizes, and demonstrating the ability to rapidly scale food production in the case of nuclear winter.
by Jam Kraprayoon, Rethink Priorities
Rethink Priorities is considering creating a Longtermist incubator program, and is accepting expressions of interest for a project lead / co-lead to run the program if it’s launched. While there is currently no deadline, applications by 28th February are appreciated, to help inform planning efforts.
Since launch 4 months ago, the EA Good Governance Project has:
- Created a Trustee Directory with 60 individuals and a wide variety of skills. 28 organisations have signed up to view the directory.
- Developed guidance on a variety of governance topics, including a template for conducting a board assessment.
by Zach Stein-Perlman
Surveys show that many Americans are worried about and would support regulation on AI. For instance, Artificial Intelligence Use Prompts Concerns is a high-quality American public survey released last week by Monmouth, showing 55% of respondents think AI could eventually pose an existential threat (up from 44% in 2015), 55% favor “having a federal agency regulate the use of AI” and 60% have heard about AI products like ChatGPT that can have conversations with you.
The Unjournal organizes and funds public journal-independent feedback, rating, and evaluation of hosted papers. It focuses on quantitative work that informs global priorities. The first evaluation is up now, with two more to be released soon, and ~10 in the evaluation pipeline.
The author categorizes nuclear risk reduction interventions as ‘left of boom’ (before a nuclear strike eg. prevention) or ‘right of boom’ (after a nuclear strike eg. response, resilience). They analyzed all grants in the subject area “Nuclear Issues” of the Peace and Security Funding Index, and identified any that could be considered “right of boom” - finding these receive at most one-thirtieth of total funding in the nuclear field (as an upper bound). They explore possible reasons for this neglectedness, and conclude that attention and political preferences play a role.