This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!
Philosophy and Methodologies
by Max Reddel
The author explains how policy researchers can support decision-making with simulation models of socio-technical systems, even under deep uncertainty.
They first suggest systems modeling (eg. agent-based models). For example, agent-based modeling was used here to simulate how individuals with different characteristics (age, health status, social network) might behave during an epidemic, and how that would affect the spread of disease and the relative effectiveness of different interventions.
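To make the agent-based idea concrete, here is a toy sketch. It is not the model from the linked study. The population size, transmission probability, random contact network, and the rule that older or vulnerable agents cut their contacts more are all illustrative assumptions. Each agent has an age, a health flag and a list of contacts. An intervention (`contact_reduction`) changes individual behaviour, and the epidemic-level outcome emerges from those individual choices.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Agent:
    age: int
    vulnerable: bool                              # stand-in for "health status"
    contacts: list = field(default_factory=list)  # social network: indices of other agents
    state: str = "S"                              # S = susceptible, I = infected, R = recovered
    days_left: int = 0                            # remaining infectious days


def build_population(n, avg_contacts, rng):
    """Create agents with random ages/health and a random undirected contact network."""
    agents = [Agent(age=rng.randint(0, 90), vulnerable=rng.random() < 0.2) for _ in range(n)]
    for i in range(n):
        for _ in range(avg_contacts // 2):
            j = rng.randrange(n)
            if j != i:
                agents[i].contacts.append(j)
                agents[j].contacts.append(i)
    return agents


def compliance(agent, contact_reduction):
    """Hypothetical behavioural rule: older or vulnerable agents cut contacts 1.5x as much."""
    factor = 1.5 if (agent.vulnerable or agent.age >= 65) else 1.0
    return min(1.0, contact_reduction * factor)


def simulate(n=500, avg_contacts=8, p_transmit=0.05, days_infectious=7,
             contact_reduction=0.0, days=120, n_seed=5, seed=0):
    """Run one stochastic epidemic; return the share of agents ever infected."""
    rng = random.Random(seed)
    agents = build_population(n, avg_contacts, rng)
    for i in rng.sample(range(n), n_seed):
        agents[i].state, agents[i].days_left = "I", days_infectious

    for _ in range(days):
        newly_infected = set()
        for a in agents:
            if a.state != "I":
                continue
            for j in a.contacts:
                b = agents[j]
                # The contact only happens if neither party chooses to avoid it.
                if rng.random() < compliance(a, contact_reduction):
                    continue
                if rng.random() < compliance(b, contact_reduction):
                    continue
                if b.state == "S" and rng.random() < p_transmit:
                    newly_infected.add(j)
        # Progress existing infections, then apply new ones (synchronous update).
        for a in agents:
            if a.state == "I":
                a.days_left -= 1
                if a.days_left == 0:
                    a.state = "R"
        for j in newly_infected:
            agents[j].state, agents[j].days_left = "I", days_infectious

    return sum(a.state != "S" for a in agents) / n


if __name__ == "__main__":
    # Compare interventions by averaging over several random seeds.
    for reduction in (0.0, 0.3, 0.6):
        runs = [simulate(contact_reduction=reduction, seed=s) for s in range(10)]
        print(f"contact reduction {reduction:.1f}: mean attack rate {sum(runs) / len(runs):.2f}")
```

Because outcomes are stochastic, interventions are compared by averaging over seeds rather than by reading off a single run.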
However, many political decisions involve even less certainty. ‘Deep uncertainty’ is uncertainty about the system model, the probability distributions over its inputs, and which consequences to consider and how to weigh them. In this scenario, computational modeling can help explore the implications of different assumptions about uncertain, contested, or unknown model parameters and mechanisms. Rather than predicting expected effects, the aim is to minimize plausible future regret by identifying vulnerable scenarios and finding policy solutions that are robust to them. They provide several techniques and examples for this.
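One common way to operationalize "minimize plausible future regret" is the minimax-regret criterion, sketched below. This is only one of several robustness criteria and is not claimed to be the specific technique the author recommends. The policies, scenarios, and scores are hypothetical. Regret in a scenario is the gap between a policy's score and the best score any policy achieves there; the robust choice is the policy whose worst-case regret is smallest, and a policy's vulnerable scenarios are those where its regret exceeds a tolerance.

```python
def regret_table(outcomes):
    """outcomes[policy][scenario] -> performance (higher is better).

    Regret of a policy in a scenario = best performance any policy achieves
    in that scenario minus this policy's performance there.
    """
    scenarios = list(next(iter(outcomes.values())))
    best = {s: max(outcomes[p][s] for p in outcomes) for s in scenarios}
    return {p: {s: best[s] - outcomes[p][s] for s in scenarios} for p in outcomes}


def minimax_regret(outcomes):
    """Pick the policy whose worst-case regret across scenarios is smallest."""
    max_regret = {p: max(r.values()) for p, r in regret_table(outcomes).items()}
    return min(max_regret, key=max_regret.get), max_regret


def vulnerable_scenarios(outcomes, policy, threshold):
    """Scenarios in which `policy` would leave us with regret above `threshold`."""
    return [s for s, r in regret_table(outcomes)[policy].items() if r > threshold]


# Hypothetical scores for three policies under three scenarios. No probabilities
# are assigned to the scenarios, reflecting deep uncertainty.
OUTCOMES = {
    "aggressive": {"mild": 15, "medium": 6, "severe": 0},
    "balanced":   {"mild": 11, "medium": 7, "severe": 2},
    "cautious":   {"mild": 6,  "medium": 5, "severe": 5},
}

if __name__ == "__main__":
    choice, worst = minimax_regret(OUTCOMES)
    print("worst-case regret per policy:", worst)  # {'aggressive': 5, 'balanced': 4, 'cautious': 9}
    print("minimax-regret choice:", choice)         # balanced
    print("vulnerable for aggressive:", vulnerable_scenarios(OUTCOMES, "aggressive", 3))  # ['severe']
```

Treating the three scenarios as equally likely would instead favour "aggressive" (total 21 vs. 20), which illustrates how an expected-value ranking can rest on a probability assumption that deep uncertainty does not support.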
Object Level Interventions / Reviews
The author argues there’s room for reasonably large boosts to alignment research from “weak” cognitive tools like Google search. The problem is that most people pursuing this intervention lack experience with the hard parts of alignment research, which would help them understand researchers’ needs and direct tools toward the most helpful elements. They suggest those interested do some object-level alignment research, then pick a few people they want to speed up and iteratively build something specifically for them.
They also provide a few early ideas of what such research tools might look like, eg. a tool that produces examples, visuals, or stories to explain input mathematical equations, or a tool that predicts engagement level and common objections for a given piece of text.
Anthropic: Core Views on AI Safety: When, Why, What, and How by jonmenaster and Anthropic's Core Views on AI Safety by Zac Hatfield-Dodds
Linkpost to Anthropic’s post here, which discusses why they anticipate rapid AI progress and impacts, how this led them to be concerned about AI safety, and their approach to AI safety research.
Key points include:
- Research on scaling laws demonstrates that more computation leads to improvements in capabilities. Extrapolating from this, we should expect great leaps in AI capabilities and impact.
- No one knows how to build powerful, helpful, honest, harmless AI, but rapid AI progress may lead to competitive racing and deployment of untrustworthy systems. This could lead to catastrophic risk from AI systems strategically pursuing dangerous goals, or making innocent mistakes in high-stakes situations.
- They’re pursuing, and most excited about, research into scaling supervision, mechanistic interpretability, process-oriented learning, understanding and evaluating how AI systems learn and generalize, testing for dangerous failure modes, and societal impacts and evaluations.
- Their goal is to differentially accelerate safety work and have it cover a wide range of scenarios, from those where safety challenges turn out to be easy to address to those where they turn out to be very difficult. In the very difficult scenario, they see their role as sounding the alarm and potentially channeling collective resources into temporarily halting AI progress. However, they aren’t sure which scenario we’re in and hope to learn this.
by Vika, Rohin Shah
Linkpost for some slides by DeepMind’s alignment team on AI threat models and their alignment plans. The slides don’t represent, and aren’t endorsed by, DeepMind as a whole.
Key points include:
- They believe the most likely source of AI x-risk is a mix of specification gaming and goal misgeneralization, leading to a misaligned and power-seeking consequentialist that becomes deceptive. (SG + GMG -> MAPS)
- Their approach is broadly to build inner and outer aligned models, and detect models with dangerous properties.
- Current research streams include process-based feedback, red-teaming, capability evaluations, mechanistic interpretability, goal misgeneralization understanding, causal alignment, internal outreach, and institutional engagement.
- Compared to OpenAI, they focus more on mechanistic interpretability and capability evaluations, as well as using AI tools for alignment research. Scalable oversight is a focus for both labs.
Summary of this paper by Alexia Georgiadis from the Existential Risk Observatory. The study involved surveying 500 members of the American and Dutch public on the likelihood of human extinction from AI and other causes, before and after showing them specific news articles and videos. Key results:
- Depending on the media used, 26% to 64% of participants rated AI higher on a ranked list of events that may cause human extinction after reading / watching. This effect may degrade over time.
- Of the 10 articles and videos tested, this CNN video featuring Stephen Hawking was the most effective at increasing the above ratings.
- Widespread endorsement of government participation in AI regulation was found among individuals with heightened awareness (ie. high rankings / ratings) of AI risks.
by AI Impacts
Results of a 2022 survey asking ML researchers how they would divide probability over the future impacts of high-level machine intelligence, across 5 buckets. Average results were:
- Extremely good (eg. rapid growth in human flourishing): 24%
- On balance good: 26%
- More or less neutral: 18%
- On balance bad: 17%
- Extremely bad (eg. human extinction): 14%
Linkpost for the author’s blog post, which argues against reductive descriptions of large language models (LLMs) that are sometimes used to imply they can’t generalize much further, for instance saying LLMs are ‘just’ pattern-matchers or ‘just’ massive look-up tables. While there is some truth to those statements, there’s empirical evidence that LLMs can learn general algorithms and contain and use representations of the world, even when trained only on next-token prediction. We don’t know what capabilities can or cannot arise from this training, and should be cautious about predicting its limits.
Argues that outsourcing alignment research to AI is like the comedy sketch ‘The Expert’: if we don’t understand the thing we’re asking for, we can’t expect a good result. The best case is that the AI figures out what we need and does it anyway. The author argues it’s more likely to give us something that only looks right, to attempt to fulfill our preferences but misunderstand them, or to be incapable of alignment research at all in the way we’ve prompted it. And because the core of these issues is that we don’t realize something has gone wrong, we’re unable to iterate. However, a top comment by Jonathan Paulson notes it’s often easier to evaluate work than to do it yourself.
The author’s proposed solution is for us to develop more object-level alignment expertise ourselves, so we’re better able to outsource (ie. direct and understand) further alignment research.
Other Existential Risks (eg. Bio, Nuclear, Multiple)
by JorgeTorresC, Jaime Sevilla, JuanGarcia, Mónica Ulloa, Claudette Salinas, Roberto Tinoco, daniela tiznado
Linkpost for this announcement of the US Global Catastrophic Risk Management Act.
The law directs the United States government to take actions for prevention, preparation, and resilience in the face of catastrophic risks, including presenting risk assessments and recommendations to Congress. The recognized risks include global pandemics, nuclear war, asteroid and comet impacts, supervolcanoes, sudden and severe changes in climate, and threats arising from the use and development of emerging technologies (such as artificial intelligence or engineered pandemics).
Global Health and Development
GiveDirectly shares study results and testimonials from recipients in Malawi on how direct cash empowers them. 62% of GiveDirectly’s recipients are women. Cash transfers can increase use of health facilities, improve birth weight, reduce infant mortality, reduce incidents of physical abuse of women by male partners, increase girls’ school attendance, increase a woman’s likelihood of being the sole or joint decision-maker, increase entrepreneurship, increase savings, and reduce the likelihood of illness.
by Kelsey Piper
- Without a solid theoretical understanding of a problem, empirical solutions are difficult to find: people tried many variants with important parameters set wrong, and didn’t know how to correct them.
- The simplicity of today’s solution, and the availability of the required ingredients, is due to continual research and design efforts to get to that point.
Other / Multiple
by Ula Zarosa, CE
Charity Entrepreneurship has helped to kick-start 23 impact-focused nonprofits in four years. They estimate 40% of these reach or exceed the cost-effectiveness of the strongest charities in their fields (eg. GiveWell / ACE recommended).
Their seed network has provided $1.88 million in launch grants to date. The launched charities have then fundraised over $22.5 million from other grantmakers. This has provided the following impacts, among others:
- ~14,000 additional children vaccinated (by Suvita)
- ~1.14 million fish and ~1.4 million shrimp potentially helped, with capacity to reach >2.5 billion shrimp per annum (Fish Welfare Initiative and Shrimp Welfare Project)
- ~250,000 new contraceptive users from a single campaign (Family Empowerment Media)
- ~215,000 children with reduced lead exposure (Lead Exposure Elimination Project)
- Breakthrough papers on subjective well-being (Happier Lives Institute)
You can apply to their program here.
by Jason Schukraft, Peter Favaloro
Open Philanthropy announces the AI Worldviews Contest, aiming to surface novel considerations that could influence their views on AI timelines and AI risk. Essays should address one of the following:
- What is the probability that AGI is developed by January 1, 2043?
- Conditional on AGI being developed by 2070, what is the probability that humanity will suffer an existential catastrophe due to loss of control over an AGI system?
$225K in prizes will be distributed across six winning entries. Work posted for the first time on or after September 23rd 2022, and up until May 21st 2023, is eligible. See the post for details on eligibility, submission, judging process and judging criteria.
by Jason Clinton, Wim van der Schoot
EA needs more skilled infosec folk. EA-aligned software engineers interested in moving into security engineering, or in accelerating their existing infosec career paths, can sign up for the book club here. It will involve fortnightly discussions starting April 1st at 2pm PDT, facilitated by the lead of the Chrome Infrastructure Security team at Google. They’ve previously used the book (available for free here) to help engineers successfully transition into security.
by CE, weeatquince
In 2023 Charity Entrepreneurship will be researching two new cause areas: mass media interventions and preventative animal advocacy. They’re looking for submissions of ideas in these areas, which may lead to a new charity around that idea being launched.
Community & Media
Linkpost and summary for 80,000 Hours’ review of their programmes in 2021 and 2022.
They’ve seen 2-3x higher engagement in 2022 vs. 2020 in 3 of their 4 main programmes: podcast listening time, job board vacancy clicks, and number of 1-1 calls. The fourth, web engagement, fell 20% in 2021 and rose 38% in 2022 after marketing investment.
The team grew by 78% to 25 FTEs and Howie Lempel became the new CEO. 2023 focuses include improving the quality of advice (partially via hiring for a senior research role), growing the team ~45%, continuing to support the four main programmes, and experimenting with new ones such as headhunting and an additional podcast.
EA is highly decentralized, with a small set of organisations / projects with 50+ people, some with 5-20, and most with 1-2. This means many people lack organizational support like a manager, colleagues to bounce ideas off, ops support, and stable income. At the movement level, this pushes away people with lower risk tolerance, duplicates operations / administrative work, and worsens governance. The author suggests:
- Organisations with good operations and governance should support other projects, eg. as Rethink Priorities’ Special Projects Program does.
- Programs mainly aimed at giving money to individuals should be converted into internal programs, eg. Charity Entrepreneurship’s incubation programme or the Research Scholars Program.
- A top comment by Hauke Hillebrandt also suggests exploring mergers and acquisitions.
by Deena Englander
There are easy steps even small EA orgs should take to substantially reduce personal liability:
1. Incorporate (LLCs are easy and inexpensive to start).
2. Get your organization its own bank account.
3. Get general liability insurance (for the author it costs ~$1.3K per year, but even one lawsuit can bankrupt you otherwise).
The Authentic Revolution, an organization that runs facilitated experiences like workshops, has a policy that: “for three months after a retreat, and for one month after an evening event, facilitators are prohibited from engaging romantically, or even hinting at engaging romantically, with attendees. The only exception is when a particular attendee and the facilitator already dated beforehand.” The author suggests EA community builders should consider something similar, and suggests ways of adapting it to different settings.
by Amy Labenz, Angelina Li, Eli_Nathan
Results from initial analysis by CEA on how people of different genders and racial backgrounds experienced EAG events in 2022 (including EAGx). Key results:
- 33% of attendees, 35% of applicants, and 43% of speakers / MCs self-reported as female or non-binary.
- 33% of attendees, 38% of applicants and 28% of speakers / MCs self-reported as people of color.
- Survey scores for welcomingness and likelihood to recommend were very similar for women and POC to the overall scores (with a small decline of 0.1 on a 5-point scale in welcomingness for women).
Asks the community to be proactive in addressing sexual misconduct and the dynamics that influence it, so the burden of pushing forward change doesn’t fall on survivors. Often a survivor will engage in a process that hurts them and takes considerable time / effort (eg. repeated explanations of trauma, arduous justice processes) in order to make it less likely the perpetrator does something similar again. The author advocates an alternative: allocating collective effort to creating protective norms, practices, and processes that take care of those affected and encourage the behaviors we want in the future. Input from survivors is still necessary and important, but it should be requested with care and gentleness, and followed by significant time spent thinking about and acting on what you hear.
The author shares reflections on the life of their friend Alexa, who acted on compassion fearlessly and consistently, aiding an incredible number of people and animals in only 25 years of life.
“The challenges that the world faces are vast, and, frequently, overwhelming. The number of lives on the line is hard to count. Sometimes, it all feels like a blur - abstract, and so very far away. And because of that vastness, we can easily lose track of the magic and power of one life, one person’s world.
Alexa walked into that vastness, arms outstretched, and said, “I can help you. And you. And you. I can’t help you all. But I will try.””
The author argues there is evidence that aggregating many people’s estimates produces a more accurate estimate as the number of estimates grows. They suggest this means that for most practical sociological questions, you should assume the conventional answer is correct. In practice, this means most proposals for new norms of relating to others, organizational structures, etc. should be rejected out of hand if they a) don’t have an airtight first-principles argument and b) don’t match conventional wisdom.
This also means putting weight on existing methods such as peer review, which is highly respected as a method of collaborative truth-seeking. They suggest more people should publish in journals rather than on the Forum. Similarly, they suggest putting more weight on experience, as society at large does. They guess that hiring highly experienced staff would have prevented fraud at FTX.
For individual EAs, they suggest deferring on technical topics to those with conventional markers of expertise (unless you are yourself an expert), and considering how you can do the most good in conventional professions for changing the world (eg. professor, politician, lobbyist).
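The statistical idea behind the aggregation argument above can be illustrated with a small simulation. The model here is our assumption, not necessarily the author’s: each person’s estimate is the truth plus a shared bias plus independent Gaussian noise, and all numbers are illustrative. With independent, unbiased errors, the error of the averaged estimate shrinks roughly in proportion to 1/sqrt(n); an error shared by everyone does not average away.

```python
import random
import statistics


def crowd_error(n, truth=100.0, noise_sd=20.0, shared_bias=0.0, trials=2000, seed=0):
    """Average absolute error of the mean of n estimates, over many simulated crowds.

    Assumed model: each estimate = truth + shared_bias + independent Gaussian noise.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        estimates = [truth + shared_bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
        errors.append(abs(statistics.fmean(estimates) - truth))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"n={n:>3}: independent errors {crowd_error(n):5.2f}, "
              f"shared bias of 10 {crowd_error(n, shared_bias=10.0):5.2f}")
```

Under these assumptions, how much weight the aggregation argument deserves depends on how independent the errors behind conventional answers actually are.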
Linkpost for this article by Bloomberg, which discusses reported cases of sexual misconduct and abuse, and possible contributing factors in terms of rationality community culture.
Suggests Nick Bostrom should step down from Director to Senior Research Fellow at FHI, due to:
- Management concerns at FHI eg. large turnover, and a freeze on hiring.
- Lack of tact in Bostrom’s apology for an old email post.
- Effects of the above on relationships with the university, staff, funders, and collaborators.
by Said Achmiz
To accomplish a goal requiring certain behaviors, a social system can use any combination of three methods. The examples in parentheses assume the goal is building a successful organization:
- Selective methods - build the system out of only people who will do those behaviors (eg. hire people with good skills).
- Corrective methods - apply methods to alter their behavior (eg. training).
- Structural methods - build the system so it works if people behave in the ways you expect they will (eg. performance incentives, technological improvements to reduce skill requirements).
The author also provides examples for assembling a raid guild in World of Warcraft, and for ensuring good governance.
A selection of posts that don’t meet the karma threshold, but seem important or undervalued.
Linkpost to this paper and demo by Google. PaLM-E is a large model trained on multiple modalities (internet-scale language, vision, and visual-language domains). It exhibits positive transfer (ie. learning in one domain helps in another) and the connected robot is able to successfully plan and execute tasks like “bring me the rice chips from the drawer” or “push green blocks to the turtle” even when it has never seen a turtle before in real life.
by James Odene [User-Friendly]
In industry, it’s common to dedicate ~60% of marketing budget to long-term marketing (eg. building brand), and ~40% to short-term activations (encouraging specific actions now). Research by Binet and Field supports a similar split. User-Friendly has noticed far less is spent on long-term marketing in EA. They guess this may be because of the focus on ROI, which is easiest to measure when implementing things with a clear call to action (eg. to donate), and measuring how many take it. However, the success of that call to action will also depend on the audience’s pre-existing awareness and perception of the organization and concepts behind it - that’s what long-term marketing aims to build over time.
They suggest measuring earlier parts of the funnel as well as later ones (eg. via ‘have you heard of this brand?' surveys), and depending on results re-allocating funding across the ‘build’ / ‘nudge’ / ‘connect’ stages of the marketing funnel.
by US Policy Careers
Comprehensive guide on congressional internships, written by an undergraduate in the process of applying, and reviewed by several individuals with experience “on the Hill”. Includes an overview of how they work, why to apply, considerations of where to apply, and preparation / tips for the application process.
by Kyle Smith
The author has a large dataset of electronically filed 990-PFs (the forms private foundations use to report their charitable activities). They suggest slicing this data by aspects like whether a foundation gives to international charities or how many charities it gives to, in order to create a list of those most likely to redirect funds if given targeted advice on effective charities.
In 2022, QURI announced the Squiggle Experimentation Challenge and a $5k challenge to quantify the impact of 80,000 Hours’ top career paths. The winners were: