Supported by Rethink Priorities
This is part of a weekly series - you can see the full collection here. The first post includes some details on purpose and methodology.
If you'd like to receive these summaries via email, you can subscribe here.
New - podcast version: prefer your summaries in podcast form? A big thanks to Coleman Snell who is now producing these! The first episode will be up on the EA Forum Podcast later this week - subscribe to hear as soon as it's up.
Top Readings / Curated
Designed for those without the time to read all the summaries. Everything here is also within the relevant sections later on so feel free to skip if you’re planning to read it all.
by EV Ops
CEA Ops started supporting other orgs over time, now supporting 10+. Because of this they’ve split into their own organization called EV Ops. They’re close to capacity, but fill in the EOI form if you’d like support (1-2 person start-ups through to large orgs). Or help expand capacity by applying to join the team.
[Linkpost] A survey on over 300 works about interpretability in deep networks
by scasper
Linkpost for survey of 300 works on inner interpretability, across areas including weights, neurons, subnetworks, and latent representations. A discussion section includes findings such as:
- Promising paradigms outside of circuits deserve more attention.
- Interpretability research has strong links to other areas such as adversarial robustness.
- We need a benchmark for evaluating interpretability tools, based on their ability to help us in goals such as designing novel adversaries or discovering system flaws.
- It’s important not to focus on best-case performance.
Improving "Improving Institutional Decision-Making": A brief history of IIDM
by Sophia Brown
History of IIDM - Pre-2019
- The global health area was advocating improving government aid spend via lobbying.
- Many critiqued EA’s seeming focus on marginal vs. systemic change. Leaders responded eg. Open Phil now funds areas like immigration reform and housing.
- The rationality community was developing tools for individual decision-making. Evidence for effectiveness of forecasting in group use pushed forward the institutional angle.
In 2015, the cause area formed out of the third bullet point.
Current State IIDM - 2019 to present
- The Effective Institutions Project built a broader theory of IIDM, including a focus on value alignment within powerful institutions, and published a prioritized list of target organisations.
- Other orgs created prediction markets and platforms, mainly targeted at internal EA decision-making.
- Work to have institutions adopt better decision-making is limited and mainly political. Eg. Center for Election Science got approval voting adopted in several cities, and there is ongoing work on US policy (eg. immigration reform, pandemic preparedness) and key institutions such as the UN.
- Funders are concerned about downside risks and whether the area is impactful. Several welcome applications for specific subsets of IIDM (eg. forecasting). Ben Todd estimated <100 EAs working on the area in 2020.
Future State IIDM
The author intends to publish a series of posts exploring under-appreciated ways of improving institutions, how to structure IIDM work (eg. should it be cause-specific?), and measuring success and downside risks.
EA Forum
Philosophy and Methodologies
Could it be a (bad) lock-in to replace factory farming with alternative protein?
by Fai
If we replace factory farming with alt proteins, we lose the chance to get rid of it for moral reasons, and this might stunt our moral development in regards to animals long-term.
The author changed their mind based on comments, to conclude we should speed up alt protein development, and switch to moral advocacy once a majority of people are veg*n and have less cognitive dissonance about it.
Object Level Interventions / Reviews
An experiment eliciting relative estimates for Open Philanthropy’s 2018 AI safety grants
by NunoSempere
Survey of 6 researchers on 9 large (100K+) AI safety grants by OP. Author elicited distributions of relative value of grants using pair-wise and hierarchical comparisons in Squiggle, before and after group discussion.
The aggregate highest score grant was for ‘Oxford University - Research on the Global Politics of AI’. After discussion, there was more agreement generally. Many researchers assigned nearly all impact to one case, and almost no impact to several cases.
The author estimates the survey took 20-40 hours (including participant time), which could be worth scaling up for more grant evaluation. There are also methodology issues to resolve if so eg. pair-wise and hierarchical distribution estimates from the same researcher were often inconsistent (90% CIs didn’t overlap).
Differential technology development: preprint on the concept
by Hamish_Hobbs, jbs, Allan Dafoe
Linked pre-print of a paper describing differential technology development and possible implementations. The authors would love feedback!
This concept can look like preferentially advancing risk-reducing tech over risk-increasing tech. Risk-reducing can include safety (eg. cybersecurity, makes other tech safer directly), defensive (eg. vaccines, decreases risk from other new techs), and substitute (eg. windpower, offers similar benefits as a risk-increasing tech, more safely).
Anticipating or identifying technological impacts can help inform government and philanthropic funders on funding priorities and regulation, and is particularly applicable to synthetic biology and AI.
Announcing the Space Futures Initiative
by Carson Ezell, Madeleine Chang, olafwillner
Newly launched, focused on improving the long-term future in outer space for humanity and our descendants. Expects to focus on sharing longtermist ideas within the space community and developing scalable space governance frameworks due to neglectedness. Looking for research proposals, collaborators, mentors for student projects, and EA group leaders interested in hosting discussion groups.
Requesting feedback: proposal for a new nonprofit with potential to become highly effective
by Marshall
Suggests online health worker training as an effective global health intervention. Estimates $59 per DALY averted by training workers in newborn care, based on existing research. (One example - intervention suggested is broader). Proper training can also increase the chance of positive critical decisions eg. choosing to quarantine a patient, preventing an outbreak. Gaps are significant eg. only 25% of LICs had ongoing infection control training.
The author ran a pilot since April 2021, and found in a follow-up pilot completion rates reached 10x higher than those of free, optional, self-paced courses taken in LMICs. They are looking for $4M funding over 3 years to scale up, and would like feedback to improve their proposal first.
What could an AI-caused existential catastrophe actually look like?
by Benjamin Hilton, 80000_Hours
An AI has many paths to power (social, political, financial or otherwise). Eg. hacking systems, persuading humans, manipulating discourse, developing new technologies, making threats.
An AI could come to be power-seeking in various ways. For instance, building an AI aiming for proxy goals - things slightly different than what we care about, but easier to describe / measure. For instance, we could ask an AI to reduce law enforcement complaints, and it might tackle it via suppressing complaints instead of improving practices. Another pathway is an AI deployed with the capability to improve itself, and access to info to do so eg. the internet. It may decide its goals are easier to achieve without humans, build capacity, and kill us all (eg. via developing a chemical weapon).
Improving "Improving Institutional Decision-Making": A brief history of IIDM
by Sophia Brown
History of IIDM - Pre-2019
- The global health area was advocating improving government aid spend via lobbying.
- Many critiqued EA’s seeming focus on marginal vs. systemic change. Leaders responded eg. Open Phil now funds areas like immigration reform and housing.
- The rationality community was developing tools for individual decision-making. Evidence for effectiveness of forecasting in group use pushed forward the institutional angle.
In 2015, the cause area formed out of the third thread.
Current State IIDM - 2019 to present
- The Effective Institutions Project built a broader theory of IIDM, including a focus on value alignment within powerful institutions. It published a prioritized list of target institutions.
- Other orgs created prediction markets and platforms, mainly targeted at internal EA decision-making.
- Work to have institutions adopt better decision-making is limited and mainly political. Eg. Center for Election Science got approval voting adopted in several cities, and there is ongoing work on US policy (eg. immigration reform, pandemic preparedness) and key institutions such as the UN.
- Funders are concerned about downside risks and whether the area is impactful. Several welcome applications for specific subsets of IIDM (eg. forecasting). Ben Todd estimated <100 EAs working on the area in 2020.
Future State IIDM
The author intends to publish a series of posts exploring under-appreciated ways of improving institutions, how to structure IIDM work (eg. should it be cause-specific?), and measuring success and downside risks.
Roodman's Thoughts on Biological Anchors
by lukeprog
Linkpost for David Rooman’s Apr 2020 review of Ajeya’s ‘Forecasting TAI with biological anchors’ report.
The main critique is that Ajeya’s report considers several frameworks, which are contradictory in some cases. Each has its own probability distribution for TAI timelines and they are combined via pointwise averaging. David argues it may be better to use Bayesian reasoning to help land on a single ‘most plausible’ framework (which could contain elements of multiple). He explores one possibility, which contains both parameters and hyperparameters for the model training process.
The Pugwash Conferences and the Anti-Ballistic Missile Treaty as a case study of Track II diplomacy
by rani_martin
Track II (unofficial) diplomacy can reduce great power tensions via providing avenues to share policy ideas, and building trust via information sharing. These each increase the chance of cooperative policies being passed.
A case study is the Pugwash conference on nuclear risk, which convened scientists, experts and policy makers. American scientists were concerned about destabilizing effects of ballistic missiles, convinced soviet scientists of this concern at the conferences, who communicated back to Soviet Union political leaders. The US and Soviet Union then signed a treaty against ballistic missiles in 1972. Secrecy means this narrative isn’t 100% verified, but seems likely given available evidence (eg. the scientists did have political connections they could have spoken to, timing lines up). This provides support for Track II diplomacy affecting countries’ policy preferences.
'Artificial Intelligence Governance under Change' (PhD dissertation)
by MMMaas
Author’s tl;dr: “this dissertation discusses approaches and choices in regime design for the global governance of advanced (and transformative) AI. To do so, it draws on concepts and frameworks from the fields of technology regulation (‘sociotechnical change’; ‘governance disruption’), international law, and global governance studies (‘regime complexity’).”
Includes discussion of how AI systems might be used within international law (eg. for monitoring or enforcement), and implications for AI governance of how the architecture of global governance has changed in the past two decades.
Opportunities
Apply now - EAGxVirtual (21-23 Oct)
by Alex Berezhnoi
Free event, applications due 17th Oct. Open to all familiar with core EA ideas.
Announcing an Empirical AI Safety Program
by Josh1
Introduction to ML Safety program by Center for AI Safety running Sep 26 - Nov 8. Virtual course introducing those with a deep learning background to empirical AI safety research. 5 hours per week, advanced and introductory streams, $1K stipend available.
Participants apply by Sep 21. Facilitators apply by Sep 18 (background in deep learning and AI safety required. ~2 hours per week, pay $30 per hour).
Rational Animations' Script Writing Contest
by Writer
Script writing contest for the Rational Animations Youtube channel. Any longtermist topic, max 2.5K words, prize of 5K USD + script will be professionally animated and posted with due credit + potential job offer as scriptwriter. There will be 0-4+ winners. Deadline Oct 15th.
Community & Media
My emotional reaction to the current funding situation
by Sam Brown
The author previously worked for a climate change startup, and lived very frugally. After learning about EA, he got funding for AI alignment research. He initially felt negative emotions about this, because it made the prior frugality and sacrifice seem irrelevant when £40K was gotten so easily. The funding does make him feel safe and unpressured in his research, but also worry it’s not valuable enough, and notice other symbols of wealth in the community.
My closing talk at EAGxSingapore
by Dion
Asks everyone to show hands if they’d want to help someone at the conference - nearly all hands raised.
EA is inspiring, but demands big steps and this can cause anxiety and fear. There is hope and warmth among us that helps with this. When you leave an event like EAGx, particularly if you don’t live in a hub, you might feel frustrated and alone because you don’t feel that community supporting you anymore - but remember how many people want to help you, that you’re worth it, and reach out for a call or to chat about next steps.
Bring legal cases to me, I will donate 100% of my cut of the fee to a charity of your choice
by Brad West
Lawyer practicing near Chicago. Can help on personal injury, worker’s compensation, bankruptcy, divorce cases or commercial disputes. Will donate his cut of the firm’s fee (~1/9th to 1/3rd of total cost) to a charity of your choice.
Suggests other ETG professionals could also use the fact they will donate their fee to generate more business from EAs and therefore donations. If you do, let the author know and he’ll list you on the Consumer Power Initiative website.
by Evie Cottrell
Agency means acting intentionally to fulfill your goals. However, the word can be associated with ambition, hustling, liberally asking for things, and willingness to violate norms, be uncooperative with authority, or use many resources to achieve your aims. These behaviors are useful in moderation, but can go too far if the community confers status and praise for them because they are ‘agentic’. Instead of using these behaviors for their own sake, decide for yourself what will help you achieve your goals, and do that.
by EV Ops
CEA Ops started supporting other orgs over time, now supporting 10+. Because of this they’ve split into their own organization called EV Ops. They’re close to capacity, but fill in the EOI form if you’d like support (1-2 person start-ups through to large orgs). Or help expand capacity by applying to join the team.
EA architect: Building an EA hub in the Great Lakes region, Ohio
by Tereza_Flidrova
Sandusky, Ohio could be a good place for an EA hub because it has a low cost of living (median home price ~200K), a lake and lots of parks, welcoming community vibes, <1hr to an airport, and a walkable town center with good amenities.
The author proposes to create an EA hub, initially centered around residents Andy and Christine’s areas of expertise (nuclear and bio threats). Conferences, retreats, fellowships, tenancies and coworking spaces to come. Looking to talk to those living in or interested in Ohio, and anyone running an EA hub or fellowship.
It’s not effective to call everything effective and how (not) to name a new organisation
by James Odene [User-Friendly]
A good organization name should:
- Value distinctiveness over descriptiveness
- The Von Restorff effect biases us toward things that stand out
- ‘Effective’ and ‘impact’ are overused in EA - so don’t use them!
- Check if your name is cheap on Google Ads - if not, it’s overcrowded (and could put you in competition with other EA orgs for mindspace and ad space)
- Avoid acronyms - they’re non-distinctive and can drop your SEO rank
- Secondary naming considerations
- Use the right ‘tone’ for your audience
- Long names will be abbreviated
- Consider prestige and credibility
The authors are open to chat further, and will be at Berlin and DC EAGs.
We’re still (extremely) funding constrained (but don’t let fear of getting funding stop you trying).
by Luke Freeman
Available funding is an order of magnitude more than a decade ago, and we don’t want lack of funds to limit ambition. However, we’re still extremely funding constrained in many areas - there are gaps even for promising megaprojects and top Givewell charities. Posts without that nuance can be damaging. Donating is accessible and valuable.
Many therapy schools work with inner multiplicity (not just IFS)
by David_Althaus, Ewelina_Tur
If you’re seeking therapy, Internal Family Systems (IFS), Compassion-focused therapy, Schema therapy, and Chairwork all acknowledge the psyche has multiple ‘parts’ and treat each with empathy. This is popular in EA and rationalist communities. In contrast, a criticism of CBT is that it privileges the analytical parts over emotion, and this can backfire. More detail on each method is given to help you decide which is right for you.
"Doing Good Best" isn't the EA ideal
by Davidmanheim
The author argues the EA community has attempted to maximize too much, leading to a narrow set of suggested actions. We will do more good by focusing on ‘better’ instead of ‘best’, increasing the ratio of explore:exploit, and expanding the pathways we encourage (particularly low-commitment ones).
This is because:
- The ‘maximum impact’ action for an individual might not be best for the whole community (particularly if everyone takes it).
- We might be wrong about the top priorities / opportunities, and given lag effects with career or education, this means we could be missing important skills in the future.
- Several fields are early days, we don’t know exactly what we’ll need for them, which makes optimization premature.
- More pathways promotes growth and scaling.
Didn’t Summarize
Does beating yourself up about being unproductive accomplish anything? What should I ask self-compassion researcher Kristin Neff when I interview her? by Robert_Wiblin
Altruist Dreams - a collaborative art piece from EAG SF by samstowers
Forecasting thread: How does AI risk level vary based on timelines? by elifland (summarized in LW section)
Democratising risk: a community misled by Throwaway151
LW Forum
AI Impacts / New Capabilities
ACT-1: Transformer for Actions
by Daniel Kokotajlo
Linkpost for https://www.adept.ai/act Adept is aiming to create models that can take digital actions based on high level requests like ‘find me a good 4-bed house in Houston’. ACT-1 is a first attempt and trained in many digital tools, in addition to having full browser access. They have a focus on human feedback as their main way to combat risks.
Daniel notes this is currently at the level of WebGPT - not surprising capability-wise.
AI Meta & Methodologies
[Linkpost] A survey on over 300 works about interpretability in deep networks
by scasper
Linkpost for survey of 300 works on inner interpretability, across areas including weights, neurons, subnetworks, and latent representations. A discussion section includes findings such as:
- Promising paradigms outside of circuits deserve more attention.
- Interpretability research has strong links to other areas such as adversarial robustness.
- We need a benchmark for evaluating interpretability tools, based on their ability to help us in goals such as designing novel adversaries or discovering system flaws.
- It’s important not to focus on best-case performance.
AI Risk Intro 1: Advanced AI Might Be Very Bad
by TheMcDouglas, LRudL
An intro written for the general public, with no assumed knowledge. Covers the history of ML models, current capabilities and pace, inner and outer alignment, and failure modes. It’s quite comprehensive so even a summary was too long for this post - but if you’re keen, you can find one here.
Forecasting thread: How does AI risk level vary based on timelines?
by elifland
Poll on reader’s forecasts of chance of AI-caused existential catastrophe, conditional on AGI arriving at given times (ranging from by 2025 to after 2060). So far risk estimates are significantly higher for short timelines, with many forecasts of >90% for the shortest timeline and no forecasts of >75% for the longest.
Takeaways from our robust injury classifier project [Redwood Research]
by DMZ
Redwood’s experiment aimed to make an adversarially robust system that, given an input sentence to complete, never completed it with violent / injury-related content. This was not achieved. The authors suggest improvements could be made via adversarial training on really bad vs. slightly bad cases, more training rounds on different types of adversarial attacks, and changing the task so it can’t be accidentally failed (eg. by a model that doesn’t realize a word is violent). The team doesn’t believe this experiment’s result is strong evidence against this method because there are many potential improvements.
AGI safety researchers should focus (only/mostly) on deceptive alignment
by Marius Hobbhahn
Many (most?) people in alignment believe deceptive alignment is necessary or greatly increases the harm of bad AI scenarios. There is even disagreement on if a sufficiently powerful misaligned AI is automatically deceptive.
Deceptive alignment is a factor in nearly all short-timeline x-risk scenarios, makes the upper bound of harm higher, and is more likely to be neglected vs. corrigible alignment (giving the right goal) because the latter is useful for industry to work on even without an x-risk lens. Because of this, the author argues for deceptive alignment to be prioritized and its links to other research always considered.
Understanding Conjecture: Notes from Connor Leahy interview
by Akash
Conjecture is a new AI alignment organization. Akash listens to a podcast interview with the co-founder Connor Leahy, and summarizes key learnings. These are quite extensive but include that Conjecture works on short timelines (20-30% in next 5 years), they’ve started an incubator (Refine) for alignment researchers with non-conventional ideas, are funding constrained, and believe the community needs more diversity in approaches to AI alignment.
Not AI Related
Dan Luu on Futurist Predictions
by RobertM
Linkpost to Dan Luu’s piece analyzing the track record of 11 futurists from Wikipedia’s list of well-known futurists. The purpose of the post is to see which methods are most reliable.
6 futurists had too few resolved or too vague predictions to score. Of those remaining, high accuracy (50-100%) was found for 2 futurists who either had deep domain knowledge and looked at what’s trending, or gave very conservative predictions and took the “other side” of bad bets others made. Low accuracy (3-10%) was found for 3 futurists who relied on exponential progress continuing and panacea thinking.
Didn’t Summarize
Alignment via prosocial brain algorithms by Cameron Berg
This Week on Twitter
AI
You can exploit GPT-3 prompts with malicious inputs to ignore previous directions, or repeat original directions back - kind of like SQL injections. (link) (link)
GPT-3 can be used to control a browser with the right prompt. (similar to Adept AI’s aim - see LW summary section) (tweet)
Character.ai has opened up a public beta that lets you talk to language models and (I think?) customize them somewhat. (tweet)
The NIST (national institute of standards and technology - US) has published a second draft of their AI risk management framework. It’s open for feedback until 29th Sept, final publication in Jan 2023. (tweet) (govt. page)
Anthropic investigated superposition - where one neuron in a net is used to capture several unrelated concepts, a problem for interpretability. Arises naturally in some non-linear models, and seems correlated to vulnerability to adversarial examples. (paper) (tweet)
EA
FHI launching a $3M grant program for research into the humanitarian impacts of nuclear war. (tweet) (program link)
Forecasting
Superforecasters identify key early warning indicators for climate change threats such as food instability and international solidarity vs. conflict. (tweet) (paper)
National Security
CSET releases a report on China’s self-identified most vexing technological import dependencies. (tweet) (report)
Ukraine winning territory back from Russia such as Kupyansk, Kharkiv. Russia on the defensive and withdrawing in some areas. (tweet) (tweet) (tweet)
Science
New WHO global guidance on biosecurity, encouraging member states to implement policies to reduce risk of accidental release, intentional misuse, or risks from dual use research like gain of function. (tweet) (report)
Thanks!
Note that I put up a version of this summary, read and synthesized by Coleman Snell, on the EA Forum podcast (Anchor link: here Spotify link here ... available on all podcast platforms).
He/we will be doing this for future weeks, and creating further content. Other 'live reading and discussion' of EA Forum content is in the works, stay tuned.