Supported by Rethink Priorities

This is part of a weekly series - you can see the full collection here. The first post includes some details on purpose and methodology.

If you'd like to receive these summaries via email, you can subscribe here.

 

Top Readings / Curated

A new section, designed for those without the time to read all the summaries. Everything here is also within the relevant sections later on so feel free to skip if you’re planning to read it all.

Methodology - I’ve pulled out a few posts I think offer some particularly new and interesting ideas, or would be useful to have more people giving feedback on. This is purely based on my personal opinion. Keen on any thoughts in the comments if this section is useful or not!

 

A California Effect for Artificial Intelligence

By henryj

The California Effect occurs when companies adhere to California regulations even outside California’s borders. The author evaluates three types of AI regulation California could adopt, and finds two are 80% likely to produce a California Effect: regulating training data through data privacy, and risk-based regulation like that proposed by the EU. A full research paper is linked, the result of a summer research fellowship.

 

Developing Next Gen PPE to Reduce Biorisks

by AndyGraham, tsmilton

The authors are engineers planning to apply to LTFF for funding to run a feasibility study on improved PPE, particularly high protection versions (ie. suits). The current versions are bulky, expensive, hard to use, and have changed little since 1979. The feasibility study will be primarily research to validate the need and potential impact on GCBRs. If that resolves favorably, they hope to finalize a product within 2-5 years.

They’re looking for advice and thoughts - particularly from those with bioexpertise or PPE expertise, or who can help with the grant application.

 

Cause Exploration Prizes: Announcing our prizes

Over 150 submissions. Top prize to Organophosphate pesticides and other neurotoxicants by Ben Stewart. Second prize to each of Violence against women and girls by Akhil, Sickle Cell Disease (anonymous) and Shareholder activism by sbehmer. 20 honourable mentions. Post on learnings may come in the future.

See the ‘Bonus: winners of Open Philanthropy’s cause exploration prizes’ section below for more detailed summaries.

 

Announcing the Change Our Mind Contest for critiques of our cost-effectiveness analyses

By GiveWell

GiveWell is looking for critiques of their cost-effectiveness analyses. Entries are due 31st October, with prizes up to $20K plus a chance to improve the allocation of millions of dollars.

 

Monitoring for deceptive alignment

by evhub

Asks DeepMind, OpenAI, and Anthropic to actively monitor for and run experiments on narrow deceptive alignment (ie. where a model looks aligned only because it’s trying to, for some ulterior motive). Early examples may be relatively easy to detect (eg. because early AIs are bad at it, or defect quickly) and therefore to study. This could include monitoring for precursors, like when an AI first develops an instrumental goal not to have itself shut down.

Concrete things to test models on include whether behavior changes with / without oversight, or catching deception at source via interpretability / transparency tools.

 

From Twitter: Trials show malaria vaccine gives 80% protection (3 doses + yearly booster). It’s cheap, they have a deal to manufacture 100 million doses a year, and hope to roll it out next year. (link)


EA Forum

Philosophy and Methodologies

An entire category of risks is undervalued by EA [Summary of previous forum post]

By Richard Ren

The cascading sociopolitical & economic effects of climate change, pandemics, and conflicts are undervalued in the longtermist community. They aren’t likely to be direct existential risks, but the instability they cause greatly affects a) the environment that AGI develops its values in (instability often leads to authoritarian, violent social values) and b) what types of technologies are developed and how quickly (eg. more military AI focus due to conflict).

Improving institutional resilience (food, water, energy, infrastructure) is one way to reduce these risks. Climate adaptation and drought monitoring interventions are particularly neglected in this space. This forms the link between global health and poverty efforts and longtermism, which has been missing due to a lack of systems thinking.

 

The Base Rate of Longtermism Is Bad

By ColdButtonIssues

Longtermism isn’t unique to EA, and historical examples of people trying to improve the long-term future haven’t been effective. Eg. where funds are set aside, morals and ownership can change, so they’re not distributed effectively. The communist revolution failed and incurred a high cost. Religion was most successful at affecting the future, but isn’t comparable if we don’t want EA to be a religion.

Note specific cause areas often classified as longtermist like biorisk, AI or nuclear risk are still valid, as they are important even just looking at the next few generations.
 

The discount rate is not zero

By Thomaaas

We should discount future lives not because they are worth less, but because there’s a chance they won’t exist. The discount rate equals the catastrophe rate.

A top comment by Carl Shulman argues that the catastrophe rate (and therefore discount rate) approaches zero if humanity survives the next few centuries, because tech including aligned AI and space travel would take us to a robustly safe state.
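To make the discounting argument concrete, here is a minimal sketch of how a per-year catastrophe rate acts as a discount on future lives. The function name, the 0.1% annual rate, and the 300-year cutoff are all hypothetical numbers chosen for illustration; neither the post nor the comment gives them.

```python
from typing import List, Sequence


def survival_weights(annual_catastrophe_rates: Sequence[float]) -> List[float]:
    """Return the probability that humanity is still around at the end of each year.

    Each rate is treated as an independent chance of catastrophe in that year
    (an assumption of this sketch). The weight for year t is the product of
    (1 - rate) over all years up to and including t, which is the factor by
    which a life in year t would be discounted.
    """
    weights: List[float] = []
    surviving = 1.0
    for rate in annual_catastrophe_rates:
        surviving *= 1.0 - rate
        weights.append(surviving)
    return weights


# Hypothetical scenario 1: a constant 0.1% annual catastrophe rate for 1000 years.
constant = survival_weights([0.001] * 1000)

# Hypothetical scenario 2: the same rate for 300 years, then zero afterwards
# (the "robustly safe state" described in the top comment).
declining = survival_weights([0.001] * 300 + [0.0] * 700)

print(f"Weight on year 1000, constant rate:  {constant[-1]:.3f}")
print(f"Weight on year 1000, declining rate: {declining[-1]:.3f}")
```

Under the constant rate the weight keeps shrinking every year, while under the declining rate it stops shrinking once the rate hits zero, so far-future lives keep a fixed, non-trivial weight.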

 

Object Level Interventions / Reviews

A California Effect for Artificial Intelligence

By henryj

The California Effect occurs when companies adhere to California regulations even outside California’s borders. The author evaluates three types of AI regulation California could adopt, and finds two are 80% likely to produce a California Effect: regulating training data through data privacy, and risk-based regulation like that proposed by the EU. A full research paper is linked, the result of a summer research fellowship.


Samotsvety's AI risk forecasts

Samotsvety is a group of forecasters with strong track records, including several EAs. Their aggregate forecasts are below, in the form [average (average excluding those who were EAs before joining the group) (range, lowest to highest)]:

Misaligned AI takeover by 2100, barring pre-APS-AI catastrophe? 25% (14%) (3-91.5%)

Transformative AI by 2100, barring pre-TAI catastrophe? 81% (86%) (45-99.5%)

Probability of existential catastrophe from AI, if AGI developed by 2070? 38% (23%) (4-98%)

AGI developed in the next 20 years? 32% (26%) (10-70%)

AGI developed by 2100? 73% (77%) (45-80%)

 

AI Governance Needs Technical Work

By Mauricio

Technical work within AI governance boosts our ability to implement governance interventions. Examples include engineering technical levers to make AI regulations enforceable, information security, forecasting, developing technical standards, or grantmaking / advising / managing any of the above categories. There aren’t streamlined career pipelines for these yet, so we need people to learn more, build the expertise, and trailblaze them.
 

Developing Next Gen PPE to Reduce Biorisks

by AndyGraham, tsmilton

The authors are engineers planning to apply to LTFF for funding to run a feasibility study on improved PPE, particularly high protection versions (ie. suits). The current versions are bulky, expensive, hard to use, and have changed little since 1979. The feasibility study will be primarily research to validate the need and potential impact on GCBRs. If that resolves favorably, they hope to finalize a product within 2-5 years.

They’re looking for advice and thoughts - particularly from those with bioexpertise or PPE expertise, or who can help with the grant application.
 

Zzapp Malaria: More effective than bed nets? (Wanted: CTO, COO & Funding)

by Yonatan Cale

Zzapp sprays standing water with larvicides to prevent mosquitoes breeding. Uses satellite imaging and an app to target hotspots. Estimates itself as 2x more effective than bednets in urban / semi-urban areas, based on an initial experiment. A CTO and COO could help improve cost efficiencies substantially, and they’re also looking for funding for another RCT.

 

Opportunities

New section. Competitions, jobs, scholarships, volunteer opportunities etc.

Cause Exploration Prizes: Announcing our prizes

Over 150 submissions. Top prize to Organophosphate pesticides and other neurotoxicants by Ben Stewart. Second prize to each of Violence against women and girls by Akhil, Sickle Cell Disease (anonymous) and Shareholder activism by sbehmer. 20 honourable mentions. Post on learnings may come in the future.

See the ‘Bonus: winners of Open Philanthropy’s cause exploration prizes’ section below for more detailed summaries.

 

Announcing the Change Our Mind Contest for critiques of our cost-effectiveness analyses

By GiveWell

GiveWell is looking for critiques of their cost-effectiveness analyses. Entries are due 31st October, with prizes up to $20K plus a chance to improve the allocation of millions of dollars.
 

Announcing a Philosophy Fellowship for AI Safety

By Anders_E, Oliver Zhang, Dan Hendrycks

For philosophy PhD students and postdocs to work on conceptual problems in AI safety. A paid (60K, student fees, housing stipend, relocation costs), San Francisco based opportunity running from January to August 2023. Applications close September 30th.

 

Fundraising Campaigns at Your Organization: A Reliable Path to Counterfactual Impact

By High Impact Professionals

HIP supported EAs to run 8 charity fundraising campaigns at their companies in 2021, with a median and mean result of 3.9K and 30K USD raised respectively. Campaigns take ~25 hours per run, for a mean hourly return of $786 USD. There are also benefits in introducing EA ideas. If you’d like to run a campaign at your company, HIP has a step-by-step guide and offers 1-1 support.

 

Bonus: Winners of Open Philanthropy's Cause Exploration Prizes

(These weren’t posted this week, but are summarized here for easy reference since the winners were announced this week. This covers first and joint second place prizes.)
 

Cause exploration prize: organophosphate pesticides and other neurotoxicants

by Ben Stewart

Developmental neurotoxicants (DNTs) are chemicals like lead that adversely affect human development. Identifying and banning or decreasing exposure to these can increase IQ and therefore income to an estimated value of trillions globally.

Organophosphate pesticides are one example: they have been banned in some countries since 2001, but are still prevalent in others. Meta-analyses suggest exposure is likely causing IQ deficits of up to a few percent in children, which the author estimates makes $450K of funding here up to 90x as effective as GiveDirectly in some countries.

They also advocate for identifying new DNTs. It took decades to recognise and restrict currently known synthetic DNTs, synthesis of new chemicals is increasing year on year, and the only comprehensive testing currently covers just 20% of chemicals with production >1 ton / year in Europe. The author estimates identification efforts as 40x GiveDirectly effectiveness.

 

New cause area: Violence against women and girls

by Akhil

~1/3rd of women have experienced either sexual or intimate partner violence, and rates are slowly increasing. In addition to social harms, the UN approximates costs as $1.5 trillion USD per year due to lost economic productivity and increased utilization of public services (eg. health, criminal).

In terms of interventions, preventative measures targeting gender norms or relationships seem most effective, with some RCTs reporting $52-184 USD per DALY averted. However, a majority of funding currently goes to services that help after violence has occurred. There is room for more funding in scaling up effective prevention programs, running more RCTs, and policy advocacy.


 

[Cause Exploration Prizes] Sickle Cell Disease

by Open Philanthropy (ie. anonymous entry)

Sickle cell is a genetic disease that kills 100 - 200K infants per year in sub-Saharan Africa, with mortality of 50-90%. While there is no cure, with proper identification, treatment can significantly reduce mortality. The area is under-funded, with ~$20M committed annually.

A study indicated that the average cost per DALY averted by infant screening and treatment in sub-Saharan Africa was $184 USD. However, since then testing costs have dropped from $9.90 to $2 per test (and further discounts may be available at scale). Note this may not account for increased costs on the healthcare system from adults with sickle cell disease.

The author suggests launching screening and treatment programs in countries with the highest incidence.


 

Shareholder activism

by sbehmer

EA funders may invest in stocks while waiting to disburse funds, which provides an opportunity for shareholder activism. We could also create new funds specifically for this, investing in smaller but influential companies for the greatest influence.

Shareholders can make requests of companies, which if refused go to a costly (~$1-4M on both sides) ballot of all shareholders, and can threaten board members with replacement. Many boards acquiesce to avoid the ballot process and risk to their jobs.

A case study of a campaign using this methodology in climate change found a cost of $0.2-$0.6/ton CO2, which is competitive with Founders Pledge top charities. Other cause areas could also be targeted, and there’s no minimum percent of shares to make a request.


 

Community & Media

Marketing Messages Trial for GWWC Giving Guide Campaign

by Erin Morrissey, david_reinstein, GraceAdams, Luke Freeman, Giving What We Can

GWWC & EAMT (EA Market Testing Team) ran a Facebook campaign in Nov - Jan to encourage people to read GWWC’s effective giving guide. They tested 7 messages, 6 videos, and different audience segments. Headline results include:

  • “Only 3% of donors give based on charity effectiveness, yet the best charities can be 100x more impactful” was the most effective message tested overall.
  • Short videos did better, as did animal videos targeted at animal-interested audiences. But climate and global poverty audiences performed worse than a general ‘philanthropy’ audience when targeted with videos on their cause areas.
  • Overall, the ‘lookalike’ audience (made to resemble people who had already interacted with GWWC) performed the best.
  • Effectiveness of videos and messages varied quite a bit by audience segment.

More trials are upcoming, and the authors are looking for both feedback and collaborators ahead of this.

 

[Link post] Optimistic “Longtermism” Is Terrible For Animals

by BrianK

Links to a Forbes article, which argues that longtermism is bad for animals because if we keep growing, so might factory farming and / or wild animal suffering (eg. if we spread Earth’s animals across the universe). “If the human race creates more suffering than it alleviates, it would be a mistake to let it grow infinitely.”

 

Agree/disagree voting (& other new features September 2022)

By Lizka

Heaps of new features: agree/disagree voting, curated posts starred on the frontpage, copy-paste from Google Docs with footnotes, a 1-1 service to connect people interested in working in a field with experts (starting with biosec), easier cross-posting from LW, and the ability to add topics to your profile to subscribe + share your interests.

 

Selfish Reasons to Move to DC

By Anonymous_EA

Mainly, the EA community there rocks - warm, welcoming, easy to network. Non-EAs also tend to be impact-driven and ambitious, and there’s a good dating market. The city is nice (beautiful, museums, medium size, good veg*n food). Though housing is expensive and summer is humid.

 

Say “nay!” to the Bay (as the default)!

By Kaleem

Author’s tl;dr (slightly edited): The Bay Area isn’t a great place to center the EA community in the US. The East coast is better because of the number of top universities, its proximity and accessibility to other EA-dense spots, its importance with respect to biosecurity and US policy, and cheaper flights and cost of living.

 

13 background claims about EA

By Akash

Summarized list of impressions / background info about EA you only get from living in the Berkeley AI Safety hub. Themes include that AI Safety is a primary concern, and that some influential EA leaders have short timelines and put extinction risk this century at >10%. There’s widespread disagreement on how to tackle it and a lack of seniors / mentors to help, but plenty of programs and grants you can apply to in order to get started. We also lack people working on it, so please apply for programs, jobs, funding, or start your own project.

 

Who are some less-known people like Petrov?

By Lizka

Petrov was under pressure to make a decision with hugely negative outcomes for the world, and didn’t. Similar examples from the comments include:

  • Military examples eg. Mike Jackson refused to capture a Russian-held airport in the 1990s, which could have sparked NATO <-> Russian conflict.
  • Political leader examples eg. King of Spain Juan Carlos De Borbon claimed support for Spain’s autocracy in order to be named successor, flipped after that to lead them to democracy, and then abdicated the throne to end the monarchy.
  • Civil examples eg. Li Wenliang spoke out about covid early on, despite governmental pressure not to.

 

Save the Date: EAGx LatAm

by LGlez

Mexico City, 6-8th Jan, aimed at EAs of any experience in LatAm or experienced EAs from elsewhere. Most talks are in English. Applications open 30th Oct.

 

The Maximum Impact Fund is now the Top Charities Fund

by GiveWell

Renamed to better distinguish between this fund (which supports their top charities, which have a requirement for high confidence in expected impact) and their All Grants Fund (which is allocated based on cost-effectiveness, including riskier grants with high expected value).

 

'Psychology of Effective Altruism' course syllabus

by Geoffrey Miller

Syllabus from a course the author has taught 3x at University of New Mexico, advanced undergrad level. Currently updating it and welcomes suggestions.

Pablo also comments with an aggregated and regularly updated list of EA syllabi.

 

Much EA value comes from being a Schelling point

by LRudL

A critical part of EA is being a place for talented, ambitious, altruistic people to meet each other. Making this part more effective involves:

  1. Get more people: eg. increase reputation via obviously impressive projects, have more widely famous EA-linked organizations, paths to EA from adjacent areas, and reduce barriers to entry such as odd group norms or needing technical or philosophical background (while still keeping other important requirements like altruism).
  2. Help them connect: scale up matchmaking (particularly for entrepreneurs), and make use of physical hubs.

 

Didn’t Summarize

Igor Kiriluk (1974-2022) By turchin

Do AI companies make their safety researchers sign a non-disparagement clause? By ofer (no answer yet in comments)


 

LW Forum

AI Impacts / New Capabilities

Linkpost: Github Copilot productivity experiment

by Daniel Kokotajlo

An experiment found developers completed the task of writing an HTTP server in JavaScript ~55% faster with GitHub Copilot than without (saving ~1.5 hours). However, the author notes that it is a simple, well-known task. Given that and publication bias, we shouldn’t weigh this too strongly.

 

AI Meta & Methodologies

Most People Start With The Same Few Bad Ideas

By johnswentworth

Most newcomers to alignment start with the same ideas, and take ~5 years to start plausibly useful research. The most common ideas (~75%) are variants of ‘train an AI to help with training AI’. Looking for problems with your plan, having a peer group to poke holes in each other’s plans, and exposure to a variety of alignment models all help speed that up. The author estimates the MATS summer program helped attendees skip forward a median 3 years.

Top comments question if this might cause ‘following the herd’ and make newcomers less likely to contribute original work or question field assumptions. The author responds that the focus should be on peer-to-peer critique and skills for analyzing flaws in ideas vs. specific critiques of newcomers’ ideas by experts, to avoid this risk.
 

Monitoring for deceptive alignment

by evhub

Asks DeepMind, OpenAI, and Anthropic to actively monitor for and run experiments on narrow deceptive alignment (ie. where a model looks aligned only because it’s trying to, for some ulterior motive). Early examples may be relatively easy to detect (eg. because early AIs are bad at it, or defect quickly) and therefore to study. This could include monitoring for precursors, like when an AI first develops an instrumental goal not to have itself shut down.

Concrete things to test models on include whether behavior changes with / without oversight, or catching deception at source via interpretability / transparency tools.

 

Alignment papers roundup - week 1

by Quintin Pope

New weekly series on papers that seem relevant to alignment, focusing on papers or approaches that might be new to safety researchers. Links, abstracts, and the author’s opinions are shared.

 

The shard theory of human values

By Quintin Pope, TurnTrout

Author’s tl;dr: “We propose a theory of human value formation. According to this theory, the reward system shapes human values in a relatively straightforward manner. Human values are not e.g. an incredibly complicated, genetically hard-coded set of drives, but rather sets of contextually activated heuristics which were shaped by and bootstrapped from crude, genetically hard-coded reward circuitry.”

They test this theory against cognitive biases like scope insensitivity and sunk cost fallacy, and find it explains these more simply than most existing explanations.

 

Private alignment research sharing and coordination

By porby

Coordination on AI alignment research is hard. A lot is shared informally, but some researchers don’t have that network, and don’t want to share publicly because of info hazards or because it includes confidential information to their organization.

The post suggests a solution could be a database and forum anyone can submit to, but with highly restricted read permissions. Those with the highest access would be responsible for monitoring and guiding the use of the most dangerous or confidential information.

 

An Update on Academia vs. Industry (one year into my faculty job)

By David Scott Krueger (formerly: capybaralet)

Notes on conversations about industry vs. academia for AI safety, and experiences from ~1yr as an assistant professor. Themes include that academia is often dismissed as a pathway but has benefits (rapid training & credentialing, it’s becoming easier to work on safety and access foundation models, and it is enjoyable). The author has also been approached by senior people in ML concerned about AI safety, and is looking for a go-to response and resources for them.
 

[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]

by David Scott Krueger (formerly: capybaralet)

Copy-pasted email made for a particular person after several discussions - not a ready-to-go template to send all ML researchers interested in Alignment. That said, a good starting point that aims to give a diverse and representative sampling of AI safety stuff.


 

Solar Blackout Resistance

by jefftk

Residential solar panels shut down in a power outage, so they don’t shock utility workers fixing things. If they instead disconnected from the grid but continued powering the house, this would provide widespread distributed power in a catastrophe. The electronics to support this could be cheap if there was high demand. Paths to encourage this include the government requiring this resilience, or only subsidizing solar panels which include these resiliency adaptations.

 

Overton Gymnastics: An Exercise in Discomfort

By Shos Tekofsky, omark

Idea for a new group rationality exercise, with instructions for three variants. All involve every participant sharing their most controversial opinions, and some involve questions from the group until they can pass an ideological Turing test on these (ie. convincingly argue for them).

 

Didn’t Summarize

Rejected Early Drafts of Newcomb's Problem By zahmahkibo (meme versions)

The ethics of reclining airplane seats By braces (low-stakes example of ethical debate on twitter)

Let's Terraform West Texas By blackstampede (half-serious proposal, including cost estimates)

Searching for Modularity in Large Language Models by NickyP, Stephen Fowler


This Week on Twitter

AI

Riley Goodside continues his experiments on GPT-3, showing it can be prompted to use tools like Python to fill gaps in its skills / knowledge. (link)  Also shows it remembers the context it generated in previous answers, to answer new ones. (link)

The open source community figured out how to save ~¼ of the necessary VRAM for Stable Diffusion, just a couple of weeks after release. (link) (Note last week they also substantially increased speed, so lots of performance improvements in a short timeframe)

Stephen Casper and coauthors publish a paper showing adversarial training is more effective if the models can see each other’s internal state. (link)

New article on the limits of language models. (link)

New research paper on how to align conversational agents with human values. (link)

 

Forecasting

New ‘pastcasting’ app gives users the ability to forecast on past questions whose resolution they don’t know - increasing feedback loops / learning. (link) (link)

 

National Security

CSET published research in June about the Chinese People’s Liberation Army’s dependence on chips from American companies for military progress in AI. The implication is that it may have affected the US government’s decision to ban China from buying these chips from US companies. (link)  They also share a report from July tracking what Chinese companies are doing in the general AI space. (link)

Ukraine has launched a counter-offensive against Russia. Details are still coming out, but significant land has been taken back and Russia is retreating from some areas. Russia responded by attacking Ukrainian power plants, trying to cut electricity. (link)

 

Science

Trials show malaria vaccine gives 80% protection (3 doses + yearly booster). It’s cheap, they have a deal to manufacture 100 million doses a year, and hope to roll it out next year. (link)

Monkeypox cases are falling, the Covid BA.5 wave is receding, and an unknown pneumonia in Argentina was identified as legionella. A good week for public health! (link)
