Supported by Rethink Priorities
This is part of a weekly series - you can see the previous week's summaries here, which also includes some notes on purpose and methodology.
If you'd like to receive these summaries via email, you can subscribe here.
Philosophy and Methodologies
The author links two previous posts of theirs questioning the morality of caring:
1) Shut up and Divide - if you take your emotional caring for a population and divide it by the number of people, you find you should care almost not at all about any given individual. If this feels wrong, is the ‘shut up and multiply’ argument also wrong?
2) Small astronomical waste - as perfect utilitarians, we would trade away everything in a finite universe for more resources in an unlimited universe (more possible utility). Our best guess is that our universe is finite, so we should act as if we had traded everything away and stop caring. If this feels wrong, is the parliamentary model for moral uncertainty wrong more generally?
Uncertainty analysis aims to show how outputs (eg. recommendations) change based on inputs (eg. moral weights, experimental results). For example, it allows statements like ‘malaria charities could be the most cost-effective recommendations if life-years were worth >2.5x consumption doublings’, or ‘replicating this trial is worth $500K in reduced uncertainty’.
Nine uncertainty analysis techniques / sub-techniques from health economics are explained:
- Scenario analysis - model a base case, then vary some inputs (eg. lives saved per net) to create different scenarios. What happens to the outcome? All other techniques build on this.
- Threshold analysis - what would an input need to be to change the recommendation? Are we confident about which side of that threshold it falls on?
- Multi-way scenario analysis - vary multiple inputs at once.
- Deterministic Sensitivity Analysis (DSA) - vary each input from reasonable max to min for each charity. Plot recommendations in each case. Are they reasonably consistent? Which inputs drive changes?
- Value of information analysis - if we knew an input for certain, how much uncertainty in the recommendation would that resolve, and what is that worth in dollars?
- Probabilistic Sensitivity Analysis (PSA) - like DSAs, but instead of just max and min, model the probability distribution of each input and their covariance.
- Probability most cost effective: use the PSA to calculate a probability that a given charity is the most cost-effective.
- Altruistic Parameter Acceptability Curve: the above, plotted against one parameter of particular interest (or one we have a leaning on), by running a DSA on that parameter.
- Risk adjustment: the above, where that parameter is risk, and we want to see how recommendations change with risk aversion.
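A minimal sketch of what the PSA and ‘probability most cost-effective’ steps could look like in Python (the charities, distributions, and numbers below are all illustrative, not from any real evaluation):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000  # Monte Carlo draws

# Charity A (bednets): both inputs are uncertain, so we draw from
# made-up probability distributions rather than point estimates.
cost_per_net = rng.normal(5.0, 0.5, n)                 # dollars per net
lives_per_net = rng.lognormal(np.log(0.002), 0.4, n)   # lives saved per net
cost_per_life_a = cost_per_net / lives_per_net

# Charity B: modeled directly as an uncertain cost per life saved.
cost_per_life_b = rng.lognormal(np.log(4000), 0.3, n)

# Probability each charity is most cost-effective (lower $/life wins):
# the fraction of simulated worlds in which A beats B.
p_a_best = np.mean(cost_per_life_a < cost_per_life_b)
print(f"P(charity A most cost-effective) = {p_a_best:.2f}")
```

A real PSA would also model covariance between inputs; independent draws are the simplest starting point.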
By Richard Y Chappell
Devil’s advocate argument that EA philosophy is too focused on happiness and suffering, and that true meaning comes from contributing to cultural excellence, so we should help people do that. This avoids the Repugnant Conclusion, because it values excellence over maximizing (people x happiness).
This value system implies EA should focus more on talent scouting, nurturing potential, arts / culture, and protecting the future from hedonism. Less on global health and animal welfare. Both value systems care about protecting humanity and our long-term potential.
Note the author does not believe this view (labeled ‘perfectionism’) but thinks it has useful insights, including to value excellence as well as welfare.
The Scourge explores the claim that embryos have equal moral status to adult humans. This would imply the natural loss of ~half of all embryos should be one of the greatest problems of our time. Nearly everyone rejects this conclusion, which suggests the claim is not truly believed.
This post extends that argument to longtermism and animal welfare: If we claim future humans and animals have equal moral value to current humans, this implies existential risk and animal suffering are the greatest problems of our time. The world doesn’t act like this, so the claim is not truly believed.
Though several commenters argue that the true test is whether the conclusion seems absurd or hard to swallow (particularly to EAs), not whether it is being acted on.
By Holden Karnofsky
EA is about maximizing good, but we disagree on what ‘good’ is, and wholehearted maximization could push aside other important things in its pursuit. Solution: practice moderation. On the whole we’re doing well on this; just be aware of the risk.
In EA, it can feel like everything is towards the one end of helping the world. This leads to burnout. But it’s okay to have multiple goals. We could create a community that helps each other with these personal goals too - “taking the good life at the scale of its individuals out of competition with impact at the scale of the world”.
Squiggle is a new estimation language. This post describes how to use it for building simple models, via the website + google sheets.
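Squiggle expresses uncertain quantities tersely (eg. roughly, `2 to 10` for a distribution with that 90% confidence interval); a rough Python equivalent of that kind of quick model might look like this (the helper function and all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000  # simulation draws

def ci_to_lognormal(low, high, size):
    """Lognormal with the given 90% CI (5th/95th percentiles),
    similar in spirit to Squiggle's `low to high` syntax."""
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * 1.645)  # z for the 95th pct
    return rng.lognormal(mu, sigma, size)

# Toy estimate: hours saved per year by writing a small script
# (all inputs are made-up 90% confidence intervals).
uses_per_week = ci_to_lognormal(2, 10, n)
minutes_saved_per_use = ci_to_lognormal(1, 8, n)
hours_per_year = uses_per_week * 52 * minutes_saved_per_use / 60

print(np.percentile(hours_per_year, [5, 50, 95]))
```

The point of Squiggle is that the same model fits in a few lines of distribution syntax, without writing the sampling machinery yourself.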
Object Level Interventions / Reviews
Some key EA research papers are written by those without prior experience in the relevant field. The author, who works for a major defense contractor, analyzes RP’s series ‘Risks from Nuclear Weapons’ through this lens.
The main critique concerns the claim that 990-1,500 US nuclear warheads would survive a first strike. The author argues that fewer US submarine-based warheads may survive than predicted, since ~1/3 of submarines are in port at any given time, and that aircraft-based warheads would be unlikely to survive without long warning periods (>1hr), due to the time needed to load them onto the aircraft.
By Shakeel Hashim
GiveWell recently took GiveDirectly off its top charities list because funding gaps at its top charities are ~10x as cost-effective. This post argues GiveDirectly should still be recommended somewhere, as it’s the most scalable option if GiveWell were to attract significantly more funding (in the large billions).
By Michaël Trazzi
Highlights from an interview with an FHI research fellow working on philosophy of AI safety and AI consciousness. AI sentience is important to discuss because people (eg. Blake Lemoine) have already started believing AI is conscious, and that will only increase as AI gets better.
The researcher notes pain / pleasure / experience matter regardless of whether we label them ‘consciousness’. An AI without all human intelligences might still feel these, or might have experiences incomprehensible to us (different senses and world interactions), so we should watch for evidence like them wanting to talk about it. Memory might also be an important building block, as it allows for an enduring sense of self.
A BOTEC based on survey results puts the cost of US crime at $1.6-2.2 trillion (~9% of GDP). Other estimation methods give a similar scale, with lower-end estimates at 2-6% of US GDP. Considering broader effects may push this higher. Many criminal justice reform organizations are not working on reducing crime specifically.
Crime seems to be committed mainly by repeat offenders. Approaches like replacing short sentences with community service, increasing sentences for repeat offenders, and humane treatment in prisons have some backing from international examples. More research here could be impactful.
By Gabriel Mukobi
Links resources to become capable of original AI safety research, split into 7 levels of ~100-200 hours each. Also gives goals for ‘passing’ each level, and why they are useful. Levels are: AI Safety Fundamentals, Software Engineering, Machine Learning, Deep Learning, Understanding Transformers, Reimplementing Papers, Original Experiments.
Link to a new page summarizing the impact of 5 different programs by AAC. 77+ people involved in programs have gone on to gain relevant positions in animal advocacy, and an additional 13 have been directly placed via the fundraising work placement or recruitment services.
Community & Media
By Jack R
Top comments ask for: Industrial Organizational Psychology, TikTok, Computational Linguistics, ‘Dark Triad’ Psychology, Comms Specialists, knowledge of prominent ideologies worldwide, hiring, and risk assessment related to nuclear weapon hacking.
By Yonatan Cale
If you aren’t sure which is better, ask the org you’re considering working for. Some specific orgs’ answers are in the post and comments.
Independent EA projects or community building can have good short-term impact, but often lack mentorship. Skill building outside EA is sometimes better - remember to evaluate your options.
By Sandra Malagon
Stay in Mexico City 1 Nov to 30 Jan, accommodation, coworking space and some meals provided. Applications for full period or any 2-4 weeks are open now.
By Peter Wildeford
Prizes are low risk for organizations and can help with identifying talent, but aren’t always a good deal for participants - EV can be low and risk high. Set aside a big enough prize pool (including for honorable mentions) and use clear instructions & judging criteria to get around this issue.
By TheOtherHannah, TJPHutton, S.E. Montgomery, Kyle Webster
Using global health examples, argues that EA’s narrow demographics (>70% each white, male, and young) and lack of representation from beneficiary communities limit impact and hurt EA’s reputation. Insider perspectives are particularly useful when an intervention has low adoption rates, and can be accessed via partnering or by increasing the diversity of the EA community to begin with.
For example, when GiveWell and IDinsight studied beneficiary communities’ moral weights, the value ratio placed on saving a life under age 5 vs. over age 5 was 4.9x off GiveWell staff’s median weights. That didn’t change the recommended charities in this case, but a difference of that scale easily could.
The authors recommend we increase discussion of this topic within EA, consider power-sharing structures, and share EA tools more broadly (geographically). They also have specific recommendations for each of GiveWell, grant-makers, and EA group leaders.
A big part of today’s EA is existential risk reduction. The name Effective Altruism doesn’t capture this for:
a) People who care about helping others now, but don’t care about existential risks.
b) People who care about existential risks, but for non-altruistic reasons (‘I don’t want to die’).
It can also lead to us introducing existential risk ideas via talking about charity evaluation first, and to people feeling confused or duped if they were recruited that way. Potential solutions are renaming, and/or splitting off existential risk into its own community (like the Rationality community).
By lukasj10, Isaac_Esparza
Healthier Hens reports on hiring a country manager for Kenya with no previous EA link (but EA interest and good moral judgment). They found it paid off, with his experience and connections opening doors. His experience was also positive - he mentions being inspired by the long-term focus, ITN framework, and novel interventions.
They suggest making EA concepts a big part of onboarding, and note that having people to ask questions of is particularly important (vs. just using online resources). Learning to communicate EA ideas to stakeholders is also key.
By James Aitchison
16 podcast appearances, 18 articles, 3 animations and 3 pieces of miscellaneous coverage. Includes Time, Vox, NYT, The Guardian, Freakonomics and many others. The list will continue to be updated on the author’s blog.
By Scott Alexander
Many conferences are 10x the size of EAG - keeping it exclusive isn’t due to logistics. Rejecting applicants can make them sad. Others may not apply to avoid taking up a spot. It’s also less welcoming to newcomers or those with new ideas and non-typical EA backgrounds. We could make EAG more welcoming, while retaining networking quality via eg. cause-specific conferences or needing to apply to use the networking app.
In the comments, Eli Nathan (CEA’s lead on EAG) notes most advisors and attendees favor the current setup, that in recent years there has been a bar for applicants to meet rather than an attendee limit, and that they will update materials to position the event more clearly as networking / impact focused, rather than for meeting people and learning more about EA.
Evaluates every CEA community building project, with a focus on problems and mistakes. Key themes included CEA lacking the staffing to execute on commitments, rarely publishing project evaluations, and understating mistakes in public forums. Current management (since ~2019) has made progress on these fronts, but public program evaluations are still lacking.
Argues that WWOTF:
- Underestimates misaligned AI risk - WWOTF estimates 3% in the century, while elifland estimates ~44% partially based on community / forecaster averages.
- Overestimates stagnation risk - WWOTF estimates 35% in the century, while a group of forecasters including the author estimates ~5%.
- Is unclear on LT priorities, leaving the impression that biorisk and AI are of similar importance to climate change and maintaining technological progress.
Primarily because of a clearer ranking of LT priorities, we should consider promoting The Precipice over WWOTF for potential longtermist direct workers (such as students).
Most EA funding is from a few sources, many of which share networks and don’t take unsolicited applications. This reliance on personal connections means those that don’t ‘fit’ in the community leave, it’s harder to criticize parts of EA, we get a more ‘cult-ish’ reputation, and we lack diversity. It also doesn’t scale.
Solutions include opening grant applications, hiring without screening for EA involvement, and building connections between EA and adjacent groups like public health or AI. Individually, keep up personal and professional connections outside EA.
Author’s tl;dr: EA lacks the protective norms nearly universal in mature institutions, leaving it vulnerable to the two leading causes of organizational sudden death: corruption and sex scandals.
Epistemic status can help readers understand how seriously to take a post, encourage collaboration, and avoid info cascades. If you’re using one, link it to this post so people who don’t know what it means can read up on it.
Upcoming 75m talk on the EA funding landscape, how to get funded, and when and why to turn funding down. 6pm 13th Sept UK time; limited spaces - sign up if interested.
New 80,000 Hours problem profile - Artificial intelligence By Benjamin Hilton, 80000_Hours
An Audio Introduction to Nick Bostrom By peterhartree (new podcast on Nick Bostrom’s ideas)
How might we align transformative AI if it’s developed very soon? By Holden Karnofsky (in LW summary section)
Founding the Against Malaria Foundation: Rob Mather's story By Giving What We Can (12m video, truncated interview)
Peter Eckersley (1979-2022) By Gavin
Celebrations and Gratitude Thread By Lizka, MaxDalton
Applications for the 2023 Tarbell Fellowship now open (one-year journalism programme) By Cillian Crosson, Training for Good
EA & LW Forums Weekly Summary (21 Aug - 27 Aug ’22) By Zoe Williams (avoiding recursion!)
Animal Zoo By bericlair (fictional story)
21 criticisms of EA I'm thinking about By Peter Wildeford (already a summarized list)
AI Meta & Methodologies
By Thomas Larsen, elifland
There’s a table a few paragraphs in that’s a great summary already. Deception (AIs deceiving humans to achieve their goals) and scalable oversight (model evaluation and feedback at scale, avoiding issues with proxy success criteria) are the two most common problems worked on - but there’s a wide range outside that.
One method is to use AIs themselves eg. deploying AIs to detect dangerous actions by other AIs or to assist in alignment research.
This requires those AIs to be safe. Value alignment, honesty, corrigibility (let itself be altered / shut down) and legibility are important attributes for this. To achieve these, we need accurate reinforcement of them in a range of scenarios, to prevent exploits (technical and human) and to constantly assess threats. Developing a variety of AIs with varying capabilities could also help create a network that checks and balances each other.
Most AI X-risk is in worlds where iterative design fails - focus on solving for these. Possible ways of failure include:
- Hiding problems: The AI learns to make things look right, not actually be right, so it stops really improving. Particularly a problem for RLHF methods.
- Not knowing what to look for: We ask for clean energy designs, and get them. But we don’t ask if they could be used as a bomb, and they can. And if we ask our model to point out issues proactively, how do we evaluate that, without being good at it ourselves?
- Getting what we measure: because we define what we want imperfectly, we get bad versions of it eg. a false sense of security rather than actual security or profit rather than value.
EA has invested lots of resources into coordination between big AI labs, but hasn’t gotten clear wins from this yet. Wins build influence, so we should aim to get some small ones on the board.
In the comments, Kaj_Sotala mentions a 2015 open letter on research priorities for beneficial AI, signed by >8K people incl. prominent AI/ML researchers. But no similar scale wins since.
By Sam Bowman
Post is pretty summarized already, but a few headline results:
- 57% believe recent developments in ML modeling are significant steps toward AGI.
- 36% agreed it is plausible that decisions made by AI could cause a catastrophe on the scale of nuclear war or worse.
- 87% think NLP research will be net positive. (32% of these also agreed with point above)
- 67% agreed a majority of NLP research is of dubious scientific value.
Some things we treat as bugs in human reasoning might actually be good AI safety features. For instance, children following adult advice even if it goes against their own reasoning, decision paralysis while waiting on more info, or refusing to mix ‘sacred’ (eg. lives) and ‘non-sacred’ (eg. money) values in calculations. More examples are given in the post.
NB: bit technical, unsure if I summarized correctly
Deceptive alignment is when a model specifically tries to look aligned during training, for instrumental reasons. Models trained via SGD (stochastic gradient descent) seem likely to produce this, because deception (getting the right answers for the wrong reasons) is simpler than truly understanding the motive behind the training data.
By Sam Bowman
Maps out different communities related to AI Safety - including what they are, the link to AI Safety, and key people or intros. The communities are AI Safety, EA, Longtermism, Rationalists, AGI Optimism, AI Ethics, and LT AI governance.
Not AI Related
Three examples of cinema tackling characters who realistically expand the moral boundaries of their worlds. These examples led the author to identify two aspects of moral progress important to them - coordination morality (so we’re more likely to reach ‘highest utility overall’ solutions) and ‘I dunno, stuff I just care about’ morality (eg. friendship, love, art, helping others).
Summarizes 8 different theories of consciousness. Very short versions below:
Mysterianism - it’ll always seem magical, physical explanations will never be enough.
Cartesian Dualism - ‘I think, therefore I am’ - so thinking stuff must exist, while physical stuff might not. Therefore thinking stuff must not be in the same realm as physical stuff.
Global Workspace Theory - Consciousness is a spotlight that highlights important thoughts and makes them available to other areas of the brain (the global workspace).
Predictive Processing - Consciousness is what our brain predicts, which actions then fulfill. If we’re wrong, we need a new mental model, which is felt as emotions.
Integrated Information Theory - Consciousness is the number of possible states of information that are fundamentally integrated (can’t be separated), and this can be quantified.
Orchestrated Objective Reduction - qualia (smallest consciousness units) are quantum computations. The brain entangles these to get bigger consciousness.
Strange Loops and Tangled Hierarchies - the brain is a symbol manipulator. Given enough complexity, it’ll create an ‘I’ symbol, and having a complex ‘I’ symbol is consciousness.
Attention Schema Theory - there’s so much info, we need to up and down regulate it. We do this using attention, our ‘global workspace’, which is the contents of consciousness too.
The author also gives views on each and concludes that illusionism (consciousness arises as a byproduct whenever the same algorithm we run is instantiated) is likely. They also note memory seems important and is often overlooked in the above theories.
Asks for thoughts on how to prepare for a future that might be radically different in 15 years.
Some top comments suggest investing in ‘eat the seedcorn’ stocks like cigarettes with bad long-term prospects (and not in 401Ks), donating now instead of later, and working in important jobs now.
Over thousands of years, progress has been more than exponential. Argues this is because advances in factors like human capital, manufacturing, and infrastructure make new problems vastly easier to solve - leading to super-exponential progress over the very long term.
Simulators By janus
Author’s tl;dr: Self-supervised learning may create AGI or its foundation. What would that look like?
Infra-Exercises, Part 1 By Diffractor, Jack Parker, Connall Garrod
This Week on Twitter
New Models or Model Capacities
Riley Goodside ran a variety of experiments and found that GPT-3 can ‘learn’ from intermediate tasks. Eg. it can’t identify rhyming words, unless first asked to list their pronunciations in IPA - then it can. (link) The same strategy works for letter counting (link). It can also comment code (link) and translate algorithms or regular expressions into kind-of-poems (link).
Open source programmers forked Stable Diffusion and substantially increased its speed, to ~15s per image on Apple’s M1 chips. (link)
New SOTA game-playing for methods without lookahead search, using transformers. Reduces sample inefficiency in world models. (link)
DeepMind developed a language-model based question-answering system that is faithful to the laws of logic, meaning that their models can explain their reasoning behind a question’s answer. (link)
SciRobotics new research on how coordination can emerge in systems. (link)
US orders Nvidia to halt sales of top AI chips to China and Russia. (link)
Last Week Tonight put out a video on AI generated images. (link)
DNDi is closer to finding a treatment for a neglected disease using AlphaFold. (link)
Jeff Sebo has a new book, The Moral Circle, due out next year. It will make the case for extending moral standing to a wide range of beings, including insects and advanced AIs. (link)
Animal welfare is in all new European 5-year agriculture plans, and France & Denmark are subsidizing plant protein development too. (link)
Good Judgement has a (possibly new?) forecasting training course. (link)
US policy on Taiwan has always been fuzzy by design. Chris Murphy says congress and some think tanks are now talking about ending strategic ambiguity by formally recognizing Taiwan and promising protection. (link)
US govt. approves $1.1B in arms sales to Taiwan. (link) Liu Pengyu (spokesperson of the Chinese embassy in the US) responds that this interferes in China’s sovereignty and severely jeopardizes China-US relations. (link)
Russia requested a Geneva meeting to air claims that Ukraine (with US support) has been working on bioweapons. Evidence suggests this is distraction / misinformation. (link)
Moderna is suing Pfizer over vaccine tech. (link)