Supported by Rethink Priorities
This is part of a weekly series summarizing the top posts on the EA and LW forums - you can see the full collection here. The first post includes some details on purpose and methodology. Feedback, thoughts, and corrections are welcomed.
If you'd like to receive these summaries via email, you can subscribe here.
Podcast version: Subscribe on your favorite podcast app by searching for 'EA Forum Podcast (Summaries)'. A big thanks to Coleman Snell for producing these!
Philosophy and Methodologies
by Rory Fenton
When your intervention is expensive relative to data collection, you can maximize statistical power for a given cost by using a larger control and smaller treatment group. The optimal ratio of control sample to treatment sample is the square root of the cost per treatment participant divided by the square root of the cost per control participant.
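The allocation rule can be illustrated numerically. The following is a minimal sketch, assuming a two-arm trial with equal outcome variance in both arms and a fixed total budget; under those assumptions the variance of the estimated effect is proportional to 1/n_treatment + 1/n_control, and minimizing it gives a control-to-treatment ratio of sqrt(cost_treatment / cost_control). The function names and the example costs are hypothetical and do not come from the post.

```python
import math


def optimal_allocation(budget, cost_treatment, cost_control):
    """Split a fixed budget between arms to minimize 1/n_t + 1/n_c.

    Assumes equal outcome variance in both arms (an illustrative assumption).
    Returns (n_treatment, n_control) as floats; round down in practice.
    """
    # Control participants per treatment participant.
    ratio = math.sqrt(cost_treatment / cost_control)
    # Budget constraint: cost_treatment * n_t + cost_control * (ratio * n_t) = budget
    n_treatment = budget / (cost_treatment + cost_control * ratio)
    n_control = ratio * n_treatment
    return n_treatment, n_control


def variance_proxy(n_treatment, n_control):
    """Quantity proportional to the variance of the difference in means."""
    return 1 / n_treatment + 1 / n_control


# Hypothetical numbers: treatment costs 4x as much per participant as control.
n_t, n_c = optimal_allocation(budget=60_000, cost_treatment=400, cost_control=100)
print(n_t, n_c)                    # unequal split, twice as many controls
print(variance_proxy(n_t, n_c))    # lower than the equal split below
print(variance_proxy(120, 120))    # equal split costing the same 60,000
```

With these made-up costs, the unequal split yields a smaller variance proxy than an equal split at the same total cost, which is the sense in which a larger control group buys more power per dollar.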
Object Level Interventions / Reviews
by Holden Karnofsky
Grounded suggestions (ie. more than one AI lab has made a serious effort at each suggestion) for major AI companies to help the most important century go well:
- Prioritize alignment research, strong security, and safety standards (eg. figuring out what behaviors are dangerous, and what to do if they see them).
- Avoid hype and acceleration (eg. publish fewer flashy demos or breakthrough papers).
- Prepare for difficult decisions via setting up governance, employee and investor expectations to allow for decisions that aren’t profit-maximizing.
- Balance the above with conventional / financial success (so they stay relevant).
Holden is less excited about the following interventions labs sometimes take: censorship of AI models, open-sourcing AI models, and raising awareness of AI with governments and the public.
Linkpost for this blog post from OpenAI’s CEO Sam Altman on their AGI roadmap. Key points include:
- OpenAI’s mission is to ensure artificial intelligence benefits all humanity. They operate as if risks are existential and want to successfully navigate these.
- They think short timelines and slow takeoff seem like the safest scenario, since it minimizes compute overhang and gives maximum time to empirically solve safety and adapt to AGI.
- In the short term, OpenAI intends to create and deploy successively more powerful systems in the real world to allow a tight feedback loop and society to adapt.
- As these get closer to AGI, they will become increasingly cautious. If the balance of pros and cons shifts, they will significantly change the deployment approach.
- They want to create new alignment techniques, and believe capabilities progress is necessary for this.
- They want to start a global conversation on how to govern AI systems, fairly distribute their benefits, and fairly share access.
- They have concrete measures in place to allow safety-oriented behavior eg. capping shareholder returns, governance by a non-profit that can override for-profit interests, and a clause in their Charter to not race in late-stage AGI.
Things they want to see
- Independent audits before releasing new systems (they’re releasing more info on this later in the year).
- Public standards on when an AGI effort should stop a training run, and how to decide if a model is safe to release, when to pull from production etc.
- Government having oversight on training runs above a certain scale.
- Coordination between AGI efforts to slow down at critical junctures (eg. if the world needs time to adapt).
by Andrea_Miotti, Gabriel Alfour
The authors share their view that AGI has a significant probability of happening in the next 5 years, given progress on agents, multimodal models, language tasks, and robotics in the past few years. However, we are still early on the path to safety eg. not knowing how to get language models to be truthful, not understanding their decisions, optimizers yielding unexpected results, RLHF / fine-tuning not working very well, and not knowing how to predict AI capabilities.
Various players are racing towards AGI, including AdeptAI (training a model to “use every software tool and API in the world”), DeepMind whose mission is to solve intelligence and create AGI, and OpenAI, who kickstarted further race mechanics with ChatGPT.
This all means we’re in a bad scenario, and they recommend readers ask lots of questions and reward openness. They’re also hopeful narrower sub-problems of alignment can be achieved in time eg. ensuring the boundedness of AI systems.
by Tomek Korbak, Sam Bowman, Ethan Perez
Author’s tl;dr: “In the paper, we show how to train LMs (language models) with human preferences (as in reinforcement learning with human feedback), but during LM pretraining. We find that pretraining works much better than the standard practice of only finetuning with human preferences after pretraining; our resulting LMs generate text that is more often in line with human preferences and are more robust to red teaming attacks. Our best method is conditional training, where we learn a predictive model of internet texts conditional on their human preference scores, e.g., evaluated by a predictive model of human preferences. This approach retains the advantages of learning from human preferences, while potentially mitigating risks from training agents with RL by learning a predictive model or simulator.”
Argues that statements by large language models that seem to report their internal life (eg. ‘I feel scared because I don’t know what to do’) aren’t straightforward evidence either for or against the sentience of that model. As an analogy, parrots are probably sentient and very likely feel pain. But when they say ‘I feel pain’, that doesn’t mean they are in pain.
It might be possible to train systems to more accurately report if they are sentient, via removing any other incentives for saying conscious-sounding things, and training them to report their own mental states. However, this could advance dangerous capabilities like situational awareness, and training on self-reflection might also be what ends up making a system sentient.
Long post gathering examples of Bing Chat’s behavior, general public reactions (eg. in the news), and reactions within the AI Safety community. It’s written for accessibility to those not previously familiar with LessWrong or its concepts.
In March 2016, DeepMind’s AlphaGo beat the plausibly strongest player in the world 4 to 1. Since then, this work has been extended, eg. in KataGo - now a top Go bot.
Last November Wang et al adversarially trained a bot to beat KataGo, which it does by playing moves that cause KataGo to make obvious blunders from strong positions. Human novices are able to beat this adversarial bot (so it’s not great at Go overall), yet it beats KataGo in 72% of cases and some strong human players can copy its techniques to also beat KataGo.
This suggests that despite Go having quite simple concepts (liberties, live groups, dead groups), strong Go bots have achieved performance sufficient to beat the best human players without learning them.
The author argues that new skilled visionaries in alignment research tend to push in different directions than existing ones. They suspect this is because the level of vision required for progressing the field requires a strong intuition to follow, there’s a lot of space to find those in, and they can’t be easily redirected. This means that any single alignment path isn’t being sped up by adding more skilled talent (more than a factor of say 2x eg. from less visionary researchers and ops support).
Argues that exploits of large language models (such as getting them to explain steps to build a bomb) are examples of misuse, not misalignment. “Does not do things its creators dislike even when the user wants it to” is too high a bar for alignment, eg. higher than we ask of kitchenware.
Linkpost for this paper, which isolates the key mechanism (retargetability) which enables the results in another paper: Optimal Policies Tend to Seek Power. The author thinks it’s a better paper for communicating concerns about power-seeking to the broader ML world.
Abstract (truncated): If capable AI agents are generally incentivized to seek power in service of the objectives we specify for them, then these systems will pose enormous risks, in addition to enormous benefits. [...] We show that a range of qualitatively dissimilar decision-making procedures incentivize agents to seek power. We demonstrate the flexibility of our results by reasoning about learned policy incentives in Montezuma's Revenge. These results suggest a safety risk: Eventually, retargetable training procedures may train real-world agents which seek power over humans.
by Kat Woods, peterbarnett
The authors interviewed ten AI safety researchers on their day-to-day experience, and favorite and least favorite parts of the job.
by remember, Andrea_Miotti
Transcript of this podcast episode. Key points discussed:
- ChatGPT: was a big leap over models pre-2018 or arguably 2020 (and just commercialization of models post that). Generality of intelligence is a range, and this is less general than a human, more than a cat, arguable either way vs. a chimpanzee. Not dangerous.
- Superintelligence: isn’t perfect, but is better than humans at everything.
- Why is ‘it kills all humans’ the default? Our tools are too blunt to truly get our goals into the systems we create. Similar to how evolution had the goal ‘make lots of copies of your DNA’ and we do things directly counter to that sometimes. So the AI will end up with a different goal, and for most goals, we’re just atoms that could be put to better use.
- Why haven’t other species invented AI that is now messing with us? Look up grabby aliens. They’re possibly too far away.
- What next for Eliezer: currently on sabbatical. After that may help a small safety org eg. Conjecture, Anthropic or Redwood Research, or maybe continue doing public writings.
- How can funds be used? A billion+ could maybe pay AI developers to go live on an island and stop developing AI. Otherwise not sure, but thinks MIRI or Redwood Research wouldn’t spend it for the sake of it.
- Who else has good alternate views? Paul Christiano (main technical opposition), Ajeya Cotra (worked with Paul, good at explaining things), Kelsey Piper (also good at explaining things), Robin Hanson.
- What can people do about all this? Primarily don’t make things worse / speed things up. If you have an awesome idea, follow it, but don’t tell everyone it’s the solution in case it’s not.
The European Commission requested scientific opinions / recommendations on animal welfare from the European Food Safety Authority (EFSA), ahead of a legislative proposal in the second half of 2023. The recommendations EFSA published include cage-free housing for birds, avoiding all mutilation and feed and water restrictions in broiler breeders, and substantially reducing stocking density. This result is partially due to efforts of EA-affiliated animal welfare organisations.
by Dustin Crummett
The Insect Institute is a fiscally-sponsored project of Rethink Priorities which focuses on the rapidly growing use of insects as food and feed. It will work with policymakers, industry, and others to address key uncertainties involving animal welfare, public health, and environmental sustainability. You can sign up for their email list via the option at the bottom of their homepage.
A recent Faunalytics report made the claim that “by some estimates, a Big Mac would cost $13 without subsidies and a pound of ground meat would cost $30”. The author found the claim implausible and thinks it possible the original claim (since repeated in many articles) originated in the 2013 book Meatonomics - which in addition to subsidies included cruelty, environmental, and health costs in a calculation of the true cost of a Big Mac. It also likely over-estimated Big Macs as a proportion of beef consumption, making the statistic unreliable.
Global Health and Development
In its cost-effectiveness estimate of StrongMinds, Happier Lives Institute (HLI) estimates that each household member (~5 in the average household) benefits from the intervention 50% as much as the person receiving therapy. This is partially based on 3 RCTs. Two of these had interventions specifically targeted to benefit household members (eg. therapy for caregivers of children with nodding syndrome, which included the addition of nodding syndrome-specific content) and measured only those expected to benefit most. The third was incorrectly read as showing benefits to household members, when the evidence was actually mixed depending on the measure used.
The author argues this means household benefits were significantly overestimated, and speculatively guesses them to be more in the 5 - 25% range. This would reduce the estimated cost-effectiveness of StrongMinds from 9x to 3-6x cash transfers. In the comments HLI has thanked James for this analysis and acknowledged the points as valid, noted the lack of hard evidence in the area, and shared their plans for further analysis using a recent paper from 2022.
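The direction and rough size of this revision can be checked with back-of-envelope arithmetic. The sketch below is not HLI's or the author's actual model: it assumes, purely for illustration, that total benefit is the recipient's benefit plus a spillover share for each other household member, that cost-effectiveness scales linearly with total benefit, and that there are 4 household members other than the recipient (one reading of "~5 in the average household"). The function name is hypothetical.

```python
def rescaled_multiple(baseline_multiple, n_other_members, old_spillover, new_spillover):
    """Rescale a cost-effectiveness multiple when the household spillover changes.

    Illustrative assumption: total benefit = 1 (recipient) + n_other_members * spillover,
    and cost-effectiveness is proportional to total benefit.
    """
    old_total = 1 + n_other_members * old_spillover
    new_total = 1 + n_other_members * new_spillover
    return baseline_multiple * new_total / old_total


# Start from 9x cash transfers at a 50% spillover and vary the spillover
# across the 5% to 25% range suggested in the post.
for spillover in (0.05, 0.25):
    print(spillover, rescaled_multiple(9, 4, 0.50, spillover))
```

Under these simplifying assumptions the multiple falls to somewhere between roughly 3.6x and 6x, broadly in line with the 3-6x range reported above.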
by JoelMcGuire, Samuel Dupret
Report of a 2-week investigation on the impact of immigration reform on subjective well-being (SWB), including a literature review and BOTECs on the cost-effectiveness of interventions.
The authors find potential large impacts to SWB from immigrating to countries with higher SWB levels, but are uncertain on the effect size or how it changes over time. All estimates below are highly uncertain best guesses based on their model:
- Immigrants gain an immediate and permanent 77% of the difference in SWB between their origin and destination country upon immigrating.
- Household members left behind show a small positive spillover of +0.01 WELLBY per household member.
- When the proportion of immigrants in a community increases by 1%, there is a small and non-significant negative spillover for natives of -0.01 WELLBYs per person.
Of interventions investigated to increase immigration, the most promising was policy advocacy, estimated at 11x cost-effectiveness of Givewell cash transfers on SWB.
Rationality, Productivity & Life Advice
The author suggests one way to be okay with existential dread is to define yourself as someone who does the best they can with what they have, and to treat that in and of itself as victory. This means even horrible outcomes for our species can’t quite cut to the core of who you are. They elaborate with a range of anecdotes, advice, and examples that highlight parts of how to capture that feeling.
Community & Media
by Deena Englander, JaimeRV, Markus Amalthea Magnuson, Eva Feldkamp, Mati_Roy, daniel wernstedt, Georg Wind
EASE (EA Services) is a directory of independent agencies and freelancers offering expertise to EA-aligned organisations. Vendors are screened to ensure they’re true experts in their fields, and have experience with EA. If you’d like to join the directory, you can apply for screening here. If you’d like to use the services, you can contact the agencies listed directly, or email email@example.com for suggestions for your needs and available budget.
In 2022 the EAG team ran three EAGs, with 1.3-1.5K attendees each. These events averaged a 9.02 / 10 response to a question on if participants would recommend EAGs, and caused at least 36K new connections to be made (heavily under-reported as most attendees didn’t fill in this feedback).
In 2023 they plan to reduce spending - primarily on travel grants and food - but still do three EAGs. They now have a team of ~4 FTEs, and will focus on launching applications earlier, and improving response speed, Swapcard, and communications.
The full list of confirmed and provisional EAG and EAGx events is:
EA Global: Bay Area | 24–26 February
EAGxCambridge | 17–19 March
EAGxNordics | 21–23 April
EA Global: London | 19–21 May
EAGxWarsaw | 9–11 June [provisional]
EAGxNYC | July / August [provisional]
EAGxBerlin | Early September [provisional]
EAGxAustralia | Late September [provisional]
EA Global: Boston | Oct 27–Oct 29
EAGxVirtual | November [provisional]
by Holden Karnofsky
Holden Karnofsky (co-CEO of Open Philanthropy) is taking a minimum 3 month leave of absence from March 8th to explore working directly on AI safety, particularly AI safety standards. They may end up doing this full-time and joining or starting a new organization. This is due to believing transformative AI could be developed soon and that they can have more impact with direct work on it, in addition to personal fit towards building multiple organisations over running one indefinitely.
The recent Time article on EA and sexual harassment included a case involving an ‘influential figure in EA’. In this post, Owen Cotton-Barratt confirms that this was him, during an event five years ago. He apologizes and gives full context of what happened from his view, what generalizable mistakes he made that contributed, and what actions he’s taking going forward. This includes resigning from the EV UK board and pausing other activities which may give him power in the community (eg. starting mentoring relationships, organizing events, or recommending projects for funding).
Owen’s behavior was reported to Julia Wise (CEA’s community liaison) in 2021, who shared it with the EV UK board shortly after the Time article came out. Julia has also apologized for the handling of the situation and shared the actions that were taken at the time this incident was first reported to her, as well as in the time between then and now in the comments. The EV UK board is commissioning an external investigation by an independent law firm into both Owen’s behavior and the Community Health team’s response.
Leadership can fail in 4 ways: bad actors, well-intentioned people with low competence, well-intentioned high-competence people with collective blind spots, or the right group of people with bad practices.
The author argues EA focuses too much on the ‘bad actors’ angle, and this incentivizes boards to hire friends or those they know socially to reduce this risk. They suggest we stop this behavior, and instead tackle the other three risks via:
- Learning from models of leadership in other communities and organisations (to elevate competence at soft skills, since EA has few experienced leaders to learn from).
- Recognizing that seeing one’s own faults is difficult, and being open to external expertise and hiring those dissimilar to ourselves can be good ways to identify these blind spots.
- CEOs and Boards should create the right environment for effective decision-making (eg. CEOs speaking last, creating incentives for employees to be honest, and taking difficult decisions to the board for input; the board avoiding execution, not talking lots where they aren’t independent, and evaluating their own performance and composition).
by Ozzie Gooen
Discusses the specific barrier to feedback of things being uncomfortable to say, and how this affects the availability of criticism between different groups. Specifically, they cover:
- Evaluation in global welfare - criticism seems like fair game, though we don’t tend to evaluate mediocre or poor global health charities.
- Evaluation in longtermism - most potential evaluators are social peers of those they would be evaluating. There is some discussion, but discomfort is one bottleneck to more.
- Evaluation of EA Funders / Leaders by Community Members - there are only a few EA funders, and they’re highly correlated in opinion, making those who could give the best critiques uncomfortable doing it. Those with less to lose have voiced critiques.
- Evaluation of Community Members by Funders / Leaders - criticizing those you have power over is seen as ‘punching down’, and so is rarely done in any community. However, action can still be taken behind the scenes. This combination is really bad for trust.
- Evaluation of adjacent groups, by EAs - EA has sometimes been dismissive about other groups without public clarity on why. It has taken a public image as cooperative and respectful, which makes it tricky to openly disagree with other groups.
- Evaluation of EA, by adjacent groups - some are comfortable critiquing EA, but many of the most informative voices have no incentives to.
- Evaluation of EA critics, by EAs - honestly responding when you think a critique is really bad is surprisingly hard to do. It can look like punching down, or like you’re biased, and can require a lot of emotional energy.
They suggest EA look at specific gaps in feedback, and from which groups - as opposed to asking ‘are we open to feedback?’ more generally.
by Patrick Sue Domin
The author suggests considering not “sleeping around” (eg. one night stands, friends with benefits, casual dating with multiple people) within the EA community, due to its tight-knit nature increasing the associated risks. For instance, someone who is pursued and declined may end up having to interact with the pursuer in professional capacities down the road. They suggest this is particularly the case for those with any of the following additional risk factors: high-status within EA, and/or a man pursuing a woman, and/or socially clumsy. There is a large amount of discussion on both sides in the comments.
by Jeff Kaufman
In response to some of the above posts, there’s been a lot of discussion on how much EA culture did or didn’t contribute. Some of the suggestions (eg. discouraging polyamory or hookups) have caused others to argue what happens between consenting adults is no one else’s business.
The author argues consent isn’t always enough, particularly in cases with imbalanced power (eg. grantee and grantmaker). Organisations handle these conflicts in ways such as requiring a professor to either resign or not pursue a relationship with a student, or a grantmaker to disclose and recuse themselves from responsibilities relating to evaluating a grantee they’re in a relationship with. This is pretty uncontroversial and shows the question is what norms we should have and not whether it is legitimate at all to have norms beyond consent.
A selection of posts that don’t meet the karma threshold, but seem important or undervalued.
Fill out this survey with your thoughts on community posts having their own section on the frontpage, what changes to the forum site you’d like to see, what conversations you’d like to see, or any other feedback.
The Mental Health Funder’s Circle supports organisations working on cost-effective and catalytic mental health interventions. It held its first grant round in the Fall/Winter of 2022, and has now distributed $254K total to three organisations (Vida Plena for community mental health in Ecuador, Happier Lives Institute for work on subjective well being and cause prioritization, and Rethink Wellbeing to support mental health initiatives in the EA community). The next round of funding is now open - apply here by April 1st.
by Jan_Kulveit, rosehadshar
Suggests domains may move through three stages. Current day examples in brackets:
1. Human period - humans more powerful than AI (alignment research, business strategy).
2. Cyborg period - human + AI more powerful than humans or AIs individually (visual art, programming).
3. AI period - AIs more powerful than humans, and approx. equal to human+AI teams. (chess, shogi).
Transitions into the cyborg period will be incredibly impactful in some domains eg. research, human coordination, persuasion, cultural evolution. Which domains transition first also lends itself to different threat models and human response. For instance, moving faster on automating coordination relative to automating power, or on automating AI alignment research relative to AI research in general, could both reduce risk.
They also argue cyborg periods may be brief but pivotal, involving key deployment decisions and existential risk minimization work. To make the best use of them, we’ll need sufficient understanding of AI systems’ strengths and weaknesses, novel modes of factoring cognition, AI systems modified towards cyborg uses, and practice working in human+AI teams in existing cyborg domains.