Supported by Rethink Priorities
Sunday August 21st - Saturday August 27th
The amount of content on the EA and LW forums has been accelerating. This is awesome, but makes it tricky to keep up with! This post aims to help by summarizing popular (>40 karma) posts each week. It also includes announcements and ideas from Twitter that this audience might find interesting. This will be a regular weekly series - let me know in the comments if you have any feedback on what could make it more useful!
If you'd like to receive these summaries via email, you can subscribe here.
This series originated from a task I did as Peter Wildeford’s executive research assistant at Rethink Priorities, to summarize his weekly readings. If your post is in the ‘Didn’t Summarize’ list, please don’t take that as a judgment on its quality - it’s likely just a topic less relevant to his work. I’ve also left out technical AI posts because I don’t have the background knowledge to do them justice.
My methodology has been to use this and this link to find the posts with >40 karma in a given week on the EA forum and LW forum respectively, read / skim each, and summarize those that seem relevant to Peter. Posts that meet the karma threshold as of Sunday each week are considered (sometimes I might summarize a very popular later-in-the-week post in the following week's summary, if it doesn't meet the bar until then). For Twitter, I skim through the following lists: AI, EA, Forecasting, National Security (mainly nuclear), Science (mainly biosec).
I’m going through a large volume of posts so it’s totally possible I’ll get stuff wrong. If I’ve misrepresented your post, or you’d like a summary edited, please let me know (via comment or DM).
Philosophy and Methodologies
Discusses population asymmetry, the view that a new life of suffering is bad, but a new life of happiness is neutral or only weakly positive. The post focuses mainly on laying out these views and noting that they have many proponents, rather than on specific arguments for them. It mentions that they weren't well covered in Will's book and could affect the conclusions there.
Presents evidence that people's intuitions require significantly more happy people than suffering people (at an equivalent intensity) for a tradeoff to feel 'worth it' (3:1 to 100:1 depending on question specifics), and that a big future (which would likely contain more absolute suffering, even if not proportionally more) could therefore be bad.
Argues that EAs work across too narrow a distribution of causes given our uncertainty in which are best, and that standard prioritizations are interpreted as more robust than they really are.
As an example, they mention that 80K states “some of their scores could easily be wrong by a couple of points” and this scale of uncertainty could put factory farming on par with AI.
The repugnant conclusion (Parfit, 1984) is the argument that enough lives 'barely worth living' are better than a much smaller set of super duper awesome lives. In one description of it, Parfit said the barely-worth-living lives had 'nothing bad in them' (but not much good either).
The post argues this actually makes those lives pretty awesome and non-repugnant, because nothing bad is a high bar.
NB: longer article - only skimmed it so I may have missed some pieces.
Suggestions for cost-effectiveness modeling in EA by a health economist, with Givewell as a case study. The author believes the overall approach to be good, with the following major critiques:
- Extremely severe: no uncertainty modeling - we don’t know how likely they think their recommendations are to be wrong
- Severe: opaque inputs - it’s hard to trace back where inputs to the model come from, or to update them over time
- Moderate: the model architecture could use best practice to be easier to read / understand (eg. separating intervention and moral inputs)
A number of minor issues are also discussed, and the author also does their own CEAs on several top charities and compares them to Givewell's in depth (looking cell by cell). By doing this they find several errors / inconsistencies (eg. Givewell assumes every malaria death prevented by the Malaria Consortium also indirectly prevents 0.5 further deaths, but hadn't applied the same logic to AMF, significantly undercounting AMF's relative impact). Note the overall feedback is positive, and more of the model is error-free than not.
A rep from Givewell has gotten in touch in the comments to get more detail / consider what to do about the feedback.
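The 'no uncertainty modeling' critique can be illustrated with a minimal Monte Carlo sketch (all numbers hypothetical, not from Givewell's actual model): instead of multiplying point estimates into a single cost-per-life-saved figure, each input is drawn from a distribution, yielding an uncertainty interval around the bottom line.

```python
import random

random.seed(0)

def cost_per_life_saved():
    # Hypothetical inputs, each with uncertainty rather than a point estimate
    cost_per_net = random.uniform(4.0, 6.0)                # USD per bednet
    nets_per_death_averted = random.lognormvariate(6.4, 0.3)  # median ~600, right-skewed
    return cost_per_net * nets_per_death_averted

# Draw many samples to approximate the distribution of the bottom line
samples = sorted(cost_per_life_saved() for _ in range(10_000))
median = samples[len(samples) // 2]
p5 = samples[len(samples) // 20]      # 5th percentile
p95 = samples[-(len(samples) // 20)]  # 95th percentile
print(f"median ${median:,.0f}, 90% interval ${p5:,.0f} - ${p95:,.0f}")
```

Reporting the interval alongside the median is what lets a reader see how likely a charity recommendation is to flip under plausible input values - the gap the author argues Givewell's point-estimate spreadsheets leave open.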
Object-Level Interventions / Reviews
Summary of six months research by Social Change Lab on the impacts and outcomes of particularly influential protests / protest movements. The full report is available here. Future research will look at what makes certain movements more effective, and more generally to understand if social movement organizations could be more effective than current well-funded avenues to change.
Headline results include that protest movements often lead to small changes in public opinion and policy, moderate changes in public discourse, with large variance between specific cases. Natural experiments showed public opinion shifts of 2-10%, voting shifts of 1-6%, and increases in discourse of >10x.
Summary & review of grants made in March / April 2022.
Executive summary of an in-depth report on climate change from an LT perspective that was put together as part of What We Owe the Future research.
It states emissions and warming are likely to be lower than once thought, due to political movement and downward revisions in the amount of recoverable fossil fuels available to burn. The author gives a 5% chance to >4 degrees of warming, which he argues is not a level that poses an LT risk.
The greatest LT risk comes from reaching 'tipping points' (runaway feedback loops). These mechanisms are highly uncertain, though they're unlikely to kick in under 4 degrees, and we know Earth has been >17 degrees hotter in the past and still supported life. If those feedback loops caused significant damage to certain locations, that could in turn cause instability and war. He concludes that climate change is still an important LT area - though not as important as some other global catastrophic risks (eg. biorisk), which outsize it on both neglectedness and scale.
When interventions are considered only in their primary cause area (eg. global health, animal welfare, existential risk) their impact can be under or over counted by excluding effects in the other cause areas.
Food systems transformation (plant and cell based meat) has competitive positive effects on climate change and biosecurity, in addition to its primary area of animal welfare, so should be rated higher / receive more resources.
The social content strategy manager for Khan Academy is looking for ideas on EA concepts that would be easy to convey in short (~1 minute) videos. These would be used to create a series aimed at getting young learners interested in EA / improving the world.
EA Giving Tuesday uses Facebook’s Giving Tuesday matching to try and get matched funds to effective orgs. Rethink Charity is stopping support due to lack of capacity, will hibernate if not handed off by Sept 30th this year. $411K was matched in 2021, and it is an operationally complex project to run.
Community & Media
Argues that too many resources currently go into attracting people into the EA community, compared to guiding people towards doing more EA-endorsed actions (without the EA affiliation).
For example, the latter could look like influencing existing politicians to care more about preventing pandemics, hosting a talk at a university on the dangers of gain-of-function research, or spreading EA epistemics like scale / neglectedness / tractability frameworks. This allows more scaling of impact as we can influence those who likely wouldn’t join EA.
A practical guide to hiring. Short summary:
- Know the tasks you’re hiring for ie. hire around a role, not a job title.
- Write a good job ad, with a clear title and engaging language, and a specific deadline.
- Post it everywhere relevant, not just one place, and ask specific people for referrals.
- Use structure: app -> test task -> interview. Get task as close to the real job as possible, and drop any tasks or questions that aren’t distinguishing (always get the same answer).
- Consider using a hiring agency.
They also include a linked list of other good resources on hiring.
New intro to EA page is ready to be shared with general audiences, aiming at being the authoritative introductory source.
When doing stuff for fun, don’t worry about it also being productive / good / socially approved - lean into the feeling of “mwahaha I found a way to give myself hedons!”
Mockup image of a newspaper with the truly important news on the front page (eg. '15K children died today, just like every day', and 'we're at risk of nuclear war').
Linkpost and summary for Vox’s article which defends EA billionaires as trying to redistribute their income: "If you’re such an effective altruist, how come you’re so rich?"
Reached #7 on hardcover non-fiction list.
Current methods of talent <-> job matching:
- Individual orgs hiring people
- Candidates listing themselves on directories
- Orgs & groups matchmaking or referring people who talk to them
- Dedicated hiring orgs - new in EA, ramping up
Suggested improvements / gaps:
- Strategic clarity on biggest hiring needs
- One stop shop CRM of talent for both orgs & candidates (vs. lots of different groups for niche areas) - good & bad aspects to level of centralization here
- Building talent pipelines of strong candidates -> particularly mid-career proto-EAs
- EAs with the skills to be in-house recruiters or external headhunters
*Either because they’re not on the target topic set, because the title is already a solid summary, or because they’re link-posted from LW (and summarized on that list)
Common misconceptions about OpenAI (linkpost, summarized on LW list)
AI Strategy Nearcasting (cross-posted, summarized on LW list)
AI Impacts / New Capabilities
Asks for predictions in the forum thread on what GPT-4 won’t be able to do. Top voted ones include:
- Play a novel, complicated board game well, given only written instructions.
- Understand and manipulate causal relationships framed differently to training data.
- Give good suggestions on improving itself.
AI art isn’t “about to shake things up”. It’s already here.
AI art is cheap and good enough quality to be used for commercial purposes now. Gives the example of Midjourney being 400x cheaper and often better quality for illustrating cards in a card game, as compared to a human artist.
Fictional piece, first person view on a post-AGI utopia.
AI Meta & Methodologies
Both 5-year and >100-year timelines are long enough to train / hire new researchers and for foundational research to pay off. Because of this, where on this scale timelines fall doesn't matter for decisions on whether to invest in those things.
Summarizes views of others on whether we can use AI to automate alignment research safely.
States three levels - 1) assisting humans (already here), 2) original contributions (arguably here, a little) and 3) building own aligned successor (not here). Lots of disagreement on which are possible or desirable.
Views of specific researchers: (note these are summarized views of summarized views, so might not be great representations of that expert’s opinions)
Nate Soares - building an AI to help with alignment is no easier than building an aligned AI. It would need enough intelligence to already be dangerous.
John Wentworth - Assisting humans is fine (eg. via google autocomplete), but we can’t have AI do the hard parts. We don’t know how close we are to alignment either, because we are still unclear on the problem.
Evan Hubinger - GPT-3 shows we can have programs imitate the process that creates their training data, without goal-directed behavior. This could be used to safely produce new alignment research if we can ensure it doesn't pick up goals.
Ethan Perez - Unclear how easy / hard this is vs. doing alignment ourselves, and if an AI capable of helping would already be dangerous / deceptive. But we should try - build tools that can have powerful AI plugged in when available.
Richard Ngo - AI helping with alignment is essential long-term. But for now do regular research so we can automate once we know how.
Post quality AI research on arXiv, not just LW and the alignment forum: it's easy, it'll show up on Google Scholar, and it's likely to be read more broadly.
A list of 26 research outputs Richard Ngo would like to see. (Each expected to be pretty time-consuming).
OpenAI’s roadmap / approach to alignment, cross-posted from their blog. They explain their approach as iterative and empirical - attempting to align real highly capable AI systems, learning and refining methods as AI develops (in addition to tackling problems they assume will be on the path to AGI).
Their primary approach is “engineering a scalable training signal for very smart AI systems that is aligned with human intent” via the following three pillars:
- Training AI systems using human feedback - eg. their approach creating InstructGPT, which is 100x smaller but often preferred to models not trained to follow implicit intent. (Note: still fails sometimes eg. lying or not refusing harmful tasks)
- Training AI systems to assist human evaluation - make it easier for humans to assess other AIs' performance on complicated tasks (eg. when a human evaluates an AI's book summary, an assistant AI can surface related online sources to help check accuracy).
- Training AI systems to do alignment research - train AIs to develop alignment research, and humans to review it (an easier task). No models sufficiently capable to contribute yet.
They also cover some limitations / arguments against this approach.
OpenAI’s post on accurate and inaccurate common conceptions about it.
Accurate:
- OpenAI is looking to directly build a safe AGI
- The majority of researchers work on the capabilities team (100/145)
- The majority did not join explicitly to reduce existential risk (exception - the 30 person alignment team are pretty driven by this)
- There isn’t much interpretability research since Anthropic split off
Inaccurate (I’ll phrase as the accurate versions - the inaccurate ones were the opposite):
- OpenAI has teams focused on both practical alignment of models it’s deployed, and researching how to align AGIs beyond human supervision - not just the former.
- No alignment researchers (other than interpretability ones) moved from OpenAI -> Anthropic. It still has an alignment team.
- OpenAI is not obligated to make a profit.
- OpenAI is aware of race dynamics, and will assist another value-aligned project closer to building AGI if it has a better than even chance within 2 years.
- OpenAI has a governance team and cares about existential risk from AI.
Advocates nearcasting, which is forecasting with the assumption of “a world relatively similar to today’s”. Eg. “what should we do if TAI is just around the corner?”
Gives a simpler jumping off point to start suggesting concrete actions. Ie. if we know an action we’d suggest if TAI were to be developed in a world like today’s, we can ask ‘do we expect differences in the future that will change this suggestion? Which ones?’
Focuses us on near-term TAI worlds - which are most dangerous / urgent.
Allows comparing predictions over time - comparing nearcasts in a given year to those a few years ago gives a feedback loop, showing changing predictions / conclusions and how / why they changed.
In a future post, Holden will be laying out more detail on an example nearcast scenario for TAI and his predictions and recommended actions on it.
Not AI Related
Bulleted list of advice from Katja, based on surveys she’s done. An even shorter summary below:
- Test your surveys by having people take them & narrate their thoughts as they do.
- Wording matters a lot to results:
  - Even if you don’t intend it & avoid known issues like desirability bias.
  - If you do intend it, there are heaps of ways to get the result you want.
- Ask people what they already know first (otherwise your summary will bias them), don’t change wording between surveys if doing over-time analysis, and avoid sequences of related questions (which can lead people to a particular answer).
- Qualtrics is expensive. GuidedTrack isn’t and seems good.
- Surveys are under-rated - do more.
Either because they’re not part of the target topic set, had very technical content, title is already a solid summary, or they were already summarized on the EA forum list.
This Week on Twitter
Stable Diffusion launched publicly Aug 22nd - an open-source text-to-image model. Image generation competitive with DALL-E 2, even on consumer-grade GPUs, and the initial training only cost ~$600K. Relational understanding (eg. ‘red cube on top of green cube’) is still shaky. Figma incorporated it within a few days of release. #stablediffusion will pull up lots of related tweets. Along the same lines, NeRF models can make high-quality images from multiple viewpoints via separate static images, and are apparently progressing very quickly (link).
Assessing AI moral capacities - DeepMind linked a new paper suggesting a framework for assessing AI moral capacities from a developmental psyc viewpoint. (link)
Adversarial inputs research - Anthropic found that language models trained with human-feedback reinforcement learning responded best to adversarial inputs (compared to models with no additional feedback or trained via rejection sampling). (link)
Summary of HLI's critique of Givewell's deworming estimates, and a good response to it by Givewell - a tweet thread describes the robust back and forth.
Vox article defending EA billionaires - same topic as the forum post in the EA forum section above (link)
“Starlink V2, launching next year, will transmit direct to mobile phones, eliminating dead zones worldwide” could be good for anti-censorship (link)
Ajeya updated her TAI timeline predictions (median 2040). (link)
CLTR highlights that the UK government has committed £800 million to create a new research funding body, the Advanced Research + Invention Agency (ARIA), with a high-risk / high-reward philosophy and only light governance.
US soon to roll out variant-specific covid booster, first in world. (link)