2018 AI Alignment Literature Review and Charity Comparison

Larks


75 min read, Dec 18, 2018

Comments

Ben_West🔸

Thanks Ben! Great as always. One quibble:

> As such, in general I do not give full credence to charities saying they need more funding because they want more than a year of runway in the bank. A year’s worth of reserves should provide plenty of time to raise more funding.

12 months of runway means that an organization with an annual fundraising drive will be near bankruptcy once per year, every year. That seems bad.
I agree that most organizations should be able to raise money in less than 12 months if they are 100% focused on raising money, but not having to worry about fundraising seems pretty valuable to me. An 18-month runway means, for example, that if your annual fundraising drive goes poorly you still have 6+ months to find some solution.

Larks

My general model is that charities get funding in two waves:

1) December

2) The rest of the year

As such, if I ask groups for their runway at the beginning of 1), and they say they have 12 months, that basically means that even if they failed to raise any money at all in the following 1) and 2), they would still survive until next December, at which point they could be bailed out.

However, I now think this is rather unfair, as in some sense I'm playing donor-of-last-resort with other December donors. So yes, I think 18 months may be a more reasonable threshold.
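The runway arithmetic in this exchange can be made concrete with a minimal sketch. It rests on simplifying assumptions that neither commenter spelled out: constant monthly spending, exactly one fundraising drive per year, and no other income. The function name `months_left_at_next_drive` is hypothetical.

```python
def months_left_at_next_drive(runway_after_drive: float,
                              months_between_drives: float = 12.0) -> float:
    """Months of reserves left when the next annual drive begins.

    Hypothetical model: constant monthly spending, one fundraising drive
    per year, and no income between drives. A negative result means the
    money runs out before the next drive starts.
    """
    return runway_after_drive - months_between_drives


# Compare the 12-month and 18-month runway thresholds discussed above.
for target in (12, 18):
    buffer = months_left_at_next_drive(target)
    print(f"{target}-month runway: {buffer:g} months of reserves when the next drive starts")
```

Under these assumptions a 12-month runway leaves nothing in reserve when the next drive begins, while an 18-month runway leaves the 6-month buffer Ben_West describes. The same subtraction covers Larks' framing: reserves measured at the start of one December, minus the 12 months until the next, give the buffer left if every fundraising wave in between raises nothing.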

Kaj_Sotala

> In the past [EAF/FRI] have been rather negative utilitarian, which I have always viewed as an absurd and potentially dangerous doctrine. If you are interested in the subject I recommend Toby Ord’s piece on the subject. However, they have produced research on why it is good to cooperate with other value systems, making me somewhat less worried.

(I work for FRI.) EAF/FRI is generally "suffering-focused", which is an umbrella term covering a range of views; NU would be the most extreme form of that, and some of us do lean that way, but many disagree with it and hold views that most people would consider much more plausible (see the link for discussion). Personally, I used to lean more NU, but have since shifted considerably toward other (though still suffering-focused) views.

Besides the research about the value of cooperation that you noted, this article discusses reasons why the expected value of x-risk reduction could be positive even from a suffering-focused view; the paper of mine referenced in your post also discusses why suffering-focused views should care about AI alignment and cooperate with others in order to ensure that we get aligned AI.

And in general it's just straightforwardly better and (IMO) more moral to try to create a collaborative environment where people who care about the world can work together in support of their shared points of agreement, rather than trying to undercut each other. We are also aware of the unilateralist's curse, and do our best to discourage any other suffering-focused people from doing anything stupid.

Sean_o_h

> It is possible they had timing issues whereby a substantial amount of work was done in earlier years but only released more recently. In any case they have published more in 2018 than in previous years.

(Disclosure: I am executive director of CSER) Yes. As I described in relation to last year's review, CSER's first postdoc started in autumn 2015, and most started in mid-2016. Early research projects and papers began to be completed throughout 2017, with most papers then going to peer-reviewed journals. 2018 is more indicative of run-rate output, although 2019 will be higher.

Throughout 2016-2017, considerable CSER leadership time (mine in particular) also went into getting the Leverhulme Centre for the Future of Intelligence (http://lcfi.ac.uk/) up and running, which will increase our output on AI safety/strategy/governance (although CFI also separately works on near-term and non-AI-safety-related topics).

Thank you for another detailed review!

Sean_o_h

(Disclosure: I am executive director of CSER) Thanks again for a wide-ranging and helpful review; this represents a huge undertaking of work and is a tremendous service to the community. For the purpose of completeness, I include below 14 additional publications authored or co-authored by CSER researchers for the relevant time period not covered above (and one that falls just outside but was not previously featured):

Global catastrophic risk:

Ó hÉigeartaigh. The State of Research in Existential Risk

Avin, Wintle, Weitzdorfer, O hEigeartaigh, Sutherland, Rees (all CSER). Classifying Global Catastrophic Risks

International governance and disaster governance:

Rhodes. Risks and Risk Management in Systems of International Governance.

Biorisk/bio-foresight:

Rhodes. Scientific freedom and responsibility in a biosecurity context.

Just missing the cutoff for this review, but not included last year, so it may be of interest: our bioengineering horizon-scan (published November 2017). Wintle et al (incl Rhodes, O hEigeartaigh, Sutherland). Point of View: A transatlantic perspective on 20 emerging issues in biological engineering.

Biodiversity loss risk:

Amano (CSER), Szekely… & Sutherland. Successful conservation of global waterbird populations depends on effective governance (Nature publication)

CSER researchers as coauthors:

(Environment) Balmford, Amano (CSER) et al. The environmental costs and benefits of high-yield farming

(Intelligence/AI) Bhatnagar et al (incl Avin, O hEigeartaigh, Price): Mapping Intelligence: Requirements and Possibilities

(Disaster governance): Horhager and Weitzdorfer (CSER): From Natural Hazard to Man-Made Disaster: The Protection of Disaster Victims in China and Japan

(AI) Martinez-Plumed, Avin (CSER), Brundage, Dafoe, O hEigeartaigh (CSER), Hernandez-Orallo: Accounting for the Neglected Dimensions of AI Progress

(Foresight/expert elicitation) Hanea… & Wintle. The Value of Performance Weights and Discussion in Aggregated Expert Judgments

(Intelligence) Logan, Avin et al (incl Adrian Currie): Uncovering the Neural Correlates of Behavioral and Cognitive Specialization

(Intelligence) Montgomery, Currie et al (incl Avin). Ingredients for Understanding Brain and Behavioral Evolution: Ecology, Phylogeny, and Mechanism

(Biodiversity) Baynham Herdt, Amano (CSER), Sutherland (CSER), Donald. Governance explains variation in national responses to the biodiversity crisis

(Biodiversity) Evans et al (incl Amano). Does governance play a role in the distribution of invasive alien species?

Outside of the scope of the review, we produced on request a number of policy briefs for the United Kingdom House of Lords on future AI impacts; horizon-scanning and foresight in AI; and AI safety and existential risk, as well as a policy brief on the bioengineering horizon scan. Reports/papers from our 2018 workshops (on emerging risks in nuclear security relating to cyber; nuclear error and terror; and epistemic security) and our 2018 conference will be released in 2019.

Thanks again!

Howie_Lempel

Are there a couple you're most proud of, or that are most representative of the type of research CSER is aiming to produce consistently over the next several years? Are there any you'd point to as "CSER would have achieved an optimistic win scenario if it could keep up a steady stream of papers like X"?

Sean_o_h

Thanks! In terms of insight, I think 'Classifying global catastrophic risks' presents a novel way of thinking about GCRs, looking at the critical systems disrupted and the potential systems affected across risks. This could be helpful both in identifying new potential GCRs and in identifying points of intervention. I think a number of follow-on pieces of research could lead from it, and it could also be a good way for people from a range of domains to think about how they can use their expertise to engage with GCR research.

From a methodology point of view, I think the biological horizon-scan ('20 emerging issues in biological engineering') was very successful. While it was aimed more broadly at adapting an expertise-elicitation-and-aggregation technique to anticipating relevant advances and challenges in biological engineering (so only a subset of issues in the final paper are specifically about risk), it came together well and demonstrated proof of concept (the concept originally being that this sort of technique could be useful in tech-GCR-relevant foresight). It has been extremely well received within the research community, and was presented at the 2017 Biological Weapons Convention meeting and, by invitation, at the 2018 Organisation for the Prohibition of Chemical Weapons Scientific Advisory Board meeting. A major issue is that the BWC is underfunded and under-supported in various ways, and is struggling to keep up with advances in science and technology (for more on this and other challenges for the BWC, see our 2017 report). Participants in the 2017 meeting commented that exercises like our horizon-scan were very useful to the BWC for that reason.

I'd like to see us do more of this, including drilling down more on emerging biothreats more specifically, and applying the technique (and similar) to other risk/emerging tech domains.

Re: papers in the original review, I was very pleased with how the Malicious AI report turned out. It has become a landmark report that has significantly influenced the conversation. And while its topics were more near-term AI, it provided an opportunity to introduce a number of principles that may be influential as we move closer to transformative AI: from the responsibilities of research leaders and security best practices, to different practices on openness with regard to certain types of research, and ideas around monitoring/tracking things like hardware.

I was also very pleased with how Natalie Jones et al.'s paper turned out. Natalie is a PhD student in Cambridge who has been mentored during some of her time here by CSER's Julius Weitzdorfer. What is particularly satisfying here, as the OP pointed out, is that while the paper was being finalised, CSER was able to support Natalie and a team of students in pushing through one of the paper's key recommendations: establishing an All-Party Parliamentary Group on Future Generations (for which CSER is playing the role of secretariat, with CSER researchers/senior advisors playing an advisory role).

So I would say a continued stream of papers that do some combination of the following would be a good scenario: (a) opening up new ways of analysing GCRs; (b) developing new methodologies for foresight and for anticipating risk or risk-relevant advances; (c) producing outputs that are useful for institutions with key roles in managing global risks; (d) resulting in implementable recommendations that we can help to implement; (e) introducing concepts in contemporary tech, policy and risk that will be useful for future challenges.

These can be well complemented by high-quality academic papers chipping away at GCR-relevant issues like biodiversity loss, international risk governance, AI foresight and governance, issues of global ethics and future generations, etc., in a more incremental/fine-grained fashion. Plus targets of opportunity, like emerging areas of risk that aren't quite in anyone's domain to work on properly at present and are thus under-treated (e.g. a workshop led by Shahar Avin on emerging risks from modernising infrastructure around nuclear command and control, especially looking at the cyber angle, might fall into this category; paper just submitted). And several papers coming out in 2019 will focus on analysing the evidence base for different claims around Xrisks/GCRs, as well as ways of collecting and aggregating research relevant to Xrisk/GCR across fields, which we think will help Xrisk/GCR's move towards being a 'mature' field.

Howie_Lempel

Thanks, Seán! This response was incredibly helpful. Looking forward to reading some of these.

Dawn Drescher

Wow! Thank you again for another amazing overview! :-D

With regard to the FRI section: Here is a reply to Toby Ord by Simon Knutsson and another piece that seems related. (And by “suffering focus,” people are referring to something much broader than NU, which may be true of some CUs too.)

[anonymous]

Why is this seemingly reasonable comment at a score of 0 with 8 votes? Am I missing something?

Dawn Drescher

Hmm, yeah, curious as well. Maybe it’s because I link long essays without summarizing them, so people are left wondering whether the essays are relevant enough to be worth reading.

But apart from the link to Simon’s reply, Kaj’s comment is much better than mine anyway.

Milan Griffes

Curious why Distill wasn't included.

Their stuff on interpretability seems like it has implications for alignment, and their work seems high-quality in general.

Larks

No principled reason, other than that this is not really my field, and I ran out of time, especially for work produced outside donate-able organizations. Sorry!

Milan Griffes

I suppose this could be rolled into the Google Brain section, as it looks like most Distill contributors have a Google affiliation.

Ben Pace

+1 Distill is excellent and high-quality, and plausibly has important relationships to alignment. (FYI, some of the founders recently joined OpenAI, if you're figuring out which org to put it under, though Distill is probably its own thing.)

Aaron Gertler 🔸

This post was awarded an EA Forum Prize; see the prize announcement for more details.

My notes on what I liked about the post, from the announcement:

"2018 AI Alignment Literature Review and Charity Comparison" is an elegant summary of a complicated cause area. It should serve as a useful resource for people who want to learn about the field of AI alignment; we hope it also sets an example for other authors who want to summarize research.
The post isn’t only well-written, but also well-organized, with several features that make it easier to read and understand. The author:
Offers suggestions on how to effectively read the post.
Hides their conclusions, encouraging readers to draw their own first.
Discloses relevant information about their background, including the standards by which they evaluate research and their connections with AI organizations.
These features all fit with the Forum’s goal of “information before persuasion”, letting readers gain value from the post even if they disagree with some of the author’s beliefs.

Linch

Really late to the party, but thanks so much for this great post!

Minor detail: Shah et al.'s Value Learning Sequence should link here:

https://www.alignmentforum.org/s/4dHMdK5TLN6xcqtyc

Larks

You're welcome! And thanks, fixed.

agdfoster

> Last year I mentioned that EA Long Term Future Fund did not seem to be actually making grants. After a series of criticism on the EA forum by Henry Stanley and Evan Gaensbauer, CEA has now changed the management of the funds and committed to a regular series of grantmaking. However, I’m skeptical this will solve the underlying problem. Presumably they organically came across plenty of possible grants – if this was truly a ‘lower barrier to giving’ vehicle than OpenPhil they would have just made those grants. It is possible, however, that more managers will help them find more non-controversial ideas to fund.

The last sentence is one of the key reasons it was refreshed. It's also worth noting that I believe the new managers do not have access to large pots of discretionary funding (easier to deploy than EA Funds) that they can use to fund opportunities that they find. I could be wrong about that.

Larks

> It's also worth noting that I believe the new managers do not have access to large pots of discretionary funding (easier to deploy than EA Funds) that they can use to fund opportunities that they find.

Good point!

Milan Griffes

Thanks, I found this very helpful!

What process do you use to stay on top of the new literature as it comes out?

I have a rough model of what to do to track organizational output: sign up for newsletters & RSS feeds, check their websites occasionally, ask them if I've missed anything near the end of the year.

I have no idea what to do to track the work coming out of academia (i.e. the stuff in your "Other Research" section); arXiv seems like a morass to navigate. How do you stay on top of that?

Larks

I'm glad you found it helpful!

I don't have a great system. I combined a few things:

1) Organisations' websites

2) Backtracking from citations in papers, especially those published very recently

3) Authors' own websites for some key authors

4) 'Cited by' in Google Scholar for key papers, like Concrete Problems

5) Asking organisations what else I should read, as many do not have up-to-date websites.

6) Randomly coming across things on Facebook, Twitter, etc.

7) Rohin's excellent newsletter.

Rohin Shah

Not the OP, but the Alignment Newsletter (which I write) should help for technical AI safety. I source from newsletters, blogs, Arxiv Sanity and Twitter (though Twitter is becoming more useless over time). I'd imagine you could do the same for other fields as well.

Milan Griffes

Thanks, I was also curious about how you sourced the newsletter :-)

Why do you think Twitter is degrading?

Rohin Shah

Not sure. A few hypotheses:

Arxiv sanity has become better at predicting what I care about as I've given it more data. I don't think this is the whole story because the absolute number of papers I see on Twitter has gone down.
I did create my Twitter account primarily for academic stuff, but it's possible that over time Twitter has learned to show me non-academic stuff that is more attention-grabbing or controversial, despite me trying not to click on those sorts of things.
Academics are promoting their papers less on Twitter.

jtm

Hi! What a comprehensive review, thanks for writing it up!

One quibble is that the OP is very dismissive of the issue of biases, discrimination, and AI.

While I don't necessarily think that this issue should fall under the category of AI alignment that people in the EA community are normally concerned with, I also believe that it is inappropriate to completely dismiss it. So, I just wanted to add a comment saying that some of us in the community are concerned about biases and AI, and I hope the EA community will be having a healthy discussion about it.

Cheers!

Aaron Gertler 🔸

Have you seen any study/analysis (even a solid Fermi estimate) showing that AI bias, either similar to variants identified so far or hypothetical future variants, could plausibly be sufficiently large*tractable to be worthy of further investigation?

I've always grouped this issue in the large category of "issues that are bad and should be worked on by someone, but that get plenty of coverage in the non-EA world and don't seem especially compelling for our tiny community to look at". AI bias gets a lot of attention from large tech firms and large media companies relative to long-term concerns about safety/alignment.

jtm

Hey Aaron!

So, I think we agree and I may have been unclear in my comment. I didn't mean to imply that the problem of AI bias necessarily is large/neglected/tractable enough that the EA community should be very preoccupied with it.

The reason I commented was that I read OP's paragraph as saying not only 'bias isn't the kind of thing that the EA community should focus on' but something much bolder, i.e. 'bias isn't a problem at all'.

And I quite confidently and strongly disagree with the latter claim.

-Joshua from YEA.

