What Should the Average EA Do About AI Alignment?

Raemon

I'm trying to get a handle on what advice to give people who are convinced AI is a problem worthy of their time, *probably* the most important problem, but are not sure if they have the talent necessary to contribute.

A trending school of thought is "AI Alignment needs careful, clever, agenty thinkers. 'Having the correct opinion' is not that useful. There is nobody who can tell you what exactly to do, because nobody knows. We need people who can figure out what to do, in a very messy, challenging problem."

This sort of makes sense to me, but it seems like only a few sorts of people can realistically contribute in this fashion (even given growth mindset considerations). It also seems like, even if most people could contribute, it doesn't provide very good next-actions to people who have reached the "okay, this is important" stage, but who aren't (yet?) ready to change their career direction.

Here is the advice I currently give, followed by the background assumptions that prompted it. I'm looking for people to challenge me on any of these:

Options for the non-or-minimally-technical-ish:

1) Donate. (1%, or more if you can do so without sacrificing the ability to take valuable financial risks to further your career. MIRI, FHI, 80k and CFAR seem like the most credible ways to turn money into more AI Alignment career capital)

2) Arrange your life such that you can easily identify volunteer opportunities for gruntwork, operations, or other nontechnical skills for AI safety orgs, and dedicate enough time and attention to helping with that gruntwork that you are more of an asset than a burden (e.g. helping to run conferences and workshops). To help with AI-specific things, it seems necessary to be in the Bay, Boston, Oxford, Cambridge or London.

3a) Embark on projects or career paths that will cause you to gain deep skills, and in particular, train the habit/skill of noticing things that need doing, and proactively developing solutions to accomplish them. (These projects/careers can be pretty arbitrary. To eventually tie them back into AI, you need to get good enough that you'll either be able to help found a new org or provide rare skills to an existing org.)

3b) Ideally, choose projects that involve working together in groups, that require you to resolve differences in opinion on how to use scarce resources, and that require you to interact with other groups with subtly different goals. Practice coordination skills mindfully.

4) Provide a reading list of blogs and social-media feeds to stay up-to-date on the more accessible, less technically demanding thoughts relating to AI Safety. Practice thinking critically on your own about them. (this doesn't really come with an obvious "Part 2" that translates that into meaningful action on its own)

If technical-ish, and/or willing to learn a LOT

5) Look at the MIRI and 80k AI Safety syllabus, and see how much of it looks like something you'd be excited to learn. If applicable to you, consider diving into that so you can contribute to the cutting edge of knowledge.

6) If you're a talented programmer, learn a lot about ML/Deep Learning and then stay up to date on the latest actual AI research, so you can position yourself at the top AI companies and potentially have influence with them on which direction they go.

An important question I'd like to answer is "how can you tell if it makes sense to alter your career in pursuit of #5 and #6?" This is very non-obvious to me.

I talk to a lot of people that seem roooooughly analogous to myself, i.e. pretty smart but not extremely smart. In my case I think I have a credible claim on "community building" being my comparative advantage, but I notice a lot of people default to "be a community person or influencer", and I'm really wary of a decision tree that outputs a tower of meta-community-stuff for anyone who's not obviously expert at anything else. I'd like to have better, fleshed out, scalable suggestions for people fairly similar to me.

Background assumptions

Various things that fed into the above recommendations (sometimes directly, sometimes indirectly). This is a living document that I'll update as people persuade me otherwise. Again, appreciate getting challenged on any of these.

AI Timelines and Goals

AI timelines are anywhere between 5 years (if DeepMind is more advanced than they're telling anyone), 20 years (if it turns out general AI is only a couple breakthroughs away from current Deep Learning trends, and we're (un)lucky on how soon those breakthroughs come), or much longer if General AI turns out to be harder. We should be prepared for each possibility.

Eventually, all of our efforts will need to translate into one of the following:

- the ability to develop insights about AI Alignment
- the ability to cause AI research to be safely aligned
- the ability to stop or slow down AI research until it can be safely aligned

Donation

- MIRI seems like the most shovel-ready instance of "actual AI Safety research". It's not obvious to me whether MIRI is doing the best work, but they seem to be at least doing good work, and they do seem underfunded, and funding them seems like the most straightforward way to turn money into more professional AI researchers.

- FHI is a contender for second-best funding-target for X-risk reduction, including some thought about AI alignment.

- 80k, CFAR and Leverage are the orgs I know of that seem to be concretely attempting to solve the "career capital gap", with different strategies. They each have elements that seem promising to me. I'm not sure what their respective funding constraints are. (Note: I recently became a bit more interested in Leverage than I had been, but examining Leverage is a blogpost unto itself and I'm not going to try doing so here.)

- The Far Future Fund (recently announced, run by Nick Beckstead) may be a good way to outsource your donation decision.

Career Capital, Agency and Self Improvement

- An important limiting reagent is "people able to be agents." More than any single skillset, we need people who are able to look at organizations and worldstates, figure out what's not being done yet, figure out if they currently have the skills to do it, and backchain from that to being able to become the sort of people who have the skills to do that.

- To self-improve the fastest, as a person and as an org, you need high quality feedback loops.

- In my experience, there is a critical threshold between an "agent" and a non-agent. People get activated as agents when they a) have a concrete project to work on that seems important to them that's above their current skill level, and b) have some high status mentor-figure who takes time out of their day to tell them in a serious voice "this project you are working on is important." (The latter step is not necessary but it seems to help a lot. Note: this is NOT a mentor figure who necessarily spends a lot of time training you. They are Gandalf, telling you your mission is important and they believe in you, and then mostly staying out of the way)

(Actual longterm mentorship is also super helpful but doesn't seem to be the limiting issue)

- Beyond "be an agent", we do need highly skilled people at a variety of specific skills - both because AI Safety orgs need them, and because high skill allows you to get a job at an AGI research institution.

- Despite attempting to achieve this for several years, it's not obvious that CFAR has developed the ability to produce agents, but it's succeeded (at least slightly) at attracting existing agents, training them in some skills, and focusing them on the right problems.

Thinking Critically

- We need people who can think critically, and who spend time/attention being able to think critically and deeply about the right things.

- Thinking usefully critically requires being up to speed on what other people are thinking, so you aren't duplicating work.

- It is currently very hard to keep up with ALL the different developments across the AI/EA/Career-Capital-Building spaces. Both because the updates come from all over the internet (and sometimes in person), and because people's writing is often verbose and inconcise.

- It is possible for the average EA to learn to think more critically, but it requires significant time investment.

Coordination

- Coordination problems are extraordinarily hard. Humanity essentially failed the "Nuclear Weapons test" (i.e. we survived the Cold War, but we easily might not have. Squeaking by with a C- is not acceptable).

- Some people have argued the AI problem is much harder than nukes, which isn't clear to me. (In the long term you do need to stop everyone ever from developing unsafe AI, but it seems like the critical period is the window wherein AGI is first possible, when something like 6-20 companies will be working on it at once.)

- The Rationality and EA communities aren't obviously worse than the average community at coordination, but they are certainly not much better. And EAs are definitely not better than average at inducing coordination/cooperation among disparate groups with different goals that aren't aligned with ours.

- If your goal is to influence orgs or AGI researchers, you need to make sure you're actually following a path that leads to real influence. (i.e. "You can network your way into being Elon Musk's friend who he invites over for dinner, but that doesn't mean he'll listen to you about AI safety. The same goes for networking your way onto the GoogleBrain team or the Google AI Ethics board. Have a clear model of influence and how much of it you credibly have.")

- Mainstream politics is even harder than coordinating corporations, and to a first approximation is useless for purposes of AI alignment.

Open Questions

This is mostly a recap.

0) Is anything in my framework grossly wrong?

1) My primary question is "how do we filter for people who should consider dropping everything and focusing on the technical aspects of AI Safety, or seriously pursue careers that will position them to influence AGI research institutions?" These seem like the most important things to actually output, and it seems most important for those people to cultivate particular types of critical thinking, technical skill and ability-to-influence.

For people who are not well suited, or not yet ready to do 1), how can we either:

2) Make it easier for them to translate marginal effort into meaningful contribution, or create a clearer path towards:

3) Level up to the point where they are able to take in the entire field, and generate useful things to do (without requiring much effort from other heavily involved people whose time is scarce).

Potential Further Reading

I have not read all of these, so cannot speak to which are most important, but I think it's useful to at least skim the contents of each of them so you have a rough idea of the ideas at play. I'm including them here mostly for easy reference.

(If someone wanted to generate a 1-3 sentence summary of each of these and indicate who the target audience is, I'd be happy to edit that in. I hopefully will eventually have time to do that myself but it may be a while)

MIRI's Research Guide

80,000 Hours AI Safety Syllabus

UC Berkeley Center for Human Compatible AI Bibliography

Case Study of CFAR's Effectiveness

AI Impacts Timelines and Strategies (examples of how to think strategically given different AI timelines)

Concrete Problems in AI Safety

OpenAI's Blog

AgentFoundations.org (this is sort of a stack-overflow / technical discussion forum for discussing concepts relevant to AI alignment)

Deliberate Grad School

https://vkrakovna.wordpress.com/2016/02/28/introductory-resources-on-ai-safety-research/

Comments (39)

John_Maxwell · Feb 28 2017 · 7

A trending school of thought is "AI Alignment needs careful, clever, agenty thinkers. 'Having the correct opinion' is not that useful. There is nobody who can tell you what exactly to do, because nobody knows. We need people who can figure out what to do, in a very messy, challenging problem."

In some cases, such 'agentlike' people may have more ideas for things to do than they have time in which to do them. See e.g. this list of AI strategy research projects that Luke Muehlhauser came up with.

Broadly speaking, it seems like generating ideas for things to do, evaluating the likely usefulness of tasks, and executing tasks could in principle all be done by different people. I'm not sure I know of any distributed volunteer organizations that work this way in practice, though. Perhaps we could have a single person whose job it is to collect ideas for things to do, run them by people who seem like they ought to be able to evaluate the ideas, and get in touch with people who want to contribute.

People might also be more motivated to work on ideas they came up with themselves.

In terms of influencing top AI companies, I'd be interested to hear thoughts on the best way to handle groups like Facebook/Baidu where the lab's leader has publicly expressed skepticism about the value of AI safety research. One possible strategy is to practice talking to AI safety research skeptics in a lower-stakes context (e.g. at AI conferences) and focus on people like Andrew Ng only when you're relatively sure your advocacy won't backfire.

Richard_Batty · Feb 28 2017 · 8

I think we have a real problem in EA of turning ideas into work. There have been great ideas sitting around for ages (e.g. Charity Entrepreneurship's list of potential new international development charities, OpenPhil's desire to see a new science policy think tank, Paul Christiano's impact certificate idea) but they just don't get worked on.

John_Maxwell · Mar 1 2017 · 8

Brainstorming why this might be the case:

Lack of visibility. For example, I'm pretty into EA, but I didn't realize OpenPhil wanted to see a new science policy think tank. Just having a list of open projects could help with visibility.
Bystander effects. It's not clear who has a comparative advantage to work on this stuff. And many neglected projects aren't within the purview of existing EA organizations.
Risk aversion. Sometimes I wonder if the "moral obligation" frame of EA causes people to shy away from high-risk do-gooding opportunities. Something about wanting to be sure that you've fulfilled your obligation. Earning to give and donating to AMF or GiveDirectly becomes a way to certify yourself as a good person in the eyes of as many people as possible.
EA has strong mental handles for "doing good with your donations" and "doing good with your career". "Doing good with your projects" is a much weaker handle, and it competes for resources with the other handles. Speculative projects typically require personal capital, since it's hard to get funding for a speculative project, especially if you have no track record. But if you're a serious EA, you might not have a lot of personal capital left over after making donations. And such speculative projects typically require time and focus. But many careers that are popular among serious EAs are not going to leave much time and focus for personal projects. I don't see any page on the 80K website for careers that leave you time to think so you can germinate new EA organizations in your spare time. Arguably, the "doing good with your career" framing is harmful because it causes you to zoom out excessively instead of making a series of small bets.
Lack of accountability. Maybe existing EA organizations are productive because the workers feel accountable to the leaders, and the leaders feel accountable to their donors. In the absence of accountability, people default to browsing Facebook instead of working on projects. Under this model, using personal capital to fund projects is an antipattern because it doesn't create accountability the way donations do. Another advantage of EAs donating money to each other is that charitable donations can be deducted from your taxes, but savings intended for altruistic personal projects cannot be. But note that accountability can have downsides.
It's not that there is some particular glitch in the process of turning ideas into work. Rather, there is no process in the first place. We can work to identify and correct glitches once we actually have a pipeline.

If someone made it their business to fix this problem, how might they go about it? Brainstorming:

Secure seed funding for the project, then have a competitive application process to be the person who starts the organization. Advantages: Social status goes to the winner of the application process. Comparing applicants side-by-side, especially using a standard application, should result in better hiring decisions/better comparative advantage mapping. Project founders can be selected more on the basis of project-specific aptitude and less on the basis of connections/fundraising ability. If the application process is open and widely advertised (the way e.g. Y Combinator does with their application), there's the possibility of selecting talented people outside the EA community and expanding our effective workforce. Disadvantages: Project founders less selected for having initiative/being agentlike/being really passionate about this particular project?
Alternatively, one can imagine more of a "headhunter" type approach. Maybe someone from the EA funds looks through the EA rolodex and gets in contact with people based on whether they seem like promising candidates.
Both the competitive application approach and the headhunter approach could also be done with organizations being the unit that's being operated on rather than individuals. E.g. publicize a grant that organizations can apply for, or contact organizations with a related track record and see if they'd be interested in working on the project if given the funding. Another option is to work through universities. In general, I expect that you're able to attract higher quality people if they're able to put the name of a prestigious university on their resume next to the project. The university benefits because they get to be associated with anything cool that comes out of the project. And the project has an easier time getting taken seriously due to its association with the university's brand. So, wins all around.
Some of these projects could work well as thesis topics. I know there was a push a little while ago to help students find EA-related thesis topics that ended up fading out. But this seems like a really promising idea to me.

Richard_Batty · Mar 2 2017 · 13

This is really helpful, thanks.

Whilst I could respond in detail, instead I think it would be better to take action. I'm going to put together an 'open projects in EA' spreadsheet and publish it on the EA forum by March 25th or I owe you £100.

John_Maxwell · Mar 4 2017 · 4

£100... sounds tasty! I'll add it to my calendar :D

Richard_Batty · Mar 25 2017 · 11

No tasty money for you: http://effective-altruism.com/ea/18p/concrete_project_lists/

John_Maxwell · Mar 27 2017 · 1

Nice work!!

Denkenberger🔸 · Mar 8 2017 · 1

Isn't this list of ideas in need of implementation similar?

Daniel_Dewey · Feb 27 2017 · 7

Re: donation: I'd personally feel best about donating to the Long-Term Future EA Fund (not yet ready, I think?) or the EA Giving Group, both managed by Nick Beckstead.

TaraMacAulay · Feb 28 2017 · 7

The EA Funds are now live and accepting donations. You can read about the Far Future fund here.

Benjamin_Todd · Feb 28 2017 · 6

It might also be useful to link to this: https://80000hours.org/problem-profiles/artificial-intelligence-risk/

And we're currently working on a significant update.

AlexMennen · Feb 27 2017 · 3

5) Look at the MIRI and 80k AI Safety syllabus, and see if how much of it looks like something you'd be excited to learn. If applicable to you, consider diving into that so you can contribute to the cutting edge of knowledge. This may make most sense if you do it through

...

Raemon · Feb 27 2017 · 2

Thanks, fixed. I had gotten partway through updating that to say something more comprehensive, decided I needed more time to think about it, and then accidentally saved it anyway.

lifelonglearner · Feb 26 2017 · 3

Thank you for writing this up. I haven't spent cycles thinking this through, but my first glance says that this hits a lot of obvious avenues, which seems good.

I think I had a disjoint model of most of the things above, but it was all scattered and not consolidated. Putting them together (so that learning more, coordination, donating, gruntwork are all here) was a good way for me to update my own thoughts.

lifelonglearner · Feb 26 2017 · 3

I'm sure what their respective funding constraints are.

Should there be a "not" in the middle here, or are you just saying that you have good info on their funding situation?

Raemon · Feb 26 2017 · 2

Heh, correct. Will update soon when I have a non-phone to do it.

HaydnBelfield · Mar 2 2017 · 2

Whatever happened to EA Ventures?

Richard_Batty · Mar 2 2017 · 3

See http://effective-altruism.com/ea/174/introducing_the_ea_funds/a2m?context=1#a2m

JoshuaFox · Feb 27 2017 · 2

Outreach can be valuable, although it is rare to have high-value opportunities. If you can publish, lecture or talk 1-on-1 with highly relevant audiences, then you may sway the Zeitgeist a little and so contribute towards getting donors or researchers on board.

Relevant audiences include:

tech moguls and other potential big donors; people who may have the potential to become or influence those moguls.
researchers in relevant areas such as game theory; smart people in elite educational tracks who may have the potential to become or influence such researchers.

jsteinhardt · Feb 28 2017 · 11

I already mention this in my response to kbog above, but I think EAs should approach this cautiously; AI safety is already an area with a lot of noise, with a reputation for being dominated by outsiders who don't understand much about AI. I think outreach by non-experts could end up being net-negative.

kbog · Feb 28 2017 · 1

1-on-1 engagement with highly relevant audiences is very different from general online discourse.

Raemon · Feb 28 2017 · 0

I agree with this concern, thanks. When I rewrite this post in a more finalized form I'll include reasoning like this.

kbog · Feb 27 2017 · 2

What about online activism? There are lots of debates in various corners of the Internet over AI which often involve people in various areas of academia and tech. It seems like it could be valuable and feasible for people who are sufficiently educated on the basic issues of AI alignment to correct misconceptions and spread good ideas.

As another idea, there are certain kinds of information which would be worth collecting: surveys of relevant experts, taxonomies of research ideas and developments in the field, information about the political and economic sides of AI research. I suppose this could fall into gruntwork for safety orgs, but they don't comprehensively ask for every piece of information and work which could be useful.

Also - this might sound strange, but if someone wants to contribute then it's their choice: students and professionals might be more productive if they had remote personal assistants to handle various tasks which are peripheral to one's primary tasks and responsibilities, and if someone is known to be an EA, value aligned on cause priorities, and moderately familiar with the technical work, then having someone do this seems very feasible.

jsteinhardt · Feb 28 2017 · 12

In general I think this sort of activism has a high potential for being net negative --- AI safety already has a reputation as something mainly being pushed by outsiders who don't understand much about AI. Since I assume this advice is targeted at the "average EA" (who presumably doesn't know much about AI), this would only exacerbate the issue.

kbog · Feb 28 2017 · 0

It depends on the context. In many places there are people who really don't know what they're talking about and have easily corrected, false beliefs. Plus, most places on the Internet protect anonymity. If you are careful it is very easy to avoid having an effect that is net negative on the whole, in my experience.

Raemon · Mar 1 2017 · 5

While I didn't elaborate on my thoughts in the OP, essentially I was aiming to say "if you'd like to play a role in advocating for AI safety, the first steps are to gain skills so you can persuade the right people effectively." I think some people jump from "become convinced that AI is an issue" to "immediately start arguing with people on the internet".

If you want to do that, I'd say it's important to:

a) gain a firm understanding of AI and AI safety, b) gain an understanding of common objections and the modes of thought surrounding those objections, c) practice engaging with people in a way that actually has a positive impact (do this practice on lower-stakes issues, not AI). My experience is that positive interactions involve a lot of work and emotional labor.

(I still argue occasionally about AI on the internet and I think I've regretted it basically every time)

I think it makes more sense to aim for high-impact influence, where you cultivate a lot of valuable skills that get you hired at actual AI research firms where you can then shape the culture in a way that prioritizes safety.

kbog · Mar 1 2017 · 1

I think you're mostly right, but there is a difference between arguing in order to convince the other person (what you seem to be focused on) and arguing to convince third party observers and signal the strength of your own position (what I had in mind). The latter seems to be less knowledge-intensive.

sdspikes · Mar 1 2017 · 1

As a Stanford CS (BS/MS '10) grad who took AI/Machine Learning courses in college from Andrew Ng, worked at Udacity with Sebastian Thrun, etc., I have mostly been unimpressed by non-technical folks trying to convince me that AI safety (not caused by explicit human malfeasance) is a credible issue.

Maybe I have "easily corrected, false beliefs" but the people I've talked to at MIRI and CFAR have been pretty unconvincing to me, as was the book Superintelligence.

My perception is that MIRI has focused in on an extremely specific kind of AI that to me seems unlikely to do much harm unless someone is recklessly playing with fire (or intentionally trying to set one). I'll grant that that's possible, but that's a human problem, not an AI problem, and requires a human solution.

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

But maybe you do make friendly nuclear power plants? Not sure if this analogy worked out for me or not.

Paul_Christiano · Mar 1 2017 · 8

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

The difficulty of the policy problem depends on the quality of our technical solutions: how large an advantage can you get by behaving unsafely? If the answer is "you get big advantages for sacrificing safety, and a small group behaving unsafely could cause a big problem" then we have put ourselves in a sticky situation and will need to conjure up some unusually effective international coordination.

A perfect technical solution would make the policy problem relatively easy---if we had a scalable+competitive+secure solution to AI control, then there would be minimal risk from reckless actors. On the flip side, a perfect policy solution would make the technical problem relatively easy since we could just collectively decide not to build any kind of AI that could cause trouble. In reality we are probably going to need both.

(I wrote about this here.)

You could hold the position that the advantages from building uncontrolled AI will predictably be very low even without any further work. I disagree strongly with that and think that it contradicts the balance of public argument, though I don't know if I'd call it "easily corrected."

capybaralet · Mar 1 2017 · 3

I'm also very interested in hearing you elaborate a bit.

I guess you are arguing that AIS is a social rather than a technical problem. Personally, I think there are aspects of both, but that the social/coordination side is much more significant.

RE: "MIRI has focused in on an extremely specific kind of AI", I disagree. I think MIRI has aimed to study AGI in as much generality as possible and mostly succeeded in that (although I'm less optimistic than them that results which apply to idealized agents will carry over and produce meaningful insights in real-world resource-limited agents). But I'm also curious what you think MIRI's research is focusing on vs. ignoring.

I also would not equate technical AIS with MIRI's research.

Is it necessary to be convinced? I think the argument for AIS as a priority is strong so long as the concerns have some validity to them, and cannot be dismissed out of hand.

John_Maxwell · Mar 1 2017 · 2

I'd be interested to read you elaborate more on your views, for what it's worth.

kbog · Mar 1 2017 · 2

Well you're not the kind of person I had in mind. What I see is more of a mix of basic mistakes regarding the technical arguments and downright defamation of relevant people and institutions.

Evaluating whether the MIRI technical agenda is relevant to AI seems pretty thorny and subjective, and perhaps not something that people without graduate-level study can do.

One thing that people can contribute when they find people like you is to figure out the precise reasons for disagreement and document/aggregate them so that they can be reviewed and considered.

Paul_Crowley · Feb 25 2017 · 2

Nitpick: "England" here probably wants to be something like "the south-east of England". There's not a lot you could do from Newcastle that you couldn't do from Stockholm; you need to be within travel distance of Oxford, Cambridge, or London.

Raemon · Feb 25 2017 · 1

Thanks, fixed.

Actually, is anyone other than DeepMind in London? (The section where I brought this up was on volunteering, which I assume is less relevant for DeepMind than FHI.)

RobBensinger · Feb 25 2017 · 4

One of the spokes of the Leverhulme Centre for the Future of Intelligence is at Imperial College London, headed by Murray Shanahan.

Sean_o_h · Feb 26 2017 · 5

There will be a technical AI safety-relevant postdoc position opening up with this CFI spoke shortly, looking at trust/transparency/interpretability in AI systems.

RobBensinger · Feb 27 2017 · 3

... Aaand 33 hours later: https://twitter.com/mpshanahan/status/836249423369756672

Sean_o_h · Mar 9 2017 · 0

Murray will be remaining involved with CFI, albeit at reduced hours. The current intention is that there will still be a postdoc in trust/transparency/interpretability based out of Imperial, although we are looking into the possibility of having a colleague of Murray's supervising or co-supervising.

jyan · Apr 20 2017 · 0

2) Arrange your life such that you can easily identify volunteer opportunities for gruntwork, operations, or other nontechnical skills for AI safety orgs, and dedicate enough time and attention to helping with that gruntwork that you are more of an asset than a burden. (i.e. helping to run conferences and workshops). To help with AI specific things, it seems necessary to be in the Bay, Boston, Cambridge, Oxford or London.

I founded Real AI last December in Hong Kong. Its mission is to ensure that humanity has a bright future with safe and beneficial AGI. Besides the locations listed above, Hong Kong could use some help too.

[This comment is no longer endorsed by its author]