How can we trust the findings of EA organisations? It is a genuine possibility that I will change the entire course of my life as a result of the information on 80k hours. I guess many of you already have. Have you checked all of their reasoning? What percentage ought one to check? Or do you trust someone else to have done that "due diligence"?
It's not enough to say they are transparent and seem honest - I know plenty of misguided, transparent, honest people. The issue is that EA organisations might be wrong, not in what they don't know (we cannot avoid being wrong in what we don't know) but in what they do - like a mistaken mathematical proof, their logic might be flawed or their sums might be off. This, by our own logic, would likely have disastrous results.
Frankly, someone needs to be checking their notes, and I am not skilled enough to do it, nor do I want to. I have yet to see this done in regard to, say, 80k hours.
With this in mind, I can imagine three solutions:
Firstly, some sort of independent auditing body. They could read the work of EA organisations, check whether the logic holds, flag areas where decisions seem arbitrary, and so on. We would be paying someone whose main job is to be really on top of this stuff and to tell us if they found anything worrying. Arguably this forum kind of does this job, though A) we are all tremendously biased, and B) are people *really* checking the minutiae? I am not.
Secondly, multiple organisations independently asking the same questions. What if there were another 80k hours (called, say, "Nine Years") which didn't interact with them but sought answers to the same problems? "Nine Years" could publish its research, and we could then read both summaries and investigate the areas where they differ.
Thirdly, publish papers setting out our explanations as if they were mathematical (perhaps in philosophy journals). Perhaps this already happens (if this post takes off I might research it more), but you could publish rigid, testable explanations of the theories which undergird EA as an ideology. Well-being, for instance, seems very poorly defined. I'll explain more if people are interested (read Deutsch's The Beginning of Infinity), but suffice it to say that to avoid staying wrong you want your claims to be definite enough that they can be shown to be wrong, so that you can change them. Is our ideology falsifiable? Sometimes EA seems very vague to me in its explanatory underpinnings. If your explanations can vary easily, it's hard to be wrong, and if you're never wrong, you never get better. I don't know whether journals are the way to go, but they seemed the clearest way to suggest becoming more rigid.
Caveats
I do not know enough about EA - I've read about 20 hours of it in my life. Perhaps mechanisms like these already exist, or you have good reason not to require them.
I recently left religion and for that reason would like to know that I am not fooling myself here also. "Trust EA organisations because they are good" doesn't hold much water since the logic applies elsewhere - "Trust the Church because it is good"?
Summary
I think it would be good to have a mechanism for ensuring that we are not fooling ourselves here. EA redirects a huge number of person-hours, and flaws in it could be catastrophic. I don't know what the right mechanisms are, but I have a few suggestions above and am interested in your own ideas, or in criticisms of the ones here.
An alignment arms race is only bad if there is concomitant capabilities development that would make a wrong alignment protocol counterproductive. Different approaches to alignment can lead to insights into capabilities, and that is something to be concerned about, but that concern is already captured in analyses of capabilities arms-race scenarios.
If there are two or more alignment agencies, but only one of their approaches can be implemented in advanced AI systems as they are developed, each would race to complete its alignment agenda before the other agencies could complete theirs. This rushing could be especially bad if any of them fails to take the time to verify that its approach will actually align AI as intended. In addition, if the competition becomes hostile enough, alignment agencies won't check each other's work in good faith, and in general there won't be enough trust for anyone to let anyone else check the work they've done on alignment.
If one or more of these agencies racing to the finish line doesn't let anyone check its work, and its strategy is invalid or unsound, then implementing that strategy in an AI system would fail to produce alignment when it was expected to. In other words, because of mistakes made along the way, what looks like an alignment competition inadvertently becomes a misalignment race.
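To make the "misalignment race" worry a bit more concrete, here is a minimal toy model. All the numbers and the `expected_outcome` function are invented purely for illustration, not drawn from any actual analysis; the only point is that if external checking catches unsound schemes before deployment, even a modest chance of being wrong can flip the expected value of racing ahead unchecked from positive to negative.

```python
# Toy expected-outcome model of an "alignment race".
# All probabilities and payoffs below are made up for illustration only.

def expected_outcome(p_sound: float, verified: bool) -> float:
    """Crude expected 'safety value' of deploying an alignment scheme.

    p_sound:  probability the scheme is actually valid and sound.
    verified: whether other agencies were allowed to check the work;
              here verification is assumed to catch an unsound scheme
              before it is deployed.
    """
    GOOD = 1.0     # sound scheme deployed: aligned AI
    BAD = -10.0    # unsound scheme deployed while everyone believes it works
    NEUTRAL = 0.0  # unsound scheme caught in review: nothing deployed yet

    if verified:
        return p_sound * GOOD + (1 - p_sound) * NEUTRAL
    return p_sound * GOOD + (1 - p_sound) * BAD

# A rushed, unchecked scheme with a 20% chance of being unsound:
print(expected_outcome(p_sound=0.8, verified=False))  # 0.8*1 + 0.2*(-10) = -1.2
print(expected_outcome(p_sound=0.8, verified=True))   # 0.8*1 + 0.2*0    =  0.8
```

Under these made-up assumptions, the same scheme goes from net-positive to net-negative in expectation simply because no one was allowed to check it, which is the sense in which the competition becomes a misalignment race.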
I'm not saying competition in AI alignment is either good or bad by default. What I am saying is that there appear to be particular conditions under which competition in AI alignment would make things worse, and that those conditions should be avoided. To summarize, it appears to me at least some of them are:
1. Competition in AI alignment becomes a 'race.'
2. One or more agencies in AI alignment themselves become untrustworthy.
3. Even if in principle all AI alignment agencies should be able to trust each other, in practice they end up mistrusting each other.