All of Daniel_Dewey's Comments + Replies

I was going to "heart" this, but that seemed ambiguous. So I'm just commenting to say, I hear you.

Thanks for taking the time to write this, Ezra -- I found it useful. 

I just heard about this via a John Green video, and immediately came here to check whether it'd been discussed. Glad to see that it's been posted -- thanks for doing that! (Strong-upvoted, because this is the kind of thing I like to see on the EA forum.)

I don't have the know-how to evaluate the 100x claim, but it's huge if true -- hopefully if it pops up on the forum like this now and then, especially as more evidence comes in from the organization's work, we'll eventually get the right people looking to evaluate this as an opportunity.

I think this is a good point; you may also be interested in Michelle's post about beneficiary groups, my comment about beneficiary subgroups, and Michelle's follow-up about finding more effective causes.

Thanks Tobias.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

FWIW, I'm not ready to cede the "more principled" ground to HRAD at this stage; to me, it seems like the distinction is more about which aspects of an AI system's behavior we're specif... (read more)

Thanks for these thoughts. (Your second link is broken, FYI.)

On empirical feedback: my current suspicion is that there are some problems where empirical feedback is pretty hard to get, but I actually think we could get more empirical feedback on how well HRAD can be used to diagnose and solve problems in AI systems. For example, it seems like many AI systems implicitly do some amount of logical-uncertainty-type reasoning (e.g. AlphaGo, which is really all about logical uncertainty over the result of expensive game-tree computations) -- maybe HRAD could be ... (read more)

0
LawrenceC
7y
My suspicion is that MIRI agrees with you - if you read their job post on their software engineering internship, it seems that they're looking for people who can rapidly prototype and test AI Alignment ideas that have implications in machine learning.
1
WillPearson
7y
Fixed, thanks. I agree that HRAD might be useful. I read some of the stuff. I think we need a mix of theory and practice, and only when we have a community where they can feed into each other will we actually get somewhere. When an AI safety theory paper says, "Here is an experiment we can do to disprove this theory," then I will pay more attention than I do.

The "ignored physical aspect of computation" is less about a direction to follow, and more an argument about the type of systems that are likely to be effective, and so an argument about which ones we should study. There is no point studying how to make ineffective systems safe if the lessons don't carry over to effective ones. You don't want a system that puts in the same computational resources trying to decide what brand of oil is best for its bearings as it does to deciding the question of what is a human or not.

If you decide how much computational resources you want to put into each class of decision, you start to get into meta-decision territory. You also need to decide how much of your pool you want to put into making that meta-decision, as making it will take away from making your other decisions.

I am thinking about a possible system which can allocate resources among decision-making systems, and this can be used to align the programs (at least somewhat). It cannot align a superintelligent malign program; work needs to be done on the initial population of programs in the system, so that we can make sure such programs do not appear. Or we need a different way of allocating resources entirely. I don't pick this path because it is an easy path to safety, but because I think it is the only path that leads anywhere interesting/dangerous, and so we need to think about how to make it safe.

My guess is that the capability is extremely likely, and the main difficulties are motivation and reliability of learning (since in other learning tasks we might be satisfied with lower reliability that gets better over time, but in learning human preferences unreliable learning could result in a lot more harm).

Thanks for this suggestion, Kaj -- I think it's an interesting comparison!

I am very bullish on the Far Future EA Fund, and donate there myself. There's one other possible nonprofit that I'll publicize in the future if it gets to the stage where it can use donations (I don't want to hype this up as an uber-solution, just a nonprofit that I think could be promising).

I unfortunately don't spend a lot of time thinking about individual donation opportunities, and the things I think are most promising often get partly funded through Open Phil (e.g. CHAI and FHI), but I think diversifying the funding source for orgs like CHAI and FHI is valuable, so I'd consider them as well.

5
LawrenceC
7y
Not super relevant to Peter's question, but I would be interested in hearing why you're bullish on the Far Future EA Fund.

I think there's something to this -- thanks.

To add onto Jacob and Paul's comments, I think that while HRAD is more mature in the sense that more work has gone into solving HRAD problems and critiquing possible solutions, the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs justification for Paul's approach being promising. In fact, I think the arguments for Paul's work being promising are more solid than those for HRAD, despite it only being Paul making those arguments -- I've had a much harder time understanding anything more nuanced than the basic case for HRAD I gave above, and a much easier time understanding why Paul thinks his approach is promising.

2
Wei Dai
7y
Daniel, while re-reading one of Paul's posts from March 2016, I just noticed the following: My interpretation of this is that between March 2016 and the end of 2016, Paul updated the difficulty of his approach upwards. (I think given the context, he means that other problems, namely robust learning and meta-execution, are harder, not that informed oversight has become easier.) I wanted to point this out to make sure you updated on his update. Clearly Paul still thinks his approach is more promising than HRAD, but perhaps not by as much as before.
1
Wei Dai
7y
This seems wrong to me. For example, in the "learning to reason from human" approaches, the goal isn't just to learn to reason from humans, but to do it in a way that maintains competitiveness with unaligned AIs. Suppose a human overseer disapproves of their AI using some set of potentially dangerous techniques, how can we then ensure that the resulting AI is still competitive? Once someone points this out, proponents of the approach, to continue thinking their approach is promising, would need to give some details about how they intend to solve this problem. Subsequently, justification for thinking the approach is promising is more subtle and harder to understand. I think conversations like this have occurred for MIRI's approach far more than Paul's, which may be a large part of why you find Paul's justifications easier to understand.

My perspective on this is a combination of “basic theory is often necessary for knowing what the right formal tools to apply to a problem are, and for evaluating whether you're making progress toward a solution” and “the applicability of Bayes, Pearl, etc. to AI suggests that AI is the kind of problem that admits of basic theory.” An example of how this relates to HRAD is that I think that Bayesian justifications are useful in ML, and that a good formal model of rationality in the face of logical uncertainty is likely to be useful in analogous ways. When

... (read more)

Thanks Nate!

The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are "your team runs into a capabilities roadblock and can't achieve AGI" or "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time."

This is particularly helpful to know.

We worry about "unknown unknowns", but I’d pro

... (read more)
7
So8res
7y
I want to steer clear of language that might make it sound like we're saying:

* X 'We can't make broad-strokes predictions about likely ways that AGI could go wrong.'
* X 'To the extent we can make such predictions, they aren't important for informing research directions.'
* X 'The best way to address AGI risk is just to try to advance our understanding of AGI in a general and fairly undirected way.'

The things I do want to communicate are:

* All of MIRI's research decisions are heavily informed by a background view in which there are many important categories of predictable failure, e.g., 'the system is steering toward edges of the solution space', 'the function the system is optimizing correlates with the intended function at lower capability levels but comes uncorrelated at high capability levels', 'the system has incentives to obfuscate and mislead programmers to the extent it models its programmers' beliefs and expects false programmer beliefs to result in it better-optimizing its objective function.'
* The main case for HRAD problems is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'.
* There usually isn't a simple relationship between a particular open problem and a particular failure mode, but if we thought there were no way to predict in advance any of the ways AGI systems can go wrong, or if we thought a very different set of failures were likely instead, we'd have different research priorities.

I'm going to try to answer these questions, but there's some danger that I could be taken as speaking for MIRI or Paul or something, which is not the case :) With that caveat:

I'm glad Rob sketched out his reasoning on why (1) and (2) don't play a role in MIRI's thinking. That fits with my understanding of their views.

(1) You might think that "learning to reason from humans" doesn't accomplish (1) because a) logic and mathematics seem to be the only methods we have for stating things with extremely high certainty, and b) you probably can't rule

... (read more)

Thanks for linking to that conversation -- I hadn't read all of the comments on that post, and I'm glad I got linked back to it.

Thanks!

Conditional on MIRI's view that a hard or unexpected takeoff is likely, HRAD is more promising (though it's still unclear).

Do you mean more promising than other technical safety research (e.g. concrete problems, Paul's directions, MIRI's non-HRAD research)? If so, I'd be interested in hearing why you think hard / unexpected takeoff differentially favors HRAD.

1
Tobias_Baumann
7y
Yeah, and also (differentially) more promising than AI strategy or AI policy work. But I'm not sure how strong the effect is.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

In contrast, if we think there's no such discontinuity and AI development will be gradual, then AI control may be at least somewhat more similar (but surely not entirely comparable) to how we "align" contemporary software systems. That is, it would be more plausible that we could test advanced AI systems extensively without risking catastrophic failure or that we could iteratively try a variety of safety approaches to see what works best. It would also be more likely that we'd get warning signs of potential failure modes, so that it's comparatively more viable to work on concrete problems whenever they arise, or to focus on making the solutions to such problems scalable – which, to my understanding, is a key component of Paul's approach. In this picture, successful alignment without understanding the theoretical fundamentals is more likely, which makes non-HRAD approaches more promising.

My personal view is that I find a hard and unexpected takeoff unlikely, and accordingly favor other approaches than HRAD, but of course I can't justify high confidence in this given expert disagreement. Similarly, I'm not highly confident that the above distinction is actually meaningful. I'd be interested in hearing your thoughts on this!

Thanks Tara! I'd like to do more writing of this kind, and I'm thinking about how to prioritize it. It's useful to hear that you'd be excited about those topics in particular.

4
MikeJohnson
7y
I too found this post very helpful/illuminating. I hope you can continue to do this sort of writing!

Welcome! :)

I think your argument totally makes sense, and you're obviously free to use your best judgement to figure out how to do as much good as possible. However, a couple of other considerations seem important, especially for things like what a "true effective altruist" would do.

1) One factor of your impact is your ability to stick with your giving; this could give you a reason to adopt something less scary and demanding. By analogy, it might seem best for fitness to commit to intense workouts 5 days a week, strict diet changes, and no alcoho... (read more)

Thanks for putting StrongMinds on my radar!

Nice work, and looks like a good group of advisors!

Re: donation: I'd personally feel best about donating to the Long-Term Future EA Fund (not yet ready, I think?) or the EA Giving Group, both managed by Nick Beckstead.

7
TaraMacAulay
7y
The EA Funds are now live and accepting donations. You can read about the Far Future fund here.

Thanks for recommending a concrete change in behavior here!

I also appreciate the discussion of your emotional engagement / other EAs' possible emotional engagement with cause prioritization -- my EA emotional life is complicated, I'm guessing others have a different set of feelings and struggles, and this kind of post seems like a good direction for understanding and supporting one another.

ETA: personally, it feels correct when the opportunity arises to emotionally remind myself of the gravity of the ER-triage-like decisions that humans have to make when a... (read more)

I agree that if engagement with the critique doesn't follow those words, they're not helpful :) Editing my post to clarify that.

The pledge is really important to me as a part of my EA life and (I think) as a part of our community infrastructure, and I find your critiques worrying. I'm not sure what to do, but I appreciate you taking the critic's risk to help the community. Thank you!

This is a great point -- thanks, Jacob!

I think I tend to expect more from people when they are critical -- i.e. I'm fine with a compliment/agreement that someone spent 2 minutes on, but expect critics to "do their homework", and if a complimenter and a critic were equally underinformed/unthoughtful, I'd judge the critic more harshly. This seems bad!

One response is "poorly thought-through criticism can spread through networks; even if it's responded to in one place, people cache and repeat it other places where it's not responded to, and that... (read more)

0
RyanCarey
7y
Not sure how much this helps, because if the criticism is thoughtful and you fail to engage with it, you're still being rude and missing an opportunity, whether or not you say some magic words.

Thanks!

I think parts of academia do this well (although other parts do it poorly, and I think it's been getting worse over time). In particular, if you present ideas at a seminar, essentially arbitrarily harsh criticism is fair game. Of course, this is different from the public internet, but it's still a group of people, many of whom do not know each other personally, where pretty strong criticism is the norm.

One guess is that ritualization in academia helps with this -- if you say something in a talk or paper, you ritually invite criticism, whereas I'... (read more)

Prediction-making in my Open Phil work does feel like progress to me, because I find making predictions and writing them down difficult and scary, indicating that I wasn't doing that mental work as seriously before :) I'm quite excited to see what comes of it.

3
Raemon
7y
Wanted to offer something stronger than an upvote for starting the prediction-making: that sounds like a great idea, and I want to see how it goes. :)

I have very mixed feelings about Sarah's post; the title seems inaccurate to me, and I'm not sure about how the quotes were interpreted, but it's raised some interesting and useful-seeming discussion. Two brief points:

  • I understand what causes people to write comments like "lying seems bad but maybe it's the best thing to do in some cases", but I don't think those comments usually make useful points (they typically seem pedantic at best and edgy at worst), and I hope people aren't actually guided by considerations like those. Most EAs I work wit
... (read more)

I'm really glad you posted this! I've found it helpful food for thought, and I think it's a great conversation for the community to be having.

For many Americans, income taxes might go down; probably worth thinking about what to do with that "extra" money.

Thanks for mentioning this -- I totally see what you're pointing at here, and I think you make valid points re: there always being more excuses later.

I just meant to emphasize that "giving now feels good" wasn't something I was prepared to justify in terms of its actual impact on the world; if I found out that this good feeling was justified in terms of impact, that'd be great, but if it turned out that I could give up that good feeling in order to have a better impact, I'd try my best to do so.

Thanks Milan!

I haven't thought a lot about that, and might be making the wrong call. Off the top of my head:

  • There's a community norm toward donating 10%, and I'm following that without thinking too hard.
  • I expect donation effectiveness on the scale of my donations to get worse over time, so giving earlier at the cost of giving a little (?) less over my career seems like it might be better.
  • Giving feels good in a way that paying debt doesn't. This isn't an EA reason :)

I guess I could put my 10% toward debt reduction instead -- if you or anyone else has ... (read more)

1
Milan_Griffes
8y
I don't have pointers to good info, other than Mr. Money Mustache's blog, which I think was already mentioned.

I'm following an intuition along the lines of "put on your own oxygen mask before helping those around you with theirs." My bet is that my personal impact will be much larger once I'm financially independent. Giving a significant portion of my income now is a drag on reaching financial independence. I'd prefer to accelerate my progress towards financial independence at the expense of doing good today.

We're touching on the "give now vs. give later" debate here; intuitions may diverge.
2
Jmd
8y
I disagree that 'giving because it feels good' isn't an EA reason to give. It's about the head and the heart, right? I give because it feels good, and it feels even better knowing that where you give is high impact; and if giving makes you feel good, then that's encouraging to others as well :)

I also started giving when I had my student loan to pay off. Maybe if my loan was bigger I would have thought about starting with smaller donations, like with The Life You Can Save, but my main motivation was that if the debt is an excuse now, then buying a house will be an excuse later, and then all the other life excuses, and I would never do it. So I leapt.

People live really well on less than I did, even with the donations and the loan repayments. It does mean thinking more about 'fun' activities, though I found that I could still do all those things; where I spent less was on 'stuff' -- things you buy but don't really need anyway.

I was glad to see this article -- I think it's a very interesting issue, and generally want to encourage people to bring up this kind of thing so that we can continue to look for more effective causes and beneficiary groups. Nice work!

I didn't find the presentation unpleasant, personally, but I have a high tolerance for being opinionated, and it's been helpful to see others' reactions in the comments.

That's great, thanks for letting me know! Score one for posting on fora :)

Since the groups above seem to exhaust the space of beneficiaries (if what we care about is well-being), we can’t expect to get more effectiveness improvements in this way. In future, such improvements will have to come from finding new interventions, or intervention types.

Though I think the conclusion may well be correct, this argument doesn't seem valid to me. Thinking about it more produced some ideas I found interesting.

Imagine that we instead had only one group of beneficiaries: all conscious beings. We could run the same argument -- this group exh... (read more)

4
Michelle_Hutchinson
8y
You ended up pretty substantially impacting the follow-up.
1
Michelle_Hutchinson
8y
Good point, thanks Daniel!

Some of the most significant insights of effective altruism in terms of finding more effective ways to help others have come from highlighting different beneficiary groups.

This makes me want to split off "people in extreme poverty" into a distinct group of beneficiaries -- I suspect that for many the "aha!" moment in their EA journey was realizing that these people exist and can be helped. Also, it seems to me that the interventions available for helping people in extreme poverty are quite different from interventions that help riche... (read more)

This is a great article, Michelle! Looking forward very much to the follow-up.

0
Michelle_Hutchinson
8y
Thank you!

Has anyone here seen any good analyses of helping Syrian refugees as a cause area, or the most effective ways to do it? I've seen some commentary on opening borders and some general tips on disaster relief from GiveWell, but not much beyond that. Thanks!

1
Daniel_Dewey
9y
Update from GiveWell here, with comments: Donating to help with the Syrian refugee crisis
3
AlasdairGives
9y
There is a blog post by one EA with some suggestions and rough calculations
-4
russoxo
9y
Hi Daniel, there is already discussion on this topic on the Facebook group. Hi everyone! This is my point about us needing a better forum... Am I alone in this belief? Cheers

Thanks! :) After our conversation Owen jumped right into the write-up, and I pitched in with the javascript -- it was fun to just charge ahead and execute a small idea like this.

It's true that this calculator doesn't take field-steering or paradigm-defining effects of early research into account, nor problems of inherent seriality vs parallelizable work. These might be interesting to incorporate into a future model, at some risk of over-complicating what will always be a pretty rough estimate.

Thanks! Going to fix. It was supposed to say "by the time we develop those..."

Follow-up: this comment suggests that Nate weakly favors strategies 2 and/or 3 over 1.

I am not Nate, but my view (and my interpretation of some median FHI view) is that we should keep options open about those strategies and as-yet unknown other strategies instead of fixating on one at the moment. There's a lot of uncertainty, and all of the strategies look really hard to achieve. In short, no strongly favored strategy.

FWIW, I also think that most current work in this area, including MIRI's, promotes the first three of those goals pretty well.

Thanks! This reply makes sense to me, and the refutation of the marginal-contribution strategy is interesting. I can see why you've chosen to group tightly complementary contributions.

Thanks for posting these updates, I'm quite excited about the project!

Have you considered incentive problems stemming from the fact that you require fractions of impact to be allocated among participants so that they add up to 1? My understanding is that this way of allocating credit doesn't produce the desired results in cases where the project wouldn't have happened without all participants (see e.g. 5 mistakes of moral reasoning).

If you've already answered this, I'd appreciate a link -- I know you've thought about this quite a bit.

2
Paul_Christiano
9y
Discussed briefly here. There are two things going on in these cases:

* Impact purchases incorrectly value good deeds based on the highest bidder, instead of summing over all people. They are supposed to work correctly on the supply side, but not the demand side. This is a complicated issue I may discuss later. In order to get a correct protocol you need to combine them with another idea. For the rest of the post I am going to assume that everyone has the same values, which makes this issue go away. Note that this issue is similar for normal donations.
* If there are increasing or diminishing returns to scale, the sum of people's marginal contributions doesn't add up to 1. The simplest case is when output = (sum of inputs)^x for some x other than 1. If there are decreasing returns to scale, then there are rents: the sum of the marginal outputs adds up to less than the total output, and so there is some extra value to be captured. Certificate purchases work fine in that case---each contributor can unilaterally claim their impact, or the group can claim its impact and decide how to split the rent. Increasing returns are more complicated. It's still fine to pay a project for its total impact---there is guaranteed to be some way of assigning that impact which would incentivize people to do the project (otherwise they should have all done their second-best option, and they would have produced more value that way). Our approach is to group tightly complementary contributions together and let them negotiate a solution. This is the same thing that we normally do in the broader market. Philanthropy normally dodges the issue by just not thinking about it.

Note that the naive strategy of just paying each person for their marginal contribution would also go wrong. For example, suppose that there are two people, who can each contribute up to 1 unit of effort in a project that creates E^2 value, where E is the amount of effort. Each person can also use their unit of effort
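To spell out the arithmetic behind Paul's (truncated) increasing-returns example: the two-person setup and the E^2 payoff are taken from his comment above, but the sketch itself is my illustration, not his. The point it shows is that with increasing returns, paying each person their marginal contribution hands out more credit than the project actually produced.

```python
# Minimal sketch of the increasing-returns example above (assumed setup:
# two people, each contributing up to 1 unit of effort, project value = E^2).

def project_value(efforts):
    """Total value created when the given effort levels are contributed."""
    return sum(efforts) ** 2  # increasing returns to scale

efforts = [1.0, 1.0]                # both people contribute fully
total = project_value(efforts)      # (1 + 1)^2 = 4

# Marginal contribution of person i: value with everyone minus value without them.
marginals = [
    total - project_value(efforts[:i] + efforts[i + 1:])
    for i in range(len(efforts))
]

# Each marginal is 4 - 1^2 = 3, so the marginals sum to 6 > 4: paying everyone
# their marginal contribution would allocate more impact than was created,
# which is why fractions of impact can't simply be read off marginal contributions.
print(total, marginals, sum(marginals))  # 4.0 [3.0, 3.0] 6.0
```

Diminishing returns flip the inequality: with value = sqrt(E), the same calculation gives marginals of about 0.41 each against a total of about 1.41, so the marginals under-count the total and leave the "rent" Paul mentions.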

I've often found the EAs around me to be

(i) very supportive of taking on things that are ex ante good ideas, but carry significant risk of failing altogether, and

(ii) good at praising these decisions after they have turned out to fail.

It doesn't totally remove the sting to have those around you say "Great job taking that risk, it was the right decision and the EV was good!" and really mean it, but I do find that it helps, and it's a habit I'm trying to build to praise these kinds of things after the fact as much as I praise big successes.

Of cours... (read more)
