Daniel_Dewey

I just heard about this via a John Green video, and immediately came here to check whether it'd been discussed. Glad to see that it's been posted -- thanks for doing that! (Strong-upvoted, because this is the kind of thing I like to see on the EA forum.)

I don't have the know-how to evaluate the 100x claim, but it's huge if true -- hopefully if it pops up on the forum like this now and then, especially as more evidence comes in from the organization's work, we'll eventually get the right people looking to evaluate this as an opportunity.

How to get a new cause into EA

Daniel_Dewey 7y 3

I think this is a good point; you may also be interested in Michelle's post about beneficiary groups, my comment about beneficiary subgroups, and Michelle's follow-up about finding more effective causes.

My current thoughts on MIRI's "highly reliable agent design" work

Daniel_Dewey 8y 1

Thanks Tobias.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

FWIW, I'm not ready to cede the "more principled" ground to HRAD at this stage; to me, it seems like the distinction is more about which aspects of an AI system's behavior we're specifying manually, and which aspects we're setting it up to learn. As far as trying to get everything right the first time, I currently favor a corrigibility kind of approach, as I described in 3c above -- I'm worried that trying to solve everything formally ahead of time will actually expose us to more risk.

My current thoughts on MIRI's "highly reliable agent design" work

Daniel_Dewey 8y 4

Thanks for these thoughts. (Your second link is broken, FYI.)

On empirical feedback: my current suspicion is that there are some problems where empirical feedback is pretty hard to get, but I actually think we could get more empirical feedback on how well HRAD can be used to diagnose and solve problems in AI systems. For example, it seems like many AI systems implicitly do some amount of logical-uncertainty-type reasoning (e.g. AlphaGo, which is really all about logical uncertainty over the result of expensive game-tree computations) -- maybe HRAD could be used to understand how those systems could fail?

I'm less convinced that the "ignored physical aspect of computation" is a very promising direction to follow, but I may not fully understand the position you're arguing for.

My current thoughts on MIRI's "highly reliable agent design" work

Daniel_Dewey 8y 0

My guess is that the capability is extremely likely, and the main difficulties are motivation and reliability of learning (since in other learning tasks we might be satisfied with lower reliability that gets better over time, but in learning human preferences unreliable learning could result in a lot more harm).

My current thoughts on MIRI's "highly reliable agent design" work

Daniel_Dewey 8y 3

Thanks for this suggestion, Kaj -- I think it's an interesting comparison!

My current thoughts on MIRI's "highly reliable agent design" work

Daniel_Dewey 8y 5

I am very bullish on the Far Future EA Fund, and donate there myself. There's one other possible nonprofit that I'll publicize in the future if it gets to the stage where it can use donations (I don't want to hype this up as an uber-solution, just a nonprofit that I think could be promising).

I unfortunately don't spend a lot of time thinking about individual donation opportunities, and the things I think are most promising often get partly funded through Open Phil (e.g. CHAI and FHI), but I think diversifying the funding source for orgs like CHAI and FHI is valuable, so I'd consider them as well.

My current thoughts on MIRI's "highly reliable agent design" work

Daniel_Dewey 8y 3

I think there's something to this -- thanks.

To add onto Jacob and Paul's comments, I think that while HRAD is more mature in the sense that more work has gone into solving HRAD problems and critiquing possible solutions, the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs. the justification for thinking Paul's approach is promising. In fact, I think the arguments for Paul's work being promising are more solid than those for HRAD, despite Paul being the only one making those arguments -- I've had a much harder time understanding anything more nuanced than the basic case for HRAD I gave above, and a much easier time understanding why Paul thinks his approach is promising.

Daniel_Dewey

Posts 7

Comments 59
