Joshc

That's a good point. Here's another possibility:
Require that students go through a 'research training program' before they can participate in the research program. It would have to actually prepare them for technical research, though; relabeling AGISF as a research training program would be misleading, so you would want to add a lot more technical content (reading papers, coding assignments, etc.). It would probably be pretty easy to gauge how much the training-program participants care about x-risk / safety and factor that in when deciding whether to accept them into the research program.

The social atmosphere can probably also go a long way in influencing people's attitudes towards safety. Making AI risk an explicit focus of the club, talking about it a lot at socials, inviting AI safety researchers to dinners, etc. might do most of the work tbh.

Oo exciting. Yeah, the research program looks like it is closer to what I'm pitching. 

Though I'd also be excited about putting research projects right at the start of the pipeline (if they aren't there already). It looks like AGISF is still at the top of your funnel, and I'm not sure discussion groups like these will be as good at attracting talent as research projects would be.

Late to the party here, but I was wondering why these organizations need aligned engineering talent. Anthropic seems like the kind of org that talented, non-aligned people would be interested in...

These are reasonable concerns, thanks for voicing them. As a result of unforeseen events, we became responsible for running this iteration only a couple of weeks ago. We thought that getting the program started quickly — and potentially running it at a smaller scale as a result — would be better than running no program at all or significantly cutting it down.

The materials (lectures, readings, homework assignments) are essentially ready to go and were already used for MLSS last summer. Course notes are supplementary and are an ongoing project.

We are putting a lot of hours into making sure this program gets off the ground and runs smoothly. We're sorry the deadlines are so aggressive and agree that it would have been better to launch earlier. If you have trouble getting your application in on time, please don't hesitate to contact us about an extension. We also plan to run another iteration in the spring and to announce it further in advance.

Yeah, I would be in favor of interaction in simulated environments -- others might disagree, but I don't think this influences the general argument very much, as I don't think leaving some matter for computers will reduce the number of brains by more than an order of magnitude or so.

Having a superintelligence aligned to normal human values seems like a big win to me! 


Not super sure what this means, but the 'normal human values' outcome, as I've defined it, hardly contributes to the EV calculation at all compared to the utopia outcome. If you disagree with this, please look at the math and let me know if I made a mistake.
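For intuition, here is a toy calculation with purely made-up numbers (the actual figures are in the post's math) showing how an outcome that is orders of magnitude better can swamp the EV contribution of the 'normal human values' outcome:

```python
# Purely illustrative numbers (not taken from the post): an outcome that is
# orders of magnitude better dominates EV even at a much lower probability.
p_utopia, u_utopia = 0.05, 1e6   # hypothetical probability and utility of utopia
p_normal, u_normal = 0.50, 1.0   # 'normal human values' outcome

ev_utopia = p_utopia * u_utopia  # 50000.0
ev_normal = p_normal * u_normal  # 0.5

print(ev_normal / (ev_normal + ev_utopia))  # ~1e-5: a negligible share of EV
```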

Yep, I didn't initially understand you. That's a great point!

This means the framework I presented in this post is wrong. I agree now with your statement:

the EV of partly utilitarian AI is higher than that of fully utilitarian AI.


I think the framework in this post can be modified to incorporate this, and the conclusions are similar. The quantity that dominates the utility calculation is now the expected representation of utilitarianism in the AGI's values.

The two handles become:
(1) The probability of misalignment.
(2) The expected representation of utilitarianism in the moral parliament conditional on alignment.

The conclusion of the post, then, should be something like "interventions that increase (2) might be underrated" instead of "interventions that increase the probability of fully utilitarian AGI are underrated."
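To make the modified framework concrete, here's a minimal sketch (not from the post) that treats EV as a product of the two handles above, under the strong assumptions that misaligned outcomes contribute roughly zero utility and that realized utility scales roughly linearly with utilitarianism's representation; all numbers are illustrative:

```python
# Illustrative sketch of the modified framework (not from the original post).
# Assumes misaligned outcomes contribute ~0 utility and that realized utility
# scales roughly linearly with utilitarianism's representation in the AGI's values.

def expected_value(p_misalignment: float,
                   expected_utilitarian_share: float,
                   u_full_utilitarian: float = 1.0) -> float:
    """EV as a function of the two 'handles' described above."""
    p_aligned = 1.0 - p_misalignment
    return p_aligned * expected_utilitarian_share * u_full_utilitarian

# Raising the expected utilitarian share conditional on alignment (handle 2)
# moves EV just as directly as reducing the probability of misalignment (handle 1).
print(expected_value(p_misalignment=0.3, expected_utilitarian_share=0.2))  # ≈ 0.14
print(expected_value(p_misalignment=0.3, expected_utilitarian_share=0.4))  # ≈ 0.28
```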
 

Yep, thanks for pointing that out! Fixed it.

...I haven't seen much discussion about the downsides of delaying

I'm not sure how your first point relates to what I was saying in this post, but I'll take a guess. I said something about how investing in capabilities at Anthropic could be good. An upside would be increasing the probability that EAs end up controlling superintelligent AGI in the future. The downside is that it could shorten timelines, though hopefully this can be mitigated by keeping the research under wraps (which is what they are doing). This is a controversial issue, though. I haven't thought very much about whether the upsides outweigh the downsides, but the argument in this post caused me to believe the upsides are larger than I previously thought.
 

Also I'm not sure about outcome 1 having zero utility...

It doesn't matter which outcome you assign zero utility to, as long as the relative values are the same: if one utility function is a positive affine transformation of another, the two produce equivalent decisions.
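As a quick illustration of that invariance (a sketch with made-up outcome labels and values, not taken from the post):

```python
# A positive affine transformation of a utility function ranks outcomes
# identically, so the choice of which outcome gets utility 0 doesn't
# affect decisions.
u = {"misaligned": 0.0, "normal_values": 1.0, "utopia": 1000.0}

a, b = 2.5, -7.0  # any a > 0 and any b give an equivalent utility function
u_shifted = {outcome: a * v + b for outcome, v in u.items()}

# The best outcome is unchanged, and expected utilities of lotteries keep
# their ordering too, since E[aU + b] = a * E[U] + b.
assert max(u, key=u.get) == max(u_shifted, key=u_shifted.get)
```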
