Donald Hobson

Posts

Sorted by New

Wiki Contributions

Comments

Quotes about the long reflection

The rot13 is to make it harder to search for. I think that this is a discussion that would be easy to misinterpret as saying something offensive.

Quotes about the long reflection
but just thought that slavery was a pre-condition for some people having good things in life. Therefore, it was justified on those grounds.

Rot13

Gung vf pyrneyl n centzngvp qrpvfvba onfrq ba gur fbpvrgl ur jnf va. Svefgyl, gur fynirel nf cenpgvfrq va napvrag Terrpr jnf bsgra zhpu yrff pehry guna pbybavny fynirel. Tvira gung nyybjvat gur fynir gb znxr gurve bja jnl va gur jbeyq, rneavat zbarl ubjrire gurl fnj svg, naq gura chggvat n cevpr ba gur fynirf serrqbz jnf pbzzba cenpgvpr, gung znxrf fbzr cenpgvprf gung jrer pnyyrq fynirel bs gur gvzr ybbx abg gung qvssrerag sebz qrog.


Frpbaqyl, ur whfgvslf vg ol bgure crbcyr univat avpr guvatf, vs fubja gur cbjre bs zbqrea cebqhpgvba yvarf sbe znxvat avpr guvatf jvgubhg fynirel, ur jbhyq unir cebonoyl nterrq gung gung jnf n orggre fbyhgvba. Rira zber fb vs fubja fbzr NV anabgrpu gung pbhyq zntvp hc nal avpr guvat.

Guveqyl, zbfg fpvragvfgf hagvy gur ynfg srj uhaqerq lrnef jrer snveyl ryvgr fbpvnyyl. Zrzoref bs fbzr hccre pynff jub pbhyq nssbeq gb rkcrevzrag engure guna jbex. Tvira gur ynetr orarsvg gurl unq, guvf ybbxf yvxr n zhpu ynetre fbhepr bs hgvyvgl guna gur qverpg avprarff bs univat avpr guvatf.
V qba'g guvax gur qrpvfvba ur znqr jnf haernfbanoyr, tvira gur fbpvny pbagrkg naq vasbezngvba ninvynoyr gb uvz ng gur gvzr.

But exactly how complex and fragile?

Machine learning works fine on non adversarial inputs. If you train a network to distinguish cats from dogs, and put in a normal picture of a cat, it works. However, there are all sorts of wierd inputs that look nothing like cats or dogs that will also get classified as cats. If you give the network a bunch of bad situations, and a bunch of good, (say you crack open a history textbook, and ask a bunch of people how nice various periods and regimes were.) then you will get a network that can distinguish bad from good within the normal flow of human history. This doesn't stop there being some wierd state that counts as extremely good. Deciding what is and isn't a good future depends on the answers to moral questions that haven't come up yet, and so we don't have any training data for questions involving tech we don't yet have. This can make a big difference. If we decided that uploaded minds do count morally, we are probably going for an entirely virtual civilization, one an anti uploader would consider worthless. If we decide that mind uploads don't count morally, we might simulate loads in horrible situations for violent video games. Someone who did think that uploaded minds mattered would consider that an S risk, potentially worse than nothing.

Human level goals are moderately complicated in terms of human level concepts. In the outcome pump, "get my mother out of the building" is a human level concept. I agree that you could probably get useful and safeish behavior from such a device given a few philosopher years. Much of the problem is that concepts like "mother" and "building" are really difficult to specify in terms of quantum operators on quark positions or whatever. The more you break human concepts down, the more edge cases you find. Getting a system that would explode the building is most of the job.

The examples of obviously stupid utility functions having obviously bad results are toy problems, when we have a better understanding of symbol grounding, we will know how much the problems keep reappearing. Manually specifying a utility function Might be feasible.

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

If no more AI safety work is necessary, that means that there is nothing we can do to significantly increase the chance of FAI over UFAI.

I could be almost certain that FAI would win because I had already built one. Although I suspect that there will be double checking to do, the new FAI will need told about what friendly behavior is, someone should keep an eye out for any UFAI ect. So FAI work will be needed until the point where no human labor is needed and we are all living in a utopia.

I could be almost certain that UFAI will win. I could see lots of people working on really scary systems and still not have the slightest idea of how to do make anything friendly. But there would still be a chance that those systems didn't scale to superintelligence, that the people running them could be persuaded to turn them off, and that someone might come up with a brilliant alignment scheme tomorrow. Circumstances where you can see that you are utterly screwed, yet still be alive, seem unlikely. Keep working untill the nanites turn you into paperclips.

Alternatively, it might be clear that we aren't getting any AI any time soon. The most likely cause of this would be a pretty serious disaster. It would have to destroy most of humanities technical ability and stop us rebuilding it. If AI alignment is something that we will need to do in a few hundred years, once we rebuild society enough to make silicon chips, its still probably worth having someone making sure that progress isn't forgotten, and that the problem will be solved in time.

We gain some philosophical insight that says that AI is inherently good, always evil, impossible ect. It's hard to imagine what a philosophical insight that you don't have is like.

Existential Risk and Economic Growth

I think that existential risk is still something that most governments aren't taking seriously. If major world governments had a model that contained a substantial probability of doom, there would be a Lot more funding. Look at the sort of funding anything and everything that might possibly help that happened in the cold war. I see this not taking it seriously as being caused by a mix of human psychology, and historical coincidence. I would not expect it to apply to all civilizations.