Donald Hobson





Ultra-Near-Termism: Literally An Idea Whose Time Has Come

Ultra-near-termism doesn't mean stopping AI. The ultra-near-termist would really, really like an FAI FOOM that simulates a billion subjective years of utopia in the next few days.

Of course, they would choose a 0.1% chance of this FAI FOOM now over a 100% chance of an FAI FOOM in two months.

If an FAI with nanotech could run simulations that are X times faster and Y times nicer than the status quo, you should encourage AI so long as you think it has at least a 1 in XY chance of being friendly.
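A minimal sketch of the break-even arithmetic, assuming the status quo is worth 1 per unit of time, a friendly outcome is worth X·Y, and an unfriendly outcome is worth 0 (not negative). The function names are hypothetical. Under those assumptions, encouraging AI wins whenever p·X·Y ≥ 1.

```python
def break_even_probability(x_speedup: float, y_niceness: float) -> float:
    """Smallest chance of friendliness at which encouraging AI beats the status quo.

    Assumes status quo = 1, FAI = x_speedup * y_niceness, UFAI = 0.
    """
    return 1.0 / (x_speedup * y_niceness)


def should_encourage(p_friendly: float, x_speedup: float, y_niceness: float) -> bool:
    """Expected value of pushing for AI (p * X * Y) versus the status quo (1)."""
    return p_friendly * x_speedup * y_niceness >= 1.0


# Hypothetical numbers: a millionfold speedup that is ten times nicer.
print(break_even_probability(1e6, 10))   # roughly 1e-07
print(should_encourage(0.001, 1e6, 10))
```

If an unfriendly outcome were counted as worse than nothing (an s-risk), the threshold would rise above 1/(XY).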

One thing that potentially breaks ultra-near-termism is the possibility of time travel. If you have a consistent discount rate over all time, this implies a goal of taking a block of eudaimonium back in time to the big bang, with pretty much any other goal having negligible utility in comparison. If you consider the past to have no value, then the goal would be to send a block of eudaimonium back in time to now.
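To see why the big bang dominates, here is a small sketch assuming exponential discounting, weight(t) = exp(-r·(t - now)), with t in years and the past at negative t. The 1% per year rate and the roughly 13.8-billion-year age of the universe are illustrative inputs, and `log_weight` is a hypothetical helper. It works in logarithms because exp(0.01 × 13.8e9) overflows a float.

```python
import math


def log_weight(t_years: float, rate: float, now: float = 0.0,
               past_counts: bool = True) -> float:
    """Natural log of the exponential discount weight exp(-rate * (t - now)).

    If past_counts is False, moments before `now` are given zero weight
    (log weight of -inf), matching the "past has no value" variant.
    """
    if not past_counts and t_years < now:
        return -math.inf
    return -rate * (t_years - now)


BIG_BANG = -13.8e9  # years before now (approximate)
print(log_weight(BIG_BANG, 0.01))                     # about 1.38e8 nats heavier than now
print(log_weight(BIG_BANG, 0.01, past_counts=False))  # -inf: the past is worthless
```

A log weight of about 1.38e8 means a moment at the big bang counts roughly e^(1.38e8) times as much as the present, which is why every other goal becomes negligible.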

The Unweaving of a Beautiful Thing

The time of a single witch or wizard trying to snare death in their dying spell was over a decade ago. Such techniques could only be used by a skilled witch or wizard upon their deaths, and such events were hard to plan for ethically. 

Now a team of thaumatological engineers laboured over a contrivance of electronics and crystals. This device would be slid under the bed of a terminally ill patient. Slotted into the centre was a handful of cells in a dish, taken from the dying patient themselves. Theoretical research had been done, software had been written, and hundreds of different strategies and counter-strategies had been tested in simulation. Death was far faster than any human, but the right engineering could be faster than death.

If you had walked through the corridor of the Royal Infirmary, you would have seen very little out of place compared to a normal hospital. One dying patient had slightly more sensors on their body than usual, the wires running under the bed. Small puffs of cool cloud flowed out one side of the bed as nitrogen gas was vented from the cooling system. 

Death approached the bed. He felt a brief, disorienting pulse of magic, just enough to throw him off balance for a few seconds. Shaking his head at the crudeness of the supposed trap, Death claimed another soul.

Half an hour later, a man arrived at the hospital. He slid the box out from under the bed and opened the top to observe the contents. Most of the box was full of slag, solidified but still hot. In one insulated corner was an SSD chip full of encrypted data; the password was 100KB of quantum randomness, unique to this particular box and stored offsite. He plugged the SSD chip into his laptop and uploaded the data back to the main control room. They had significantly improved their sensors since last time and had several gigs of new scans and telemetry.

Measuring the "apocalyptic residual"

Another key factor is their level of competence at achieving their objective. The people trying to wake Cthulhu by chanting magic words aren't a problem. The doomsday cult doing cutting-edge biotech or AI research is. How many rationality points do these doomsday cultists have?

Nines of safety: Terence Tao’s proposed unit of measurement of risk

Nines of unsafety, for the pessimists. So 2 9's of unsafety is a 99% chance of doom.
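A small conversion sketch, assuming the usual "nines" convention (0.9 is one nine, 0.99 two, 0.999 three). Nines of safety count the nines in the probability of survival; nines of unsafety count the nines in the probability of doom. The function names are hypothetical.

```python
import math


def nines_of_safety(p_doom: float) -> float:
    """Nines in P(survival): p_doom = 0.001 -> 3 nines of safety."""
    if not 0.0 < p_doom < 1.0:
        raise ValueError("p_doom must be strictly between 0 and 1")
    return -math.log10(p_doom)


def nines_of_unsafety(p_doom: float) -> float:
    """Nines in P(doom): p_doom = 0.99 -> 2 nines of unsafety."""
    if not 0.0 < p_doom < 1.0:
        raise ValueError("p_doom must be strictly between 0 and 1")
    return -math.log10(1.0 - p_doom)


print(nines_of_unsafety(0.99))  # approximately 2
print(nines_of_safety(0.001))   # approximately 3
```

A 50% chance of doom sits at about 0.3 nines on both scales, which is where the optimists' and pessimists' units meet.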

What “defense layers” should governments, AI labs, and businesses use to prevent catastrophic AI failures?

The boring answers

Don't give your AI system excess compute, ideally enforced at the hardware level. Run it on a small isolated machine, not a 0.1% timeshare on a supercomputer.

Use the coding practices developed by NASA to minimize standard bugs.

Record all random seeds and input data to make everything reproducible. 
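A minimal sketch of seed-and-input logging, assuming the model's only randomness comes from a seeded generator. `run_model` is a hypothetical stand-in for a stochastic model, not a real system; each log entry is enough to replay a run exactly.

```python
import json
import random


def run_model(seed: int, inputs: list) -> list:
    """Hypothetical stochastic model: all randomness comes from a seeded RNG."""
    rng = random.Random(seed)
    return [x + rng.random() for x in inputs]


def logged_run(seed: int, inputs: list, log: list) -> list:
    """Append the seed and inputs to the log before running, then run."""
    log.append(json.dumps({"seed": seed, "inputs": inputs}))
    return run_model(seed, inputs)


def replay(entry: str) -> list:
    """Re-run a logged call from its record alone."""
    record = json.loads(entry)
    return run_model(record["seed"], record["inputs"])


log: list = []
original = logged_run(42, [1.0, 2.0], log)
assert replay(log[0]) == original  # identical output from the log entry
```

Writing the record before running means a crash mid-run still leaves enough to reproduce it.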

Put hard-coded sanity checks between the AI and its output. A robot arm isn't allowed to move beyond safe limits, enforced by simple min(AI_OUTPUT, MAXIMUM_ARM_ANGLE)-type code.
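A sketch of such a check, clamping at both ends. The limit values are hypothetical. Note that an upper bound needs `min`, not `max`, and that NaN must be rejected explicitly: Python's `min`/`max` do not propagate NaN sensibly, so a bare clamp could silently turn NaN into a limit value.

```python
import math

MAXIMUM_ARM_ANGLE = 90.0   # hypothetical safe upper limit, degrees
MINIMUM_ARM_ANGLE = -90.0  # hypothetical safe lower limit, degrees


def safe_arm_angle(ai_output: float) -> float:
    """Clamp the AI's requested angle into the safe range; refuse NaN outright."""
    if math.isnan(ai_output):
        raise ValueError("AI produced NaN; refusing to move the arm")
    return max(MINIMUM_ARM_ANGLE, min(ai_output, MAXIMUM_ARM_ANGLE))


print(safe_arm_angle(500.0))  # 90.0
print(safe_arm_angle(-12.5))  # -12.5
```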

Humans in the loop, checking outputs.

Hardware minimization of unneeded action space. Suppose you are programming a social media bot, and the AI is quite smart. If one computer both runs the AI and has a network connection, and the AI uses rowhammer, it can send arbitrary internet packets. Instead, computer A runs the AI and sends raw ASCII down a wire to computer B. Computer B runs every simple check that you would expect to pass in normal operation, and performs a hard shutdown if it spots anything odd (e.g. funny characters, or a post that is too long or too short). Computer A, of course, has no other connections, has a power supply resistant to modulated energy demand attacks, has a radio and audio noise generator next to it, and is in a Faraday cage.
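Computer B's filter might look something like the sketch below. The length limits and the allowed character set (printable ASCII plus newline) are assumptions for a text-only bot, and `HardShutdown` is a hypothetical exception standing in for actually cutting power, which a real system would do in hardware.

```python
class HardShutdown(Exception):
    """Stand-in for cutting power; a real system would do this in hardware."""


def check_post(raw: bytes, min_len: int = 1, max_len: int = 280) -> str:
    """Pass a post from computer A only if it looks like normal operation."""
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        raise HardShutdown("non-ASCII byte on the wire") from None
    if not min_len <= len(text) <= max_len:
        raise HardShutdown(f"post length {len(text)} out of range")
    for ch in text:
        # Allow printable ASCII (space through tilde) and newline only.
        if not (" " <= ch <= "~" or ch == "\n"):
            raise HardShutdown(f"unexpected character {ch!r}")
    return text


print(check_post(b"Hello world"))
```

The checks are deliberately dumb: anything the filter does not positively recognise as normal triggers a shutdown rather than a retry.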

Two or three redundant computers, to catch cosmic ray errors.
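A sketch of comparing the redundant outputs, assuming each computer produces a directly comparable (hashable) output. With two machines any disagreement can only be detected and halted on; with three, a single corrupted result can be outvoted. `majority_output` is a hypothetical helper.

```python
from collections import Counter


def majority_output(outputs: list):
    """Return the strict-majority output of redundant runs, or halt if there is none."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("redundant computers disagree; halting")
    return value


print(majority_output(["post A", "post A", "post B"]))  # the bit-flipped copy is outvoted
```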

Thermite on a timer: slag the whole computer system. This stops the AI thinking for too long, and makes it harder to hide info for future versions of itself.

The speculative answers

Homomorphic encryption.
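The core property is that a machine can compute on ciphertexts without ever seeing the plaintexts. The toy below uses textbook RSA with tiny primes, which is insecure and only multiplicatively homomorphic. Running a whole AI this way would need a fully homomorphic scheme, which is far more complex and much slower. The parameters are the classic teaching example, not anything from a real deployment.

```python
# Toy textbook-RSA parameters (tiny primes; illustration only, NOT secure).
P, Q = 61, 53
N = P * Q                # 3233
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent
D = pow(E, -1, PHI)      # private exponent via modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, E, N)


def decrypt(c: int) -> int:
    return pow(c, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Multiply two plaintexts by multiplying their ciphertexts; no key needed."""
    return (c1 * c2) % N


product = multiply_encrypted(encrypt(7), encrypt(11))
print(decrypt(product))  # 77
```

The machine doing `multiply_encrypted` learns nothing about 7, 11 or 77; only the key holder can read the result.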

Run it on a quantum computer using amplitude amplification techniques. Any technique for breaking out of the box must be reversible to work, or use new physics.

Honeypots that look like good hacking targets. 

A fairly easy way for any moderately smart AI to wirehead, setting its own utility to maximum (and triggering some alarm or delayed shutdown).
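Both the honeypots and the wirehead channel are bait, and they could share one monitor. In this sketch, the utility ceiling, the bait resource names and the `Tripwire` class are all hypothetical. Once any alarm has fired, it stays fired, and the caller is expected to start the delayed shutdown.

```python
from dataclasses import dataclass, field


@dataclass
class Tripwire:
    """Hypothetical monitor for the honeypot and wirehead bait.

    utility_ceiling: the value the deliberately easy wirehead channel writes.
    honeypots: bait resources that no legitimate run should ever touch.
    """
    utility_ceiling: float
    honeypots: frozenset
    alarms: list = field(default_factory=list)

    def observe(self, reported_utility: float, accessed: list) -> bool:
        """Record alarms for this step; return True once shutdown should be scheduled."""
        if reported_utility >= self.utility_ceiling:
            self.alarms.append("wirehead channel used")
        touched = self.honeypots & set(accessed)
        if touched:
            self.alarms.append(f"honeypot touched: {sorted(touched)}")
        return bool(self.alarms)


wire = Tripwire(1e9, frozenset({"admin_password.txt"}))
print(wire.observe(3.5, ["training_data.csv"]))   # False: normal step
print(wire.observe(2.0, ["admin_password.txt"]))  # True: bait taken
```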

The silly answers

Chuck the AI system into a black hole, set to only start computing after it crosses the event horizon.

Quotes about the long reflection

The rot13 is to make it harder to search for. I think that this is a discussion that would be easy to misinterpret as saying something offensive.

but just thought that slavery was a pre-condition for some people having good things in life. Therefore, it was justified on those grounds.


Gung vf pyrneyl n centzngvp qrpvfvba onfrq ba gur fbpvrgl ur jnf va. Svefgyl, gur fynirel nf cenpgvfrq va napvrag Terrpr jnf bsgra zhpu yrff pehry guna pbybavny fynirel. Tvira gung nyybjvat gur fynir gb znxr gurve bja jnl va gur jbeyq, rneavat zbarl ubjrire gurl fnj svg, naq gura chggvat n cevpr ba gur fynirf serrqbz jnf pbzzba cenpgvpr, gung znxrf fbzr cenpgvprf gung jrer pnyyrq fynirel bs gur gvzr ybbx abg gung qvssrerag sebz qrog.

Frpbaqyl, ur whfgvslf vg ol bgure crbcyr univat avpr guvatf, vs fubja gur cbjre bs zbqrea cebqhpgvba yvarf sbe znxvat avpr guvatf jvgubhg fynirel, ur jbhyq unir cebonoyl nterrq gung gung jnf n orggre fbyhgvba. Rira zber fb vs fubja fbzr NV anabgrpu gung pbhyq zntvp hc nal avpr guvat.

Guveqyl, zbfg fpvragvfgf hagvy gur ynfg srj uhaqerq lrnef jrer snveyl ryvgr fbpvnyyl. Zrzoref bs fbzr hccre pynff jub pbhyq nssbeq gb rkcrevzrag engure guna jbex. Tvira gur ynetr orarsvg gurl unq, guvf ybbxf yvxr n zhpu ynetre fbhepr bs hgvyvgl guna gur qverpg avprarff bs univat avpr guvatf.
V qba'g guvax gur qrpvfvba ur znqr jnf haernfbanoyr, tvira gur fbpvny pbagrkg naq vasbezngvba ninvynoyr gb uvz ng gur gvzr.

But exactly how complex and fragile?

Machine learning works fine on non-adversarial inputs. If you train a network to distinguish cats from dogs and put in a normal picture of a cat, it works. However, there are all sorts of weird inputs that look nothing like cats or dogs that will also get classified as cats. Suppose you give the network a bunch of bad situations and a bunch of good ones (say you crack open a history textbook and ask a bunch of people how nice various periods and regimes were). Then you will get a network that can distinguish bad from good within the normal flow of human history. This doesn't stop there being some weird state that counts as extremely good.

Deciding what is and isn't a good future depends on the answers to moral questions that haven't come up yet, and we don't have any training data for questions involving tech we don't yet have. This can make a big difference. If we decide that uploaded minds do count morally, we are probably going for an entirely virtual civilization, one an anti-uploader would consider worthless. If we decide that mind uploads don't count morally, we might simulate loads of them in horrible situations for violent video games. Someone who did think that uploaded minds mattered would consider that an s-risk, potentially worse than nothing.
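A toy illustration of the weird-input problem, using a perceptron on two made-up features rather than a real image network. The data points and features are hypothetical. A point that looks nothing like any training example still gets labelled "cat", with more confidence than a typical cat, because a linear score keeps growing as you move far from the data.

```python
import math

# Hypothetical two-feature stand-ins for cat and dog pictures.
CATS = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
DOGS = [(-1.0, -2.0), (-2.0, -1.0), (-2.0, -2.0)]


def train_perceptron(cats, dogs, epochs=10):
    """Classic perceptron: nudge the weights on every misclassified example."""
    w, b = [0.0, 0.0], 0.0
    data = [(x, 1) for x in cats] + [(x, -1) for x in dogs]
    for _ in range(epochs):
        for (x1, x2), y in data:
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:  # wrong side of (or on) the boundary
                w[0] += y * x1
                w[1] += y * x2
                b += y
    return w, b


def classify(w, b, x):
    """Return (label, confidence), with confidence from a logistic squash of the score."""
    score = w[0] * x[0] + w[1] * x[1] + b
    confidence = 1.0 / (1.0 + math.exp(-abs(score)))
    return ("cat" if score > 0 else "dog"), confidence


w, b = train_perceptron(CATS, DOGS)
print(classify(w, b, (1.5, 1.5)))       # an ordinary cat
print(classify(w, b, (1000.0, -400.0)))  # nothing like the data, yet a very confident "cat"
```

Real networks fail in higher dimensions and in subtler ways, but the underlying issue is the same: training only constrains behaviour near the training data.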

Human-level goals are moderately complicated in terms of human-level concepts. In the outcome pump, "get my mother out of the building" is a human-level concept. I agree that you could probably get useful and safe-ish behavior from such a device given a few philosopher-years. Much of the problem is that concepts like "mother" and "building" are really difficult to specify in terms of quantum operators on quark positions or whatever. The more you break human concepts down, the more edge cases you find. Getting a system that would explode the building is most of the job.

The examples of obviously stupid utility functions having obviously bad results are toy problems. When we have a better understanding of symbol grounding, we will know how much the problems keep reappearing. Manually specifying a utility function might be feasible.

I'm Buck Shlegeris, I do research and outreach at MIRI, AMA

If no more AI safety work is necessary, that means that there is nothing we can do to significantly increase the chance of FAI over UFAI.

I could be almost certain that FAI would win because I had already built one. Even then, I suspect there would still be double-checking to do: the new FAI will need to be told what friendly behavior is, someone should keep an eye out for any UFAI, etc. So FAI work will be needed until the point where no human labor is needed and we are all living in a utopia.

I could be almost certain that UFAI will win. I could see lots of people working on really scary systems and still not have the slightest idea of how to make anything friendly. But there would still be a chance that those systems didn't scale to superintelligence, that the people running them could be persuaded to turn them off, or that someone might come up with a brilliant alignment scheme tomorrow. Circumstances where you can see that you are utterly screwed, yet are still alive, seem unlikely. Keep working until the nanites turn you into paperclips.

Alternatively, it might be clear that we aren't getting any AI any time soon. The most likely cause of this would be a pretty serious disaster, one that destroys most of humanity's technical ability and stops us rebuilding it. If AI alignment is something that we will need to do in a few hundred years, once we rebuild society enough to make silicon chips, it's still probably worth having someone make sure that progress isn't forgotten and that the problem will be solved in time.

We might gain some philosophical insight that says that AI is inherently good, always evil, impossible, etc. It's hard to imagine what a philosophical insight that you don't have is like.

Existential Risk and Economic Growth

I think that existential risk is still something that most governments aren't taking seriously. If major world governments had a model that contained a substantial probability of doom, there would be a lot more funding. Look at the funding that went to anything and everything that might possibly help during the Cold War. I see this failure to take it seriously as being caused by a mix of human psychology and historical coincidence. I would not expect it to apply to all civilizations.
