Sharmake

You do not need a strong theory for why something must be possible in order to put non-trivial credence on it being possible. If you hold a prior that the scientific difficulty of doing something is often overrated, and especially if you believe that alignment is possibly automatable and that a lot of people overrate the difficulty of automating something, that's enough to cut p(doom) by a lot, arguably 1 OOM, and at the very least to put it nowhere near your 90% p(doom). That doesn't mean that we are going to make it out of ASI alive, but it does mean that even in situations where there is no established theory or plan to survive, you can still possibly do something.
If I wanted to make the case that ASI alignment is possible, I'd probably read these 3 posts by Joshua Clymer first on how automated alignment schemes could work (with some discussion by Habryka, Eliezer Yudkowsky and Jeremy Gillen in the comments, and Joshua Clymer's responses):

https://www.lesswrong.com/posts/8vgi3fBWPFDLBBcAx/planning-for-extreme-ai-risks

https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai

https://www.lesswrong.com/posts/5gmALpCetyjkSPEDr/training-ai-to-do-alignment-research-we-don-t-already-know

So it will worry about being in a kind of panopticon? Seems pretty unlikely. Why should the AI care about being caught any more than it should about any given runtime instance of it being terminated?

The basic reason for this is that you can gain way more information on the AI once it has tried to escape, combined with the ability to use much more targeted and effective countermeasures once you have caught the AI red-handed.

As a bonus, this can also eliminate threat models like sandbagging, if you have found a reproducible signal for when an AI will try to overthrow a lab.

More discussion by Ryan Greenblatt and Buck here:

https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed

Discussion Thread: Existential Choices Debate Week

Sharmake12d1

The prediction that many moral perspectives care more about averting downsides than producing upsides is well explained if we live in a moral relativist multiverse, where there is an infinity of correct moral systems, and which one you come to is path dependent and starting point dependent. Even so, many moral perspectives share instrumental goals that include a step of avoiding extinction/disempowerment, because extinction/disempowerment means that morality loses out in the competition/battle for survival/dominance.

cf @quinn's positive vs negative longtermism framework:

https://forum.effectivealtruism.org/posts/r5GbSZ7dcb6nbuWch/quinn-s-shortform?commentId=pvXtqvGfjATkJq7N2

Third-wave AI safety needs sociopolitical thinking

Sharmake13d5

Some thoughts on this comment:

On this part:

I responded well to Richard's call for More Co-operative AI Safety Strategies, and I like the call toward more sociopolitical thinking, since the Alignment problem really is a sociological one at heart (always has been). Things which help the community think along these lines are good imo, and I hope to share some of my own writing on this topic in the future.

I don't think it was always a large sociological problem, but yeah, I've updated more towards the sociological aspect of alignment being important (especially as the technical problem has become easier than circa 2008-2016 views held it to be).

Whether or not I agree with Richard's personal politics is kinda beside the point to this as a message. Richard's allowed to have his own views on things and other people are allowed to criticise this (I think David Mathers' comment is directionally where I lean too). I will say that not appreciating arguments from open-source advocates, who are very concerned about the concentration of power from powerful AI, has led to a completely unnecessary polarisation against the AI Safety community from it. I think, while some tensions do exist, it wasn't inevitable that it'd get as bad as it is now, and in the end it was a particularly self-defeating one. Again, by doing the kind of thinking Richard is advocating for (you don't have to co-sign with his solutions, he's even calling for criticism in the post!), we can hopefully avoid these failures in the future.

I do genuinely believe that concentration of power is a huge risk factor, and in particular I'm deeply worried about the incentives of a capitalist post-AGI company where a few hold basically all of the rent/money. Such a company would have stronger incentives to expropriate property from people, similar to how humans routinely expropriate property from animals, and would face weak to non-existent forces against that expropriation.

That said, I think the piece on open-source AI being a defense against concentration of power, and more generally a good thing akin to the Enlightenment, unfortunately has some quite bad analogies. Giving everyone AI is nothing like education/voting: depending on how powerful it is, at the high end it is enough to create entire very large economies on its own, and at the lower end it can immensely help/automate the process of making biological weapons for common citizens. More importantly, the impacts of education/voting fundamentally require coordination to get large things done, a requirement which super-powerful AIs can remove.

More generally, I think one of the largest cruxes between reasonable open-source people and EAs in general is how much they think AIs can put biological capabilities in the hands of the masses, and how offense-dominant the tech is. Here I defer to biorisk experts (including EAs), who generally think that biorisk is a wildly offense-advantaged domain that is very dangerous to democratize, rather than to open-source people, for at least several years.

On Sam Altman's firing:

On the bounties, the one that really interests me is the OpenAI board one. I feel like I've been living in a bizarro-world with EAs/AI Safety People ever since it happened because it seemed such a colossal failure, either of legitimacy or strategy (most likely both), and it's a key example of the "un-cooperative strategy" that Richard is concerned about imo. The combination of extreme action and ~0 justification either externally or internally remains completely bemusing to me and was a big wake-up call for my own perception of 'AI Safety' as a brand. I don't think people can underestimate the second-impact effect this had on both 'AI Safety' and EA, coming about a year after FTX.

I'll be blunt and say it: I think it was mildly good, or at worst neutral, to use the uncooperative strategy to fire Sam Altman. Sam Altman was going to gain all control by default and would probably have had better PR if the firing hadn't happened. More importantly, he was aiming to disempower the safety people basically totally, which leads to at least a mild increase in existential risk, and they realized they would have been manipulated out of it if they waited, so they had to go for broke.

The main EA mistake was in acting too early, before things got notably weird.

That doesn't mean society will react or that it's likely to react, but I basically agree with Veaulans here:

https://x.com/veaulans/status/1890245459861729432

Preparing for the Intelligence Explosion

Sharmake13d1

I agree that conditional on escaping/rogue internal deployments like this scenario by Buck, with a lot of contributors, it leads to much larger disasters, and if the AI is unaligned, then unless we have an aligned AI that has somewhat similar capabilities, we lose.

My point is more so that you are way overestimating how many chances the AI has to overthrow us before it is aligned.

https://www.lesswrong.com/posts/ceBpLHJDdCt3xfEok/ai-catastrophes-and-rogue-deployments

But the crux might be that I don't think that we need that much reliability for AI catching, because as long as the AI is caught with non-negligible probability, the AI has to be very cautious, because it is way worse for the AI to be caught than to be successful or the game just ending.

In essence, we probably only need to catch the AI once, or at most a few times, to completely change the game in ways that massively favor alignment success.

You do not need to reliably catch the AI if you can make the costs of being caught very painful for it, and I do suspect we will probably be able to do this (perhaps using trusted AI labor, to use Ryan Greenblatt and Buck's terms).
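To make the "non-negligible probability is enough" point concrete, here is a minimal sketch. It assumes each escape attempt is caught independently with the same fixed probability, which is a simplification and not something the comment above claims. The function name and the numbers are hypothetical, chosen only for illustration.

```python
def prob_caught_at_least_once(per_attempt_catch_prob: float, attempts: int) -> float:
    """Chance that at least one of `attempts` escape attempts is caught.

    Assumes (hypothetically) that attempts are independent and share the
    same per-attempt catch probability.
    """
    # P(never caught) = (1 - c)^k, so P(caught at least once) = 1 - (1 - c)^k
    return 1.0 - (1.0 - per_attempt_catch_prob) ** attempts


if __name__ == "__main__":
    # Illustrative values only: an unreliable 10% catch rate per attempt.
    for k in (1, 5, 10, 30):
        print(f"{k:>2} attempts -> P(caught at least once) = "
              f"{prob_caught_at_least_once(0.10, k):.2f}")
```

Under these assumptions, even an unreliable per-attempt catch rate compounds quickly across repeated attempts, which is one way to read the claim that the AI has to be very cautious.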

Preparing for the Intelligence Explosion

Sharmake14d10

While finm made a general comment in response to you, I want to specifically focus on the footnote, because I think it's a central crux in why a lot of EAs are way less doomy than you.

Quote below:

We need at least 13 9s of safety for ASI, and the best current alignment techniques aren't even getting 3 9s...

I think the 13 9s can be reduced to something requiring closer to 1-2 9s at the very least, and there are 2 reasons for this:

1. I think you drastically overestimate how many chances the AI gets at misalignment, because the trillions of executions will use far, far too little compute per single action to lead to a takeover, even in worlds where a software intelligence explosion is possible and large efficiencies in software are gainable. I expect there to be at most millions of such plausible actions, and the more likely scenario is that it's closer to hundreds or thousands of chances to escape, undermining this tweet:

https://x.com/gcolbourn/status/1762224406955216897

2. If we manage to catch an AI doing bad stuff, then it's much, much harder for the AI to escape, because there are a lot of techniques that can be applied to make the issue go away like proliferating the escape example.

More from Ryan Greenblatt here:

https://www.lesswrong.com/posts/i2nmBfCXnadeGmhzW/catching-ais-red-handed
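As a rough illustration of the first reason, the sketch below shows how the number of "9s" required per chance scales with how many chances the AI gets. It assumes every chance fails independently with the same probability and that we want a fixed overall survival probability. Both the independence assumption and the 90% target are my own illustrative choices, not figures from the quoted footnote, and the sketch does not model the second reason (catching the AI).

```python
import math


def required_nines(num_chances: float, target_survival: float) -> float:
    """Per-chance reliability, in 'nines', needed to survive all chances.

    Assumes (hypothetically) independent chances with identical failure
    probability p, so that (1 - p) ** num_chances == target_survival.
    """
    # Solve for p in a numerically stable way:
    # log(1 - p) = log(target) / N  =>  p = -expm1(log(target) / N)
    per_chance_failure = -math.expm1(math.log(target_survival) / num_chances)
    # "k nines" means a failure probability of 10 ** -k
    return -math.log10(per_chance_failure)


if __name__ == "__main__":
    # Trillions of chances versus the millions or thousands argued for above.
    for n in (1e12, 1e6, 1e3):
        print(f"{n:.0e} chances -> about {required_nines(n, 0.9):.1f} nines per chance")
```

Under this toy model, cutting the number of real chances from trillions to thousands removes about nine of the required nines. Any further reduction would have to come from the second reason, which changes the game after a catch instead of treating every chance as identical.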

I definitely think alignment needs to be reliable, but I do not think it needs to be so reliable that we cannot achieve it, or that doom is very likely and we can't change the probabilities.

I'd certainly say it's quite scary, but I do think there's a reasonable hope of surviving and going on to thrive, such that I think alignment investment is worth the money.

Third-wave AI safety needs sociopolitical thinking

Sharmake14d1

I basically agree with this, with one particular caveat, in that the EA and LW communities might eventually need to fight/block open source efforts due to issues like bioweapons, and it's very plausible that the open-source community refuses to stop open-sourcing models even if there is clear evidence that they can immensely help/automate biorisk, so while I think the fight was done too early, I think the fighty/uncooperative parts of making AI safe might eventually matter more than is recognized today.

Third-wave AI safety needs sociopolitical thinking

Sharmake14d3

To respond to a local point here:

Also, I am suspicious of framing "opposition to geoengineering" as bad -- this, to me, is a red flag that someone has not done their homework on uncertainties in the responses of the climate system to large-scale interventions like albedo modification. Geoengineering the planet wrong is absolutely an X-risk.

While I can definitely buy that geoengineering is a net-negative, I'm not sure how geoengineering gone wrong can actually result in X-risk, at least to me so far, and I don't currently understand the issues that well.

It doesn't speak well that he frames opposition to geoengineering as automatically bad (even if I assume the current arguments against geoengineering are quite bad).

Discussion Thread: Existential Choices Debate Week

Sharmake14d1

This is roughly my take, with the caveat that I'd replace CEV by instruction following, and I wouldn't be so sure that alignment is easy (though I do think we can replace it with the assumption that it is highly incentivized to solve the AI alignment problem and that the problem is actually solvable).

Third-wave AI safety needs sociopolitical thinking

Sharmake15d1

Crossposting this comment from LW, because I think there is some value here:

https://www.lesswrong.com/posts/6YxdpGjfHyrZb7F2G/third-wave-ai-safety-needs-sociopolitical-thinking#HBaqJymPxWLsuedpF

The main points are that value alignment will be way more necessary for ordinary people to survive, no matter the institutions adopted, that the world hasn't yet weighed in that much on AI safety and plausibly never will, but we do need to prepare for a future in which AI safety may become mainstream, that Bayesianism is fine actually, and many more points in the full comment.

Discussion Thread: Existential Choices Debate Week

Sharmake21d1

21% disagree

The big reason I lean towards disagreeing nowadays is that I've come to expect the AI control/alignment problem to be much less neglected and important to solve, and more generally I've come to doubt the assumption that worlds in which we survive are worlds in which we achieve very large value (under my own value set), such that reducing existential risk is automatically good.

Sharmake

Posts 13

Comments 322

Topic contributions 2
