While I am not an advocate of AI safety/alignment research, because I don't think it is possible to align a future AGI with humans, I have historically done little to "combat" it: the research seemed harmless, and perhaps I would one day be proven wrong about my impossibility hypothesis.

Unfortunately, recent events are making it look like the situation is changing: AI safety/alignment researchers are now starting to take action rather than just spending their time trying to solve a maybe-impossible problem. I am of course referring to things like the open letter calling for a pause on giant AI experiments (https://futureoflife.org/open-letter/pause-giant-ai-experiments/) and people lobbying governments for more regulatory constraints.

I have two problems with this new active role that safety researchers are taking:


1. People are attempting to stop others from doing the one thing that has historically brought the greatest reduction in suffering, because there may be a way to prevent a catastrophe that might happen.

Technological progress has historically been the greatest reducer of human suffering. AI research is just the latest form of technological progress and one doesn't even need to be very imaginative to see how AGI could do wonders for addressing human suffering.

This needs to be weighed against the chance that we can actually align AI long term and the chance that an unaligned AGI will cause a catastrophe. It feels like much of the AI safety/alignment community is ignoring the harm caused by slowing/halting technological progress.

If I were convinced that there is a reasonable path toward AI alignment/safety, then I would likely be much more friendly to these proposed delays/regulations. At the moment, AI safety/alignment researchers have not succeeded in convincing me that such a path exists. Even worse, they have not convinced me that they even have a concrete target that is worth working toward.

The AI alignment work in particular seems to be built on incredibly unstable ground. It seems to presuppose that there is some universal set of human values that an AI can be aligned with. However, humans are diverse and disagree on everything imaginable, so at best you could align an AI with one specific set of human values and tell the people who disagree with that set "tough luck, you should have become AI researchers and won the race to AGI". If this is the target for alignment, then I want no part in it. As much as I think my personal world view is right, I even more strongly believe that others should be allowed to have a different world view, and I do not seek to force my view onto others, even if I have the opportunity to do so by inventing an AGI.

As far as the path to alignment goes, the critical problem that I see no way to solve is "AI value drift". It seems quite unreasonable to believe that there is anything we can do today that would prevent AIs from changing their values in the future. At best, we could perhaps delay the time between the first AGI and its eventual value drift away from human values, but that doesn't really solve anything; it just kicks the can down the road. If we end up in a hard takeoff scenario (the most catastrophic one), that delay may be hours, days, or weeks between takeoff and catastrophe, which are inconsequential amounts of time.


2. The proposed solutions to the immediate perceived problem won't actually slow progress; they will just change where the progress occurs and who is in control of it.

Things like halting AI research or introducing IRB-like institutions for deciding what AI research is allowed and what isn't could result in worse outcomes than if research was allowed to continue freely.

Short of putting everyone in prison under some benevolent anti-AI dictator, you cannot actually stop AI research from happening. If you use governments to suppress research, then at best you end up with the people still doing research being limited to what can fit in a wealthy person's basement. This may slow things down, but it also may prevent some viable safety/alignment options, like the invention of an exocortex.

Governments notoriously don't follow their own rules. Almost all of the proposals for slowing AI research involve getting governments to step in and forcibly stop people from doing it. However, what are the chances that the US, Russian, and Chinese governments will obey their own rules, even if they agree to write them into law? Rules for thee, not for me. The wealthiest, and historically the most harmful, groups of people on the planet have been governments, so using regulation to stop AI research just means that the wealthiest and most harm-causing organizations will be doing all of the research/advancement, and they'll do it in secret rather than in the light. This seems completely contrary to the stated goal of increasing the chances of eventual safety/alignment.

The above two points assume that regulations can actually work. While regulations do tend to be good at slowing down progress, they rarely end up the way their original proposers intended. The final regulation will likely look nothing like what was originally proposed and will instead just serve to entrench certain actors in the ecosystem and prevent many individuals (likely the ones who care the most about alignment) from contributing.


I'm happy that people are trying to solve the problem of safety/alignment, and while I don't think it is possible, I still support/encourage the effort. Where I draw the line is when those people start negatively impacting the world's ability to reduce suffering and/or limiting what knowledge is allowed to be pursued and what isn't.

Comments (14)

Thanks for sharing. In my view, technological progress is more of a mixed bag than universally good -- it's much easier for one person to make a bunch of people suffer than it was hundreds of years ago. Moreover, in many domains, technological progress creates winners and losers even though the net effect may be positive.

Here, for instance, a democratic society that creates advanced AI (not even AGI level) needs to first establish an economic system that will still achieve its goals when the main asset of humans who don't own AI companies (their labor) drops precipitously in value. Delay provides more time to recognize the need for, and implement, the needed political, social, and economic changes.

I think "time to prepare society for what is coming" is a much more sound argument than "try to stop AI catastrophe".

I'm still not a fan of the deceleration strategy, because I believe that in any potential future where AGI doesn't kill us it will bring about a great reduction in human suffering. However, I can definitely appreciate that this is very far from a given and it is not at all unreasonable to believe that the benefits provided by AGI may be significantly or fully offset by the negative impact of removing the need for humans to do stuff!

Micah - thanks for this. 

I agree that 'AI alignment' is probably impossible, for the reasons you described, plus many others.  

I also agree that formal government regulation will not be sufficient to slow or stop AI research. But I disagree that it is therefore impossible to stop AI research. Humans have many other ways to stigmatize, cancel, demonize, and ostracize behaviors that they perceive as risky and evil. If we have to develop moral disgust against AI research and the AI industry to slow it down, that is one possible strategy, and it wouldn't require any regulation. Just normal human moral psychology and social dynamics.

I also disagree that 'AI doomers' and people who want to slow AI research are discounting or ignoring the potential benefits of AI. Everybody I know who's written about this issue accepts that AI has huge potential upsides, that technology has generally been a net positive, that tech progress has reduced human suffering, etc. That's precisely why this AI situation is so heartbreaking and difficult. We're balancing vast potential upsides against catastrophic downsides.

My key point -- which I've made probably too often by now -- is that whatever benefits AI might bring in the future will still be available in a century, or a millennium, as long as humanity survives. That tree full of golden apples will still be there for the plucking -- whenever we figure out how to avoid poisoning ourselves. We're never permanently forgoing those potential AI benefits. We're just taking the time to make sure we don't get wiped out in the meanwhile. (Of course, Nick Bostrom made basically this same point in his classic 2003 paper.)  

The main downside is that current generations might not get some of the benefits of early AI development. We might not get longevity and regenerative medicine until it's too late for some of us. We might die as a result. But our ancestors were often willing to sacrifice themselves to ensure the safety of their kids and grand-kids. We should be willing to do the same -- and if we're not, I think that makes us reckless, greedy, and selfish parents. 

whatever benefits AI might bring in the future will still be available in a century, or a millennium, as long as humanity survives. That tree full of golden apples will still be there for the plucking

In the Foundation series, I believe Isaac Asimov expressed the counterargument to this quite well: it is fine to take the conservative route if we are alone in the Universe. If we are not alone in the universe, then we are in an existential race and just haven't met the other racers yet.


I agree that 'AI alignment' is probably impossible, for the reasons you described, plus many others.

The main downside is that current generations might not get some of the benefits of early AI development.

How do you reconcile these two points? If the chance of alignment is epsilon, and deceleration results in significant unnecessary deaths/suffering in the very near future, it feels like you would essentially have to have zero discount on future utility to choose deceleration?
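
One way to formalize the trade-off being asked about here (my own notation, a rough sketch rather than anything stated in the thread): let $p$ be the probability that alignment succeeds given the delay, $B$ the future benefit if it does, $C$ the near-term cost of the delay, $\delta \le 1$ the per-year discount factor on future utility, and $t$ the length of the delay in years. Choosing deceleration then requires roughly

$$\delta^{t} \, p \, B > C,$$

and with $p$ on the order of epsilon this holds only if $B$ is taken to be astronomically large and $\delta$ is essentially $1$, i.e. near-zero discounting of future utility.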


Humans have many other ways to stigmatize, cancel, demonize, and ostracize behaviors that they perceive as risky and evil.

I think this is a good/valid point. However, I weakly believe that this sort of cultural stigmatization takes a very long time to build up to the levels necessary for meaningfully slowing AI research, and I don't think we have that much time. I suspect a weak stigma (one that isn't shared by society as a whole) is more likely to just lead to conflict and bloodshed than to actually stopping advancement in the way we would need it to.

Micah - 

It's true that if we're not alone in the universe, slower AI development might put us at marginally higher risk from aliens. However, if the aliens haven't shown up in the last 540 million years since the Cambrian explosion, they're not likely to show up in the next few centuries. On the other hand, if they're quietly watching and waiting, the development of advanced AI might trigger them to intervene suddenly, since we'll have become a more formidable threat. Maybe best to keep a low profile for the moment, in terms of AI development, until we can work more on both AI safety and astrobiology.

Re. reconciling those two points, I do have pretty close to a zero discount on future utility. AI alignment might be impossible, or it might just be really, really hard. Harder than reaping some near-term benefits of AI (e.g. help with longevity research), but those benefits could come at serious long-term risk. The longer we think about alignment, the more likely we are either to get closer to alignment or to decide that it's really not possible and that we should permanently ban AI using whatever strategies are most effective.

Re. stigmatizing AI, my sense is that it can be much faster for people to develop moral outrage about an emerging issue than it is to pass effective regulation about that issue. And passing effective regulation often requires moral outrage as a prerequisite. For example, within a few years of Bitcoin's invention, traditional finance, mainstream media, and governments based on fiat currency had successfully coordinated to demonize crypto -- and it remains a marginal part of the economy. Or, within a few weeks of the Covid-19 pandemic, people developed moral outrage against anyone walking around in public unmasked. Or, within a few weeks of He Jiankui announcing in late 2018 that he had used CRISPR to genetically modify twin babies, there was a massive global backlash against germ-line genetic engineering of humans. In the social media era, moral outrage travels faster than almost anything else.

I agree that generating outrage can happen pretty quickly. My claim here is that the level of universality required to meaningfully hinder AI development is far higher than in any of the examples you have given, or any I can think of. You need a stigma as strong as something like incest or child molestation: one that is near universally held and very strongly enforced at the social layer, to the point that it is difficult to find any other humans who will even talk to you about the subject.

With crypto, COVID-19, and CRISPR there are still very large communities of people who are in opposition to the outraged individuals and who continue to make significant progress/gains against the outraged groups.

Micah - well, it's an interesting empirical question how much stigma would be required to slow down large-scale AI development. 

In terms of 'ethical investment', investors might easily be scared away from investing in tech that is stigmatized, given that it faces radically increased regulatory risk, adverse PR, and might be penalized under ESG standards.

In terms of talent recruitment & retention, stigma could be very powerful in dissuading smart, capable young people from joining an industry that would make them unpopular as friends, unattractive as mates, and embarrassments to their parents and family. 

Without money and people, the AI industry would starve and slow down.

Of course, terrorist cells and radical activists might still try to develop and deploy AI, but they're not likely to make much progress without large-scale institutional support.

I think your reasoning here is sound, but we have what I believe is a strong existence proof that when there is money to be made weak stigma doesn't do much:

Porn.

I think the porn industry fits nicely into your description of a weakly stigmatized industry, yet it is booming and has many smart/talented people working in it despite that stigma.

If we are all correct, AI will be bigger (in terms of money) than the porn industry (which is huge), and I suspect demand for AI will be higher than demand for porn. People may use VPNs and private browsers when using AIs, but I don't think that will stop them.

Micah - that's a fascinating comparison actually. I'll have to think about it further.

My first reaction is, well, porn's a huge industry overall. But it's incredibly decentralized among a lot of very small-scale producers (down to the level of individual OnlyFans producers). The capital and talent required to make porn videos seems relatively modest: a couple of performers, a crew of 2-3 people with some basic A/V training, a few thousand dollars of equipment (camera, sound, lights), a rental property for a day, and some basic video editing services. By contrast, the capital and talent required to make or modify an AI seems substantially higher. (Epistemic status: I know about the porn industry mostly from teaching human sexuality classes for 20 years, and lecturing about the academic psychology research concerning it; I'm not an expert on its economics.)

If porn were more like AI, and required significant investment capital (e.g. tens of millions of dollars rather than tens of thousands), if it required recruiting and managing several smart and skilled developers, if it required access to cloud computing resources, and if it required long-term commercial property rental, it seems like there would be a lot more chokepoints where moral stigmatization could slow down AI progress.

But it's certainly worth doing some compare-and-contrast studies of morally stigmatized industries (which might include porn, sex work, guns, gambling, drugs, etc).

Cybercrime probably has somewhat higher barriers to entry than porn (although less than creating an AGI) and arguably higher levels of stigma. It doesn't take as much skill as it used to, but still needs skilled actors at the higher levels of complexity. Yet it flourishes in many jurisdictions, including with the acquiescence (if not outright support) of nation-states. So that might be another "industry" to consider.

Jason - yes, that's another good example. 

I suspect there will also be quite a bit of overlap between cybercrime and advanced AI (esp. for 'social engineering' attacks) in the coming years. Just as crypto's (media-exaggerated) association with cybercrime in the early 2010s led to increased stigma against crypto, any association between advanced AI and cybercrime might increase stigma against AI.

I believe PornHub is a bigger company than most of today's AI companies (~150 employees, half of them software engineers, according to Glassdoor)? If Brave AI is to be believed, they have $100B in annual revenue and handle 15TB of uploads per day.

If this is the benchmark for the limits of an AI company in a world where AI research is stigmatized, then I am of the opinion that all stigmatization will accomplish is ensuring that the people who are OK working in the dark get to decide what gets built. I feel like PornHub-sized companies are big enough to produce AGI.

I agree with you that porn is a very distributed industry overall, and I do suspect that is partially because of the stigmatization. However, this has resulted in a rather robust organizational arrangement where individuals work independently and the large companies (like PornHub) focus on handling the IT side of things.

In a stigmatized-AI future, perhaps individuals all over the world will work on different pieces of AI while a small number of big AI companies handle bulk training or coordination. Interestingly, this sort of decentralized approach to building could result in a better AI outcome, because we wouldn't end up with a small number of very powerful people deciding the trajectory; instead we would have a large number of individuals working independently and in competition with each other.

I do like your idea about comparing to other stigmatized industries! Gambling and drugs are, of course, other great examples of how an absolutely massive industry can grow in the face of weak stigmatization!

Micah - very interesting points.

The PornHub example raises something a lot of people seem not to understand very well about the porn industry. PornHub and its associated sites (owned by MindGeek) are 'content aggregators' that basically act as free advertising for the porn content produced by independent operators and small production companies -- which all make their real money through subscription services. PornHub is a huge aggregator site, but as far as I know, it doesn't actually produce any content of its own. So it's quite unlike Netflix in this regard -- Netflix spent about $17 billion in 2022 on original content, whereas PornHub spent roughly zero on original content, as far as I can tell.

So, one could imagine 'AI aggregator sites' that offer a range of AI services produced by small independent AI developers. These could potentially compete with Big Tech outfits like OpenAI or DeepMind (which would be more analogous to Netflix, in terms of investing large sums in 'original content', i.e. original software).

But whether that would increase or decrease AI risk, I'm not sure. My hunch is that the more people and organizations are involved in AI development, the higher the risk that a few bad actors will produce truly dangerous AI systems, whether accidentally or deliberately. But, as you say, a more diverse AI ecosystem could reduce the chance that a few big AI companies acquire and abuse a lot of power.

Thanks for writing this; I think both of your claims are important to think through carefully. The downsides of stopping progress are high, and so is the risk of accidentally causing more harm by taking careless actions to that end.

I think your arguments on the second claim are stronger, and I'm not sure how much serious consideration has gone into understanding interventions that seek to halt AI progress.
