The Slippery Slope from DALLE-2 to Deepfake Anarchy

philljkc

OpenAI developed DALLE-2. Then StabilityAI made an open source copycat. This is a dangerous dynamic.

Stephen Casper (scasper@mit.edu)

Phillip Christoffersen (philljkc@mit.edu)

Rui-Jie Yew (rjy@mit.edu)

Thanks to Tan Zhi-Xuan and Dylan Hadfield-Menell for feedback.

This post talks about NSFW content but does not contain any. All links from this post are SFW.

Abstract

Since OpenAI published their work on DALLE-2 (an AI system that produces images from text prompts) in April, several copycat text-to-image models have been developed including StabilityAI’s Stable Diffusion. Stable Diffusion is open-source and can be easily misused, including for the almost-effortless development of NSFW images of specific people for blackmail or harassment. We argue that OpenAI and StabilityAI’s efforts to avoid misuse have foreseeably failed and that both share responsibility for harms from these models. And even if one is not concerned about issues specific to text-to-image models, this case study raises concerns about how copycatting and open-sourcing could lead to abuses of more dangerous systems in the future.

To reduce risks, we discuss three design principles that developers should abide by when designing advanced AI systems. Finally, we conclude that (1) the AI research community should curtail work on risky capabilities, or at the very least vet released models more substantially; (2) the AI governance community should work to quickly adapt to heightened harms posed by copycatting in general and text-to-image models in particular; and (3) public opinion should ideally be critical not only of perpetrators for the harms they cause with AI systems, but also of the originators, copycatters, distributors, etc. who enable them.

What’s wrong?

Recent developments in AI image generation have made text-to-image models very effective at producing highly realistic images from captions. For some examples, see the paper from OpenAI on their DALLE-2 model or the release from Stability AI of their Stable Diffusion model. Deep neural image generators like StyleGAN and manual image editing tools like Photoshop have been on the scene for years. But today, DALLE-2 and Stable Diffusion (which is open source) are uniquely effective at rapidly producing highly realistic images from open-ended prompts.

There are a number of risks posed by these models, and OpenAI acknowledges this. Unlike conventional art and Photoshop, today’s text-to-image models can produce images from open-ended prompts by a user in seconds. Concerns include (1) copyright and intellectual property issues, (2) sensitive data being collected and learned, (3) demographic biases, e.g. producing images of women when given the input “an image of a nurse”, (4) using these models for disinformation by creating images of fake events, and (5) using these models for producing non-consensual, intimate deepfakes.

These are all important, but producing intimate deepfakes is where abuse of these models seems to be the most striking and possibly where we are least equipped to effectively regulate misuse. Stable Diffusion is already being used to produce realistic pornography. Reddit recently banned several subreddits dedicated to AI-generated porn including r/stablediffusionnsfw, r/unstablediffusion, and r/porndiffusion for a violation of Reddit’s rules against non-consensual intimate media.

This is not to say that violations of sexual and intimate privacy are new. Before the introduction of models such as DALLE-2 and Stable Diffusion, individuals were already victims of non-consensual deepfakes. Perpetrators often make this content to discredit or humiliate people from marginalized groups, taking advantage of the negative sociocultural attitudes that already surround them. An estimated 96% of deepfake videos online are porn, almost all featuring women. In one case, when a fabricated video showing a journalist in a sex act she never performed went viral on the Internet, she was met with death threats. Her home address was leaked alongside false advertisements that she was available for sex. She could not eat, and she stopped writing for months. Other forms of sexual privacy violations have had similar consequences for victims, leading to economic injuries from damaged reputations in job searches and even to suicide.

The unique danger posed by today’s text-to-image models stems from how they can make harmful, non-consensual content production much easier than before. This is particularly true of inpainting and outpainting, which allow a user to interactively build realistic synthetic images from natural ones, and of DreamBooth and other easy-to-use tools, which allow for fine-tuning on as few as 3-5 examples of a particular subject (e.g. a specific person). More such tools are rapidly becoming available following the open-sourcing of Stable Diffusion. It is clear that today’s text-to-image models have capabilities distinct from methods like Photoshop, RNNs trained on specific individuals, or “nudifying” apps. These previous methods all require a large amount of subject-specific data, human time, and/or human skill. And no, you don’t need to know how to code to interactively use Stable Diffusion, uncensored and unfiltered, including in/outpainting and DreamBooth.

If Photoshop is like a musket, Stable Diffusion is like an assault rifle. And we can expect that issues from the misuse of these models will only become more pressing over time as they get steadily better at producing realistic content. Meanwhile, new graphical user interfaces will also make using them easier. Some text-to-video models are even beginning to arrive on the scene from Meta and Google. And a new version of Stable Diffusion will be released soon. New applications, capabilities, and interfaces for diffusion models are being released daily. So to the extent that it isn’t already easy, it will become easier and easier for these models to be tools for targeted harassment.

Unfortunately, current institutions are poorly equipped to adapt to increased harms in a way that protects those who are the most vulnerable. Concerted political action and research are often focused on the capacity for deepfakes to spread misinformation. This makes sense in light of how those in positions of political power stand to be affected the most by deepfake news. On the other hand, a combination of lack of oversight and sociocultural attitudes has often led victims of deepfake sex crimes to be met with indifference: law enforcement has told victims to simply “go offline”.

But even if one does not view the risks specific to text-to-image models as a major concern, the fact that these models have quickly become open-source and easily-abused does not bode well for the arrival of more capable AI systems in the future. There has been a slippery slope from DALLE-2’s release to today’s environment where Stable Diffusion can be easily used to cause devastating harm to people. And this offers a worrying case study on the difficulty of keeping risky AI capabilities out of the control of people who will misuse them.

How did we get here?

On April 13, 2022, OpenAI released the paper on DALLE-2, which set the state of the art for producing realistic images from text. Along with the release of the paper, OpenAI also created a website that allows users to query the model for image generation and editing. OpenAI did a great deal of work to avoid misuse of the model and wrote an extensive technical report on it. Their measures include (1) curating training data to avoid offensive content, (2) testing their own model for issues, (3) having an independent red team try to find problems with it, (4) not releasing the architecture or weights, (5) requiring users to sign up, provide an email, and explain their motivations to use the model, (6) having a waiting period for access (although a waiting period was no longer required as of late September 2022), (7) filtering prompts from users that contained explicit content, famous people’s names, etc., (8) filtering images from the model, (9) suspending/banning users who enter too many suspicious prompts, and (10) continually updating their backend to respond to issues.

To their credit, these measures seem to have been very successful at preventing the use of DALLE-2 for creating offensive content. We have seen anecdotal posts on Reddit from users who reportedly tried to generate porn with DALLE-2 using crafty prompts like “transparent swimsuit” but failed, often getting banned in the process. We are not aware of any clearly successful examples of anyone getting DALLE-2 to produce particularly offensive content at all, much less systematically.

So what’s the problem? Despite all of OpenAI’s efforts to avoid misuse of DALLE-2, they still provided the proof of concept for this type of model, and they still wrote about many of the details of their approach in their paper. This enabled others to fund and develop copycat models which can be more easily misused. OpenAI’s technical report on risks had no discussion of problems from copycat models other than a cursory mention that “DALLE-2…may accelerate both the positive and negative uses associated with generating visual content”. It seems odd that OpenAI did not meaningfully discuss copycats given the thoroughness of the report and the fact that past systems of theirs such as GPT-3 have also been copycatted (e.g. BLOOM).

Two notable DALLE-2 copycats are Midjourney and eDiffi. But most relevant to this case study is Stable Diffusion from StabilityAI. StabilityAI is a startup whose homepage says it is “a company of builders who care deeply about real-world implications and applications.” It was founded in 2020 but came into the spotlight only recently upon entering the image generation scene. For example, it only made a Twitter account in July, a few months after the DALLE-2 paper. Their copycat, Stable Diffusion, is comparable to DALLE-2, and they confirmed that DALLE-2 was a principal source of inspiration.

Relative to OpenAI, StabilityAI did a very poor job of preventing misuse of Stable Diffusion. In August, they announced the model would be open-sourced. This release was accompanied by some mostly ineffective measures to reduce harms from the model, like providing a safety classifier for images that both doesn’t work that well and can simply be disabled by users. They also tried to restrict access to people who signed up for it with an email and provided a justification. The plan was to make Stable Diffusion available via HuggingFace on August 22, 2022 to those who were approved for access. This mattered very little, though, because the weights were leaked online a few days earlier. Then, predictably, people used either the model directly or fine-tuned versions of it to produce the type of offensive content that can be used for targeted harassment of individuals. Later in September, HuggingFace also made access to Stable Diffusion available for anyone on the internet with no signup, albeit with automated filtering for NSFW content built into this particular interface.

Overall, the slippery slope from the carefully guarded DALLE-2 to the fully open-source Stable Diffusion took less than 5 months. On one hand, AI generators for offensive content were probably always inevitable. However, (1) they did not have to arrive this soon. Delays in advancements like these increase the chances that regulation and safety work won’t be so badly outpaced by capabilities. (2) They did not necessarily have to arrive in a way that was enabled by companies like OpenAI and StabilityAI, who made ineffective efforts to avoid harms yet claim to have clean hands while profiting greatly off these models. And (3) other similar issues with more powerful models and higher stakes might be more avoidable in the future. What will happen if and when video generators, GPT-N, advanced generalist agents, or other potentially very impactful systems are released and copycatted?

What do we want?

Even in light of this case study, one need not be entirely against the release of AI systems to the public. The democratization of ML research is among its strongest assets. However, there are general principles that any AI system, with any degree of public exposure, ought to obey. Specifically, we propose three design principles as necessary conditions for responsible development of such systems. We would hope that principles like these can guide not only the development process for AI systems, but also provide a set of standards that AI systems must meet to be enforced by research communities, governments, and even the public (more on that below). The key theme is that companies should ideally be accountable not just for what their AI does in narrow use cases they say it should be used in, but for all of the foreseeable consequences of the systems they release.

Scoping of function

Both the power and risk of general-purpose AI systems lie in their broad applicability. General-purpose text, image, or video generation could conceivably be used in a wide array of contexts, making it much harder to reason about their safety and impact. Therefore, it is useful to more precisely scope down technologies so that safety assurances can more readily be given. Note, however, that scoping of function requires that this scope is fixed. In other words, it means ensuring that, once an AI system is scoped down, it is not meaningfully usable for other purposes. For instance, a language model released for a relatively harmless purpose should not be easily hackable or fine-tuneable in order to serve another, more harmful one.

Simple examples of implementing this principle could include only developing narrow versions of powerful models, either fine-tuned on narrower data or trained with a penalty for all out-of-scope outputs in the training objective. This allows fulfillment of the scientific goals of releasing such models (i.e. demonstrating strong capability over a specified domain) while making misuse more difficult. More ambitiously, developers ought to more explicitly enforce scope throughout the training and development of models, which can be done at the dataset level (e.g. more extensively curating data), or at the architectural level (e.g. models that are more interpretable or able to provide stronger guarantees on the kind of output they produce). In any event, the more confidently one is able to scope down the behavior of a system, the more certain one can be about any given claims to safety.
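
As a rough illustration of the training-objective penalty mentioned above, one possible formulation adds an out-of-scope term to the usual loss. This is a sketch under our own assumptions, not a method taken from any of the systems discussed: the symbols \mathcal{S} (the intended scope), \lambda \ge 0 (a penalty weight), and d(y, \mathcal{S}) (a hypothetical measure of how far an output y falls outside the scope, equal to zero when y is in scope) are all introduced here purely for illustration.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\text{task}}(\theta)
  + \lambda \, \mathbb{E}_{x \sim \mathcal{D},\; y \sim p_\theta(\cdot \mid x)}
    \big[ d(y, \mathcal{S}) \big],
\qquad d(y, \mathcal{S}) = 0 \ \text{ whenever } y \in \mathcal{S}.
```

Larger values of \lambda trade task performance for tighter scope. How to define d in practice is an open design question, and a penalty of this kind does not by itself prevent later fine-tuning from undoing it.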

Limitations for access

The details about how a system is released can be just as important as properties of the system itself. For example, if one has a general-purpose model, but somehow the outputs can be limited in usage outside of direct interaction with the model, this would limit its negative impacts.

This could include things as simple as forbidding screenshotting or copy-pasting of outputs from APIs. It could also include measures like filters on prompts or outputs. Stronger versions of this, for particularly capable technologies, might include restricting the set of people who can access these models or keeping as much about the model secret as possible in order to slow efforts at replication. Even if some of these measures are circumventable with effort, they may be able to meaningfully hinder abuses by adding friction.
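
To make the idea of a prompt filter concrete, here is a minimal sketch of a keyword-based check. It is not how OpenAI's or StabilityAI's filters work (those details are not described here); the blocklist, the function name `check_prompt`, and the `FilterResult` type are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical blocklist, for illustration only. A real deployment would
# need a far more careful policy than a short list of keywords.
BLOCKED_TERMS: Tuple[str, ...] = ("nsfw", "explicit")


@dataclass(frozen=True)
class FilterResult:
    allowed: bool                   # True if no blocked term was found
    matched_terms: Tuple[str, ...]  # blocked terms present in the prompt


def check_prompt(prompt: str,
                 blocked_terms: Tuple[str, ...] = BLOCKED_TERMS) -> FilterResult:
    """Reject a prompt if any whole word in it is on the blocklist."""
    # Lowercase and split into alphanumeric words so that matching is
    # case-insensitive and ignores punctuation.
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    matched = tuple(term for term in blocked_terms if term in words)
    return FilterResult(allowed=not matched, matched_terms=matched)


if __name__ == "__main__":
    print(check_prompt("a bowl of fruit on a table"))
    print(check_prompt("An NSFW portrait"))
```

A filter this simple is easy to evade with synonyms or misspellings, which is why it should be read as adding friction, in the sense described above, and not as a guarantee.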

This requirement is strong, and not at all an afterthought that can be tacked on after development. For example, as mentioned above, deployed models (e.g. DALLE-2) often come with some restrictions for access. However, even just publishing the training details of these models makes them replicable and therefore totally annuls any other effort to scope access. In this respect, the weakest access link determines how accessible a given AI system is. For this reason, whenever developing or deploying a model, one must thoroughly consider how accessible it truly is.

Complete cost/benefit analysis

Even if a system is reasonably scoped in function and access, given state-of-the-art techniques in AI, it ultimately remains very hard to totally rule out potential abuses. Therefore, since such abuses are to some degree an inherent risk, it is incumbent on the creators of such systems to clearly articulate the set of all possible costs and benefits of their models. They should also faithfully argue why those benefits outweigh these costs and why deployment is worth the inherent risk.

Especially in the case of DALLE-2 and Stable Diffusion, we are not convinced of any fundamental social benefits that access to general-purpose image generators provide aside from, admittedly, being entertaining. But this does not seem commensurate with the potentially-devastating harms that deepfakes can have on victims of sex crimes. Thus, it seems that these models, as they have been rolled out, fail this basic cost/benefit test.

If OpenAI, StabilityAI, and others could more readily comply with the above desiderata for safety, it would go a long way to mitigating the negative impact of releasing powerful AI tools. However, if any of these are not addressed properly (or cannot even in principle be addressed properly for a given technology), releasing such models is dangerous and should be criticized. This becomes even more pressing as the capability and deployment scope of AI systems grow.

What should we do?

The role of AI researchers

Researchers should take great care in what they work on and how they release it. The best solutions to avoiding harms from copycat models may be to (1) curtail advanced capabilities work in general, such as DALLE-2, the soon-to-be-released GPT-4, or video generators, and (2) invest in work that incorporates harm mitigation at a systems level and in infrastructure for recourse. Even if no details at all are provided about how something was done, simply knowing that it can be done makes copycats easier to fund and work on. Proofs of concept make the choice to invest in building a technology more cost-efficient. Rather than working on models with broad-domain capabilities like DALLE-2 and GPT-4, non-capabilities work or models with narrow capabilities seem to be safer directions for work. An exception to this would be if certain progress on risky capabilities is inevitable within a certain timeframe and if the only choice is between less dangerous and more dangerous models. And as discussed above, those who build abusable systems should carefully scope their function, limit access, and honestly articulate costs and benefits.

Meanwhile, there are deep problems with the “let’s build transformative AI in order to make sure it’s safe” strategy. In particular, OpenAI and DeepMind both express that they want to race to generate highly transformative intelligent systems. The goal they both profess is to be the first to develop such systems so that they can exercise responsible stewardship and ensure that the systems are as aligned and beneficial as possible. This is a benevolent form of what Nick Bostrom refers to in Superintelligence as gaining a “decisive strategic advantage”, which may make the first developer of particularly transformative AI too powerful to compete with. There are many problems with this strategy, including: (1) It is entirely based on racing to develop transformative AI, and faster timelines exacerbate AI risks. This is especially perverse if multiple actors are competitively racing to do so. (2) Nobody should trust a small set of people like Sam Altman and Demis Hassabis to unilaterally exercise benevolent stewardship over transformative AI. Arguably, under any tenable framework for AI ethics, a regime in which a small technocratic set of people unilaterally controlled transformative AI would be inherently unethical. Meaningful democratization is needed. (3) OpenAI’s approach to DALLE-2 should further erode confidence in them in particular. Their overly convenient technical report on risks, which failed to make any mention of copycatting, combined with how quickly they worked to profit off of DALLE-2, are worrying signs. (4) Copycatting makes racing to build transformative AI strictly more risky. Even if one fully trusted a single actor like OpenAI or DeepMind to exercise perfect stewardship over transformative AI if they monopolized it, the speed with which DALLE-2 was copycatted multiple times suggests that copycatting may undermine attempts at benevolent strategic dominance. Copycatting would most likely serve to broaden the set of technocrats who control transformative AI but still fail to democratize it. So if a company like OpenAI or DeepMind races to build transformative AI, and if it is still copycatted anyway, we get the worst of all worlds: insecure, non-democratized, transformative AI on a faster timeline. If a similar story plays out with powerful, highly transformative AI as it has with DALLE-2, humanity may be in trouble.

The role of AI governance

The slippery slope from DALLE-2 to text-to-image model anarchy demonstrates a rapid pathway for this technology from originators (e.g. OpenAI), to copycatters (e.g. StabilityAI with the help of platforms like HuggingFace and GitHub for sharing models), to distributors of content (e.g. social media). Ideally, the norms and laws around AI governance should recognize these steps and work to add friction where possible. In this section, we provide considerations for the regulation of this pipeline.

The status quo: terms of use, content policies, and applications for access

When releasing DALLE-2, OpenAI published corresponding terms of use and content policy, as well as a process to review requests for access. This is not uncommon for AI technologies. Originators of open source datasets and codebases have looked towards licensing, terms of use, and applications for access as mechanisms to define approved uses of datasets and codebases ex ante. However, Peng et al. find that many datasets that were originally released under a non-commercial license were re-released as part of derivative creations under a commercial license. Peng et al. also point out regulatory loopholes: while datasets may be governed or distributed under a particular license, pre-trained models often don’t face the same restrictions. This means that those who may be excluded under licenses may still reap some of the datasets’ benefits through models trained on those images.

Despite the clear influence of DALLE-2 on Stable Diffusion, StabilityAI did not have to agree to OpenAI’s terms of use or content policy, nor did they have to apply for access. Nor does anyone who uses Stable Diffusion. As distribution grows, responsibility for harmful aspects of distribution becomes more diffused. The lack of threads of responsibility for both setting and maintaining licenses means that there are few consequences for not adhering to terms of use.

Governance of originators

While originators and copycatters play a similar role by developing technologies, it can be useful to treat them as distinct for a few reasons. The first is that origination is much more difficult than copycatting, so originators represent a smaller and more easily targeted bottleneck in the pipeline. Second, originators tend to have and require more resources than copycatters. Originators such as OpenAI and DeepMind have consistently advanced the state of the art with language, image generation, and reasoning capabilities using immense computational resources and talent. Third, originators also have a huge say over how these technologies are proliferated. For example, OpenAI required and reviewed applications for access to DALLE-2, and DeepMind released AlphaFold open-source. This points to not only the resources these firms have in knowledge generation and system creation, but also their influence in granting broader access to these technologies. Given their substantial resources and the impact their decisions can have, originators make a natural regulatory target.

In recent years, the Federal Trade Commission (FTC) in the United States has exercised a useful role in addressing the consequences of harmful AI systems. The legal community has recognized the FTC's breadth of governance as potentially granting the commission the authority to regulate a broad range of harms caused by AI systems. The FTC also has the authority to seek an injunction ordering a company to cease a certain practice. This is a valuable regulatory power in cases where firms have strong incentives to engage in undesirable behavior, and where non-government institutions don’t have the necessary authority to intervene (e.g. if deploying AI which is unsafe turns out to be massively profitable).

Previously, the FTC has intervened in the deployment of digital technology when enforcing privacy and data security regulation. When Google launched its social network Google Buzz in 2010, the company automatically added Gmail users’ frequent contacts as part of their visible network. This leaked sensitive information about users’ doctors and intimate contacts. The FTC classified this move on Google’s part as a deceptive practice, and Google entered into a consent decree with the FTC. As a result, Google was required to implement a comprehensive internal privacy program and be subject to regular, independent privacy audits.

Even beyond organizational changes to corporations, the FTC has also ordered the remaking of software to better align with legal values. In a case against Google for violation of the Children’s Online Privacy Protection Act, the FTC settlement required Google to change the way YouTube operated. YouTube now requires content creators to mark whether their content is directed at children. If that is the case, the platform no longer tracks identifiers or serves behavioral advertisements. Similarly, regulatory pressure may be able to push OpenAI and StabilityAI to change the way their software is designed.

Unfortunately, there are several drawbacks. In YouTube’s settlement with the FTC, Commissioner Slaughter notes that, even though YouTube requires content creators to mark whether their content is directed at children, the incentives of content creators are often aligned with YouTube’s business interests. Content creators may also have much to gain from YouTube’s advertising. And, even though Google voluntarily announced that they would apply machine learning to actively search for mis-designated content, the results and applications of this effort would be opaque.

Moreover, these FTC authorities are usually invoked ex post or as part of a post-suit agreement. This means that harms have to be uncovered and recognized as important enough for the FTC to go after the entities that caused them. The resource intensity of uncovering failure modes of emerging technologies could dissuade the FTC from going after these companies, and could even encourage companies themselves to look away from potential harms to limit liability. Moreover, in Google Buzz’s FTC case, the only part of Google’s practice that made it deceptive in the eyes of the FTC was that Google had a comprehensive privacy policy that made promises about its behavior to its users. OpenAI and Stability AI have content policies for their models, but they place the onus on users to generate appropriate content, and the terms of use similarly make no promises for the behavior of the models. Baseline expectations of safety and harm mitigation should therefore be demanded ex ante of developers of capable AI systems, beyond simply what is promised to users. Expecting these basics must become commonplace, and violations must carry consequences.

Governance of copycatters

The diffusion of responsibility as the models are reproduced and distributed by copycatters may point to a need to conduct research in the detection and handling of copycatting, perhaps in ways inspired by YouTube’s Copyright Match Tool and GitHub’s code search and inference capabilities. But it may simply be the case that originators do not have an incentive to enforce the strict licensing of pre-trained models. When actors upload copyrighted material onto YouTube, original content creators are driven off the platform. The distribution of powerful pre-trained models, on the other hand, may actually contribute to hype and bolster the reputation of these technologies.

Governance of distributors

As discussed above, non-consensual intimate images existed before text-to-image models. Broadly speaking, we reaffirm the arguments presented by University of Virginia Law Professor Danielle Citron in The Fight for Privacy. These focus on how law can remove hurdles for victims to seek recourse and how law can create incentives for distributors to provide victims with what they need. To remove barriers in going to court, Citron argues in favor of allowing victims to sue pseudonymously, without releasing their full name to the broader public. To provide victims with meaningful remedy, Citron advocates for granting injunctive relief in court cases through court orders “directing the removal, blocking, or de-linking of intimate images” by platforms. It is also worth noting that a few U.S. state policies now criminalize the disclosure of deepfake intimate images and videos, creating the potential for new case law specific to text-to-image models as their capabilities continue to evolve.

By empowering victims to seek recourse, the governance and regulation of distributors as a whole becomes more robust. As Florida State University Law Professor Lauren Scholz writes, “private enforcement deters potential wrongdoers by allowing for a resilient avenue of enforcement, available even when agency funding or political will is lacking”.

However, we note that many of the tools that can be wielded at this stage are invariably wielded to conduct damage control after the harm has already been done. Therefore, combining the particulars of the DALLE-2 case with knowledge about existing regulation, we note the importance of managing regulation upstream from this point in the process. Many open questions remain in pursuit of this. How might law and policy create incentives for platforms, from Meta to GitHub, to proactively detect harmful deepfakes before they are published? How can we move technology and law beyond content moderation and shift discourse towards restorative justice that acknowledges that platforms are moderating human relationships, not simply disconnected pieces of information in the ether? In doing so, AI governance does not need to start from scratch, but this will require substantial innovation on existing frameworks.

The role of the public

Organizations like OpenAI, StabilityAI, other copycatters, and HuggingFace all share responsibility for harms from text-to-image models and should be viewed critically by the public. Incomplete and ineffective attempts to safeguard models with risky capabilities should not allow these companies to argue that their hands are clean. Given their recent history with pushing the state of the art for text and image modeling, OpenAI in particular should receive pushback for copycattable work such as DALLE-2 and the GPT models. It is also noteworthy that HuggingFace is now a repeat offender for making risky models easily available. Earlier this year, they temporarily hosted the weights of GPT-4chan, which generates hate speech and other offensive text.

The AI research and governance communities may only be able to do so much to curb the spread of easily-abused AI systems. The last best line of defense against them might be for actors who can influence public opinion to spread disapproval for organizations who enable harmful uses of AI. For example, the comedy/commentary show Last Week Tonight recently made a piece on AI images which only highlighted fun and positive uses of them. This is unfortunate because this platform might have been (and might still be) an excellent one to inform the public about present and future risks from models with dangerous capabilities. This type of publicity can, in turn, meaningfully shape policy and influence companies.

Conclusion

With text-to-image models, Pandora's box is already open. It is extremely easy to abuse Stable Diffusion, and it will get easier over time. Some people, particularly victims of sex crime, will be devastatingly harmed while OpenAI and StabilityAI make large profits. This offers a compelling case study on risks from text-to-image models in particular and the proliferation of risky models in general. As a result, it is important to study and adapt to these challenges in order to guide AI progress in safer directions. Since the capabilities and scope of AI systems will only increase with time, promoting a healthier AI ecosystem will require prescient action from researchers, governance bodies, and the public.

Comments (11)

Matthew_Barnett · Nov 6 2022 · 21

I'm confused about this post. I don't buy the argument, but really, I'm not sure I understand what the argument is. Text-to-image models have risks? Every technology has risks. What's special about this technology that makes the costs outweigh the benefits?

Consider this paragraph,

Especially in the case of DALLE-2 and Stable Diffusion, we are not convinced of any fundamental social benefits that access to general-purpose image generators provide aside from, admittedly, being entertaining. But this does not seem commensurate with the potentially-devastating harms that deepfakes can have on victims of sex crimes. Thus, it seems that these models, as they have been rolled out, fail this basic cost/benefit test.

I don't find anything about this obvious at all.

Suppose someone in the 1910s said, "we are not convinced of any fundamental social benefits that film provides aside from, admittedly, being entertaining. But this does not seem commensurate with the potentially-devastating harms that film can cause by being used as a vehicle for propaganda. Thus, it seems that films, as they have been rolled out, fail this basic cost/benefit test." Would that have convinced you to abandon film as an art form?

Entertainment has value. Humanity collectively spends hundreds of billions of dollars every year on entertainment. People would hate to live in a world without entertainment. It would be almost dystopian. So, what justifies this casual dismissal of the entertainment value of text-to-image models?

Your primary conclusion is that "the AI research community should curtail work on risky capabilities". But isn't that obvious? Everyone should curtail working on unnecessarily risky capabilities.

The problem is coordinating our behavior. If OpenAI decides not to work on it, someone else will. What is the actual policy being recommended here? Government bans? Do you realize how hard it would be to prevent text-to-image models from being created and shared on the internet? That would require encroaching on the open internet in a way that seems totally unjustified to me, given the magnitude of the risks you've listed here.

What's missing here is any quantification of the actual harms from text-to-image models. Are we talking about 100 people a year being victimized? That would indeed be sad, but compared to potential human extinction from AI, probably not as big of a deal.

[anonymous] · Nov 7 2022 · 6

Thanks for the comment. I hope you think this is interesting content.

I'm not sure I understand what the argument is.

The most important points we want to argue with this post are that (1) if a system itself is made to be safe, but it's copycatted and open-sourced, then the safety measures were not effective, (2) it is bad when developers like OpenAI publish incomplete/overly-convenient analyses of the risks of what they develop that, for example, ignore copycatting, and (3) the points from "What do we want?" and "What should we do?"

"...we are not convinced of any fundamental social benefits that film provides aside from, admittedly, being entertaining..."

Yes, entertainment has value, but I don't think that entertainment from text-to-image models is/will be commensurate with film. I could also very easily list a lot of non-entertainment uses of film involving stuff like education, communication, etc. And I think someone from 1910 could easily think of these as well. What stuff like this would you predict from text-to-image diffusion models?

So, what justifies this casual dismissal of the entertainment value of text-to-image models?

We don't. We argue that it's unlikely to outweigh harms.

Your primary conclusion is that "the AI research community should curtail work on risky capabilities".

I wouldn't say this is our primary conclusion. See my first response above. Also, I don't think this is obvious. Sam Altman, Demis Hassabis, and many others strongly disagree with this.

The problem is coordinating our behavior. If OpenAI decides not to work on it, someone else will

We disagree that the counterfactual to OpenAI not working on some projects like DALLE2 or GPT4 would be similar to the status quo. We discussed this in the paragraph that says "...On one hand, AI generators for offensive content were probably always inevitable. However..."

...Government bans? Do you realize how hard it would be to prevent text-to-image models from being created and shared...

Yes. We do not advocate for government bans. My answer to this is essentially what we wrote in the "The role of AI governance" section. I don't have much to add beyond what we already wrote. I recommend rereading that section. In short, there are regulatory tools that can be used. For example, the FTC may have a considerable amount of power in some cases.

Are we talking about 100 people a year being victimized? That would indeed be sad, but compared to potential human extinction from AI, probably not as big of a deal.

Where did the number 100 come from? In the post, we cite one article about a study from 2019 that found ~15,000 deepfakes online. That was in 2019 when image and video generation were much less developed than today. And in the future, things may be much more widespread because of open-source tools based on SD that are easy to use.

Another really important point, I think, is that we argue in the post that trying to avoid dynamics involving racing toward TAI, copycatting, and open-sourcing of models will LESSEN X-risk. You wrote your comment as if we are trying to argue that preventing sex crimes is more important than X-risk. We don't say this. I recommend rereading the "But even if one does not view the risks specific to text-to-image models as a major concern..." paragraph and the "The role of AI researchers" section.

Finally, and I want to put a star on this point -- we all should care a lot about sex crime. And I'm sure you do. Writing off problems like this by comparing them to X-risk (1) isn't valid in this case because we argue for improving the dev ecosystem to address both of these problems, (2) should be approached with great care and good data if it needs to be done, and (3) is one type of thing that leads to a lot of negativity and bad press about EA.

I think this is probably even more true for your comments on entertainment value and whether that might outweigh the harms of deepfake sex crimes. First, I'm highly skeptical that we will find uses for text-to-image models that are so widely usable and entertaining that they would be commensurate with the harms of diffusion-deepfake sex crime. But even if we could be confident that entertainment would hypothetically outweigh sex crimes on pure utilitarian grounds, in the real world with real politics and EA critics, I do not think this position would be tenable. It could serve to undermine support for EA and end up being very negative if widespread.

Rohin Shah · Nov 12 2022 · 4

But even if we could be confident that entertainment would hypothetically outweigh sex crimes on pure utilitarian grounds, in the real world with real politics and EA critics, I do not think this position would be tenable.

Isn't this basically society's revealed position on, say, cameras? People can and do use cameras for sex crimes (e.g. voyeurism) but we don't regulate cameras in order to reduce sex crimes.

I agree that PR-wise it's not a great look to say that benefits outweigh risks when the risks are sex crimes but that's because PR diverges wildly from reality. (And if cameras were invented today, I'd expect we'd have the same PR arguments about them.)

None of this is to imply a position on deepfakes -- I don't know nearly enough about them. My position is just that it should in fact come down to a cost/benefit calculation.

I could also very easily list a lot of non-entertainment uses of film involving stuff like education, communication, etc.

Random nitpick, but text-to-image models seem plausibly very useful for education and communication. I would love for people's slide decks with pages and pages of text to be replaced by images that convey the same points better. Maybe imagine Distill-like graphics / papers, except that it no longer takes 5x as long to produce them relative to a normal paper.

philljkc · Nov 15 2022 · 1

We agree for sure that cost/benefit ought to be better articulated when deploying these models (see the What Do We Want section on Cost-Benefit Analysis). The problem here really is the culture of blindly releasing and open-sourcing models like this, using a Go Fast And Break Things mentality, without at least making a case for what the benefits and harms are, and without appealing to any existing standard when making these decisions.

Again, it's possible (but not our position) that the specifics of DALLE-2 don't bother you as much, but certainly the current culture we have around such models and their deployment seems an unambiguously alarming development.

Using text-to-image models for education and communication seems like a great idea! Moreover, I think it's definitely consistent with what we've put forth here too, since you could probably fine-tune on graphics contained in papers related to your task at hand. The issue here really is that people are incurring unnecessary amounts of risk by making, say, an automatic Distill-er by using all images on the internet or something like that, when training on a smaller corpus would probably suffice and would vastly reduce the possible risk of a model intended originally for Distill-ing papers. The fundamental position we advance is that better protocols are needed before we start mass-deploying these models, not that NO version of these models / technologies could ever be beneficial.

philljkc · Nov 8 2022 · 2

I think the core takeaway, at least from my end, is that this post elucidates a model, and tells a more concrete story, for how proliferation of technologies of a certain structure and API (e.g., general-purpose query-based ML models) can occur, and why they are dangerous. Most importantly, this entails that, even if you don't buy the harms of DALLE-2 itself (which, we have established, you should, in particular for its potential successors), this pattern of origination -> copycatting -> distribution -> misuse is a typical path for the release of technologies like this. If you buy that a dangerous capability could ever be produced by an AI model deployable with an API of the form query -> behaviour (e.g. by powerful automatic video generation from prompts, powerful face-editing tools given a video, or an agent with arbitrary access to the internet controlled via user queries), this line of reasoning could therefore apply and/or be useful. This informs a few things:

Technologies, once proliferated, are like a Pandora's Box (or indeed, a slippery slope), and so therefore the very coordination problem / regulatory problem you speak of is most easily solved at the level of origination. This is a useful insight now, while many of the most dangerous AIs to be developed are yet to be originated.
The potential harms of these technologies come from their unbounded scope, i.e. from the generality of function, lack of restriction of user access, or from the parameter count of these models being so large as to make their behaviour inherently hard to reason about. All of these things make these kinds of models particularly amenable to misuse. So this post, in my mind, also takes a view on the source of capabilities risk from these models: in their generality and open scope. This can therefore inform the kinds of models / training techniques that are more dangerous: e.g. that for which the scope is the widest, where most possible failures could happen because the right behaviour is more nebulously defined.

In general, I would urge you to consider this paragraph (in particular point (3)), since the argument there seems to address the bulk of your criticism.

Overall, the slippery slope from the carefully-guarded DALLE-2 to the fully-open-source Stable Diffusion took less than 5 months. On one hand, AI generators for offensive content were probably always inevitable. However (1) not this soon. Delays in advancements like these increase the chances that regulation and safety work won’t be so badly outpaced by capabilities. (2) Not necessarily in a way that was enabled by companies like OpenAI and StabilityAI who made ineffective efforts to avoid harms yet claim to have clean hands while profiting greatly off these models. And (3) other similar issues with more powerful models and higher stakes might be more avoidable in the future. What will happen if and when video generators, GPT-N, advanced generalist agents, or other potentially very impactful systems are released and copycatted?

In other words, it's maybe not as much about DALLE-2 itself, but about the extrapolation of this pattern to models like it, and ways to deal with that before a model with existential risk is brought up (and by that point, if the data is in on that, we're probably dead already).

Thanks for reading, and for the comment. I hope this clarifies the utility of this article for you.

Sharmake · Nov 9 2022 · 3

This. It suggests that once a powerful AI is released, even in restricted format, others will be motivated to copy it. This is a bad dynamic for AGI, as existential risk only depends on the least-safe actor. If this dynamic is repeated, it's very possible that we all die due to copies of AGI being released.

ChristianKleineidam · Nov 11 2022 · 2

The potential harms of these technologies come from their unbounded scope

Previous technologies also have quite unbounded scopes. That does not seem to me different from the technology of film. The example of film in the post you were replying to also has an unbounded scope.

This can therefore inform the kinds of models / training techniques that are more dangerous: e.g. that for which the scope is the widest

Technologies with a broad scope are more likely to be dangerous but they are also more likely to be valuable.

If you look at the scope of Photoshop, it can already be used by people to make deepfake porn. It can also be used by people to print fake money.

Forbidding broad-scope technologies to be deployed would have likely prevented most of the progress in the last century and would put a huge damper on future progress as well.

When it comes to gene editing, our society decides to regulate its application but is very open that developing the underlying technology is valuable.

The analogy to how we treat gene editing would be to pass laws to regulate image creation. The fact that deepfake porn is currently not heavily criminalized is a legislative choice. We could pass laws to regulate it like other sexual assaults.

Instead of regulating at the point of technology creation, you could focus on regulating technology use. To the extent that we are doing a bad job at that currently, you could build a think tank that lobbies for laws to regulate problems like deepfake porn creation and that constantly analyzes new problems and lobbies for them to be regulated.

When it comes to the issue of deepfake porn, it's also worth looking at why it's not criminalized. When Googling I found https://inforrm.org/2022/07/19/deepfake-porn-and-the-law-commissions-final-report-on-intimate-image-abuse-some-initial-thoughts-colette-allen/ which makes the case that it should be regulated but which cites a government report which suggests that deepfake porn creation should be legal while sharing it shouldn't be legal. I would support making both illegal, but I think approaching the problem from the usage point of view seems the right strategy.

philljkc · Nov 15 2022 · 1

When it comes to gene editing, our society decides to regulate its application but is very open that developing the underlying technology is valuable.

Here, I would refer to the third principle proposed in the "What Do We Want" section as well (on Cost-Benefit evaluation): I think that there should be at least more work done to try and anticipate / mitigate harms done by these general technologies. Like what is the rough likelihood of an extremely good outcome vs. extremely bad outcome for model X being deployed? If I add modification Y to it, does this change?

I don't think our views are actually inconsistent here: if society scopes down the allowed usage of a general technology to comply with a set of regulatory standards that are deemed safe, that would work for me.

My personal view on the danger here is really that there isn't enough technical work to mitigate the misuse of models, or even to enforce compliance in a good way. We really need technical work on that, and only then can we start effectively asking the regulation question. Until then, we might want to just delay the release of super-powerful successors to this kind of technology until we can give better performance guarantees for systems like this, deployed this publicly.

Sharmake · Nov 9 2022 · 5

Meanwhile, there are deep problems with the “let’s build transformative AI in order to make sure it’s safe” strategy. In particular, OpenAI and DeepMind both express that they want to race to generate highly transformative intelligent systems. The goal they both profess is to be the first to develop them so that they can exercise responsible stewardship and ensure that it is as aligned and beneficial as possible. This is a benevolent form of what Nick Bostrom refers to in Superintelligence as gaining a “decisive strategic advantage” which may make the first developer of particularly transformative AI too powerful to compete with. There are many problems with this strategy including: (1) It is entirely based on racing to develop transformative AI, and faster timelines exacerbate AI risks. This is especially perverse if multiple actors are competitively racing to do so. (2) Nobody should trust a small set of people like Sam Altman and Demis Hassabis to unilaterally exercise benevolent stewardship over transformative AI. Arguably, under any tenable framework for AI ethics, a regime in which a small technocratic set of people unilaterally controlled transformative AI would be inherently unethical. Meaningful democratization is needed. (3) OpenAI’s approach to DALLE-2 should further erode confidence in them in particular. Their overly-convenient technical report on risks that failed to make any mention of copycatting combined with how quickly they worked to profit off of DALLE-2 are worrying signs. (4) Copycatting makes racing to build transformative AI strictly more risky. Even if one fully-trusted a single actor like OpenAI or DeepMind to exercise perfect stewardship over transformative AI if they monopolized it, how quickly DALLE-2 was copycatted multiple times suggests that copycatting may undermine attempts at benevolent strategic dominance. 
Copycatting would most likely serve to broaden the set of technocrats who control transformative AI but still fail to democratize it. So if a company like OpenAI or DeepMind races to build transformative AI, and if it is still copycatted anyway, we get the worst of all worlds: unsecure, non-democratized, transformative AI on a faster timeline. If a similar story plays out with powerful, highly transformative AI as has with DALLE-2, humanity may be in trouble.

Let's be honest, a lot of the claims by OpenAI and DeepMind show bad signs of motivated reasoning. This is equivalent to a tobacco company claiming that its research helps make tobacco safe.

No, it only benefits the company in the form of profits.

[anonymous] · Nov 7 2022 · 2

I have been playing with Stable Diffusion for the past week. (It's a bit addictive.) It's currently very time-consuming to make photorealistic deepfake images. It is probably easier to do so with Photoshop. What I can see happening is that people will use Stable Diffusion to make a lot of creative images for political messages intended to attack and mock the opposing side instead of trying to mislead.

[anonymous] · Nov 7 2022 · 1

Thanks for the comment. I think that simple interfaces for SD like this are not particularly worrisome. But I think that now (1) inpainting/outpainting, (2) DreamBooth (see this SFW example), (3) GUIs that make it easy to use these, and (4) future advancements in diffusion models (remember that DALLE-2 was only released in April of this year) are the main causes for concern.