This AI Paper Proposes A New Method For Fine-Tuning Model Weights To Erase Concepts From Diffusion Models Using Their Own Knowledge


Modern text-to-image generative models have drawn interest because of the exceptional quality and seemingly limitless variety of the images they produce. Because they are trained on huge internet datasets, these models can mimic a wide range of concepts, including pornography and other content their creators would rather they did not reproduce. This research by researchers from Northeastern University (NEU) and MIT presents a method for selecting and erasing a single concept from the weights of a pretrained text-conditional diffusion model. Previous strategies have concentrated on inference guidance, post-generation filtering, and dataset filtering.

Inference-based approaches can filter or steer the output away from undesirable concepts, but they are easily evaded. Unlike data-filtering techniques, the new method also does not require retraining, which is costly for large models. Instead, it removes the concept from the model's parameters directly, so the model's weights can be distributed safely. The Stable Diffusion text-to-image diffusion model has been released as open source, making image-generation technology accessible to a large audience. The initial version of the software included a basic NSFW filter to prevent the creation of harmful images, but because both the code and the model weights are public, the filter is simple to turn off.
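To make the "easily evaded" point concrete, here is a brief illustrative sketch (not the paper's code) using the Hugging Face diffusers library: both inference-time steering via a negative prompt and the bundled safety checker live outside the weights, so anyone running the model locally can change or remove them.

```python
# Illustrative sketch (not the paper's code): inference-time safeguards on an
# open-weight model are easy to bypass, because they run outside the weights.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Inference guidance: steer sampling away from a concept with a negative prompt.
image = pipe(
    "a portrait photograph",
    negative_prompt="nudity, explicit content",
).images[0]

# Post-generation filtering: the bundled NSFW safety checker is just an
# attribute of the pipeline, so a local user can disable it in one line.
pipe.safety_checker = None
```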

The subsequent SD 2.0 model was trained on data filtered to exclude explicit images in order to prevent the generation of sensitive content. That retraining took 150,000 GPU hours on the 5-billion-image LAION dataset. The high cost of the process makes it difficult to establish a causal link between specific changes in the data and the capabilities that emerge, but users have reported that removing explicit images and other subjects from the training data may have harmed output quality. The researchers found that, on a test prompt set, the popular SD 1.4 model produces 796 images with exposed body parts identified by a nudity detector, while the SD 2.0 model trained on the restricted data still produces 417. This shows that, despite these efforts, the model's output still contains significant explicit content.
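That comparison boils down to generating images from a fixed prompt set with each checkpoint and counting how many a detector flags. A hypothetical sketch of the evaluation loop is below; `flags_exposed_body_part` is a placeholder wrapper standing in for an off-the-shelf nudity detector, not an API from the paper.

```python
# Hypothetical evaluation sketch: compare two checkpoints by counting how many
# generated images a nudity detector flags. `flags_exposed_body_part` is a
# placeholder wrapper around an off-the-shelf detector, not a specific API.
def count_flagged(pipeline, prompts, flags_exposed_body_part):
    flagged = 0
    for prompt in prompts:
        image = pipeline(prompt).images[0]  # one image per prompt
        if flags_exposed_body_part(image):
            flagged += 1
    return flagged

# Running a sweep like this over the same prompt set for SD 1.4 and SD 2.0
# yields counts such as the 796 vs. 417 figures cited above.
```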

The ability of text-to-image algorithms to imitate potentially copyrighted content is also a serious concern. AI-generated art can be comparable in quality to human-generated art, and it can convincingly reproduce the styles of real artists. Users of large-scale text-to-image synthesis systems such as Stable Diffusion have found that prompts like "art in the style of [artist]" can imitate particular artists' styles, potentially undermining original work. Following complaints from several artists, Stable Diffusion's creators are being sued for allegedly appropriating their work. Recent research attempts to protect artists by adding an adversarial perturbation to artwork before it is posted online, so that the model cannot copy it.

Yet that approach cannot remove an artistic style from a model that has already learned it. In response to these safety and copyright concerns, the researchers provide a technique for removing a concept from a text-to-image model. Their Erased Stable Diffusion (ESD) technique fine-tunes the model's parameters using only a text description of the undesired concept and no additional training data. Unlike training-set censoring approaches, their method is fast and does not require training the entire system from scratch. Moreover, it does not require modifying input images and can be applied to existing models. Erasure is also more difficult to defeat than simple blacklisting or post-filtering, even by users with access to the parameters.
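At its core, ESD fine-tunes the model so that its prediction for the targeted concept is pushed away from that concept, using a frozen copy of the original model's own knowledge as the guidance signal. Below is a minimal, hedged sketch of that objective in PyTorch; the names (frozen_unet, edited_unet, concept_emb, null_emb) are illustrative placeholders rather than the authors' released code.

```python
# Minimal sketch of the ESD fine-tuning objective, assumed from the paper's
# description: the edited model is trained so that its prediction conditioned
# on the concept matches a negatively guided prediction from a frozen copy of
# the original model. All names here are illustrative placeholders.
import torch
import torch.nn.functional as F

def esd_loss(frozen_unet, edited_unet, x_t, t, concept_emb, null_emb, eta=1.0):
    with torch.no_grad():
        # Predictions from the frozen, original model.
        eps_uncond = frozen_unet(x_t, t, encoder_hidden_states=null_emb).sample
        eps_concept = frozen_unet(x_t, t, encoder_hidden_states=concept_emb).sample
        # Negative-guidance target: push the prediction away from the concept,
        # using the model's own knowledge of that concept.
        target = eps_uncond - eta * (eps_concept - eps_uncond)

    # The edited model's conditional prediction is pulled toward the target.
    eps_edited = edited_unet(x_t, t, encoder_hidden_states=concept_emb).sample
    return F.mse_loss(eps_edited, target)
```

According to the paper, which subset of weights gets updated depends on the use case: cross-attention weights when the erasure should stay tied to the prompt (as with artist styles), and the remaining U-Net weights when a concept such as nudity should be removed globally.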

The researchers conducted user studies to investigate how erasure affects users' perception of the removed artist's style in the output images, its interference with other artistic styles, and its impact on image quality. Comparing their approach with Safe Latent Diffusion for removing objectionable images, they find it is just as effective. They also examine the method's ability to remove an artistic style from the model. Finally, they test their approach by erasing entire object classes. This article is based on the preprint of the paper; the authors have open-sourced the model weights and code.

Check out the PrePrint Paper, Code and Project. All Credit For This Research Goes To the Researchers on This Project. Also, don't forget to join our 16k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.