AI Framework To Train Diffusion Models Given Only Corrupted Data As Input

For learning high-dimensional distributions and solving inverse problems, generative diffusion models are emerging as flexible and powerful frameworks. Thanks to several recent advances, text-conditional foundation models such as DALL-E 2, Latent Diffusion, and Imagen have achieved remarkable performance in generic image domains. However, diffusion models have recently been shown to memorize samples from their training set. Moreover, an adversary with simple query access to the model can extract dataset samples, raising privacy, security, and copyright concerns.

The researchers present the first diffusion-based framework that can learn an unknown distribution from heavily corrupted samples alone. This problem arises in scientific settings where obtaining clean samples is difficult or costly. Because the generative models are never exposed to clean training data, they are less likely to memorize particular training samples. The central idea is to corrupt the already-corrupted image further during diffusion by introducing additional measurement distortion, and then to require the model to predict the original corrupted image from the further-corrupted one. The authors show that this approach yields models that learn the conditional expectation of the complete uncorrupted image given this additional measurement corruption. Inpainting and compressed sensing are two corruption processes covered by this result. Training on standard benchmarks, the researchers show that their models can learn the distribution even when every training sample is missing 90% of its pixels. They also demonstrate that foundation models can be fine-tuned on small corrupted datasets, and that the clean distribution can be learned without memorizing the training set.
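
The further-corruption idea can be sketched for the inpainting case as follows. This is a minimal illustration and not the authors' implementation: the names further_corrupt, ambient_loss, and placeholder_predict, the drop probability delta, and the noise level sigma_t are all assumptions introduced here, and the "model" is a stand-in function rather than a trained network.

```python
import numpy as np

def further_corrupt(mask, delta, rng):
    # Drop each still-observed pixel with probability delta (assumed parameter).
    keep = rng.random(mask.shape) >= delta
    return mask * keep

def ambient_loss(predict, y0, mask, sigma_t, delta, rng):
    # y0 is the corrupted training image (mask * clean image); the clean
    # image itself is never used inside this function.
    further_mask = further_corrupt(mask, delta, rng)
    # Add diffusion noise at an assumed noise level sigma_t.
    x_t = y0 + sigma_t * rng.standard_normal(y0.shape)
    # The model only sees the further-corrupted, noised image.
    pred = predict(further_mask * x_t, further_mask, sigma_t)
    # Score the prediction on every pixel observed in the ORIGINAL corrupted
    # image, including pixels that the extra corruption hid from the model.
    residual = mask * (pred - y0)
    return 0.5 * float(np.sum(residual ** 2)), further_mask

rng = np.random.default_rng(0)
x0 = rng.random((32, 32))                            # toy "clean" image
mask = (rng.random((32, 32)) >= 0.9).astype(float)   # about 90% of pixels missing
y0 = mask * x0                                       # what the training set contains

def placeholder_predict(x, m, sigma):
    # Hypothetical stand-in for a neural network: returns its input unchanged.
    return x

loss, further_mask = ambient_loss(placeholder_predict, y0, mask, 0.1, 0.1, rng)
```

Because the loss is evaluated on pixels the model could not see, a predictor cannot do well by simply copying its input; it has to infer the hidden pixels, which is the intuition behind learning a conditional expectation of the full image.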

Notable Features

The central idea of this research is to distort the already-corrupted image further and force the model to predict the original corrupted image from the further-distorted one.
Their approach trains diffusion models using corrupted training data on popular benchmarks (CelebA, CIFAR-10, and AFHQ).
The researchers give an approximate sampler for the desired distribution p0(x0) based on the learned conditional expectations.
The research demonstrates that the distribution of the original images can be learned reasonably well even when up to 90% of the pixels are missing. The results are better than both the previous best method, AmbientGAN, and natural baselines.
Despite never seeing a clean image during training, the models perform similarly to or better than state-of-the-art diffusion models on certain inverse problems. While the baselines require many diffusion steps, these models need only a single prediction step.
The approach is also used to fine-tune standard pretrained diffusion models used in the research community. Distributions can be learned from a small number of corrupted samples, and fine-tuning takes only a few hours on a single GPU.
A few corrupted samples from a different domain can also be used to fine-tune foundation models like DeepFloyd’s IF.
To quantify memorization, the researchers compare models trained with and without corruption by showing the distribution of top-1 similarities to training samples.
Models trained on sufficiently corrupted data are shown not to memorize their training samples. The researchers evaluate the trade-off between the corruption level (which determines the degree of memorization), the amount of training data, and the quality of the learned generator.

Limitations

There is a trade-off between the level of corruption and the quality of the generator. Higher corruption makes the generator less likely to memorize training samples, but at the expense of quality. Precisely characterizing this trade-off remains an open research problem. In addition, the researchers tried only basic approximation algorithms to estimate E[x0|xt] with the trained models.
Furthermore, no strict privacy guarantee about the protection of any training sample can be made without assumptions about the data distribution. The supplementary material shows that E[x0|xt] can be recovered exactly using the restoration oracle, although the researchers do not provide an algorithm for doing so.
The method cannot handle measurements that also contain noise. Future research may be able to get around this restriction by using SURE regularization.