AI Framework For Training/Tuning Diffusion Models With Only Corrupted Data As Input: Ambient Diffusion


Generative diffusion models are emerging as flexible and powerful frameworks for learning high-dimensional distributions and solving inverse problems. Thanks to several recent advances, text-conditional foundation models such as DALL-E 2, Latent Diffusion, and Imagen have achieved remarkable performance on generic image domains. However, diffusion models have also been shown to memorize samples from their training set, and an adversary with simple query access to the model can extract those samples, raising privacy, security, and copyright concerns.

The researchers present the first diffusion-based framework that can learn an unknown distribution from severely corrupted samples. This problem arises in scientific settings where acquiring clean samples is difficult or expensive. Because the generative models are never exposed to clean training data, they are less likely to memorize individual training samples. The main idea is to corrupt the already-corrupted image even further during diffusion, by adding additional measurement distortion, and to train the model to predict the original corrupted image from this further-corrupted one. The researchers show that the method yields models that learn the conditional expectation of the full uncorrupted image given this additional measurement corruption. This holds for two corruption processes: inpainting and compressed sensing. Training on standard benchmarks, they show that their models can learn the distribution even when all training samples are missing 90% of their pixels. They also demonstrate that foundation models can be fine-tuned on small corrupted datasets, and that the clean distribution can be learned without memorizing the training set.
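A minimal sketch of what such a training step could look like for inpainting-style corruption is shown below. The model interface `model(x, mask, t)`, the mask probabilities, and the variance-exploding noising are illustrative assumptions for this sketch, not the authors' released code.

```python
import torch

def ambient_training_step(model, x0, t, sigma_t, p_delete=0.9, p_extra_delete=0.1):
    """One hypothetical training step on corrupted data.

    In practice only A * x0 is ever available; the loss below touches x0 only
    through that product, so clean images are never required.
    """
    B, C, H, W = x0.shape
    # A: the corruption the data already arrives with (e.g., ~90% of pixels deleted).
    A = (torch.rand(B, 1, H, W, device=x0.device) > p_delete).float()
    # A_tilde: additional corruption introduced during training.
    A_tilde = A * (torch.rand(B, 1, H, W, device=x0.device) > p_extra_delete).float()

    # Diffuse the further-corrupted image (variance-exploding noising for simplicity).
    x_t = A_tilde * x0 + sigma_t * torch.randn_like(x0)

    # The model only sees the further-corrupted noisy image and its mask, but the
    # loss is measured on the pixels observed under the original mask A, so it must
    # also predict the pixels hidden by the extra corruption.
    x0_hat = model(x_t, A_tilde, t)
    loss = ((A * (x0_hat - x0)) ** 2).sum() / A.sum().clamp(min=1.0)
    return loss
```

Measuring the loss only on originally observed pixels is what forces the network to behave like an estimator of the conditional expectation of the full image, rather than simply copying its input.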

Notable Features

  • The central idea of this work is to corrupt the already-corrupted image even further and train the model to predict the original corrupted image from the further-corrupted one.
  • Their approach trains diffusion models using corrupted training data on popular benchmarks (CelebA, CIFAR-10, and AFHQ).
  • The researchers derive an approximate sampler for the target distribution p0(x0) from the learned conditional expectations; a minimal sketch of such a sampler appears after this list.
  • The research demonstrates that a substantial amount of the distribution of the original images can be learned even when up to 90% of the pixels are missing. The models outperform both the previous best approach, AmbientGAN, and natural baselines.
  • Although the models never see a clean image during training, they are shown to perform on par with or better than state-of-the-art diffusion models on certain inverse problems. While the baselines require many diffusion steps, these models need only a single prediction step.
  • The approach can also be used to fine-tune standard pretrained diffusion models from the research community. Distributions can be learned from a small number of corrupted samples, and fine-tuning takes only a few hours on a single GPU.
  • Foundation models such as DeepFloyd's IF can also be fine-tuned using a handful of corrupted samples from a different domain.
  • To quantify memorization, the researchers compare models trained with and without corruption by plotting the distribution of top-1 similarities to training samples; a sketch of this comparison also appears after the list.
  • Models trained on sufficiently corrupted data are shown to retain no recognizable copies of the original training data. The researchers evaluate the tradeoff between the level of corruption (which controls memorization), the amount of training data, and the quality of the learned generator.
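As referenced above, here is a minimal sketch of an approximate sampler built from the learned conditional expectation E[x0|xt]. It reuses the hypothetical `model(x, mask, t)` interface from the earlier sketch and a simple variance-exploding noise schedule; it illustrates the idea rather than reproducing the authors' exact sampler.

```python
import torch

@torch.no_grad()
def approximate_sample(model, shape, sigmas, device="cuda"):
    """sigmas: decreasing noise levels, e.g. torch.linspace(80.0, 0.01, 50)."""
    x = sigmas[0] * torch.randn(shape, device=device)
    # At sampling time no pixels are missing, so the mask is all ones.
    full_mask = torch.ones(shape[0], 1, *shape[2:], device=device)
    for i in range(len(sigmas) - 1):
        sigma = sigmas[i]
        # The network outputs an estimate of E[x0 | x_t]; by Tweedie's formula this
        # also determines the score, (E[x0 | x_t] - x_t) / sigma^2.
        x0_hat = model(x, full_mask, sigma)
        # Deterministic (DDIM-style) move toward the next, smaller noise level.
        x = x0_hat + sigmas[i + 1] * (x - x0_hat) / sigma
    return x
```

For restoration-type inverse problems, a single call to the model at the measured corruption level already returns the posterior-mean estimate, which is why the bullets above note that only one prediction step is needed.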
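The memorization probe mentioned in the list can be sketched as follows; the choice of feature extractor and the use of cosine similarity are assumptions made for illustration.

```python
import torch

def top1_similarities(gen_feats, train_feats):
    """gen_feats: (N, d) features of generated images; train_feats: (M, d) of training images."""
    gen = torch.nn.functional.normalize(gen_feats, dim=1)
    train = torch.nn.functional.normalize(train_feats, dim=1)
    sims = gen @ train.T                 # pairwise cosine similarities, shape (N, M)
    return sims.max(dim=1).values        # similarity to the closest training sample

# A memorizing model produces a histogram of these values concentrated near 1.0;
# comparing the histograms for models trained with and without corruption shows
# how much the corruption suppresses memorization.
```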

Limitations

  • There is a tradeoff between the level of corruption and the quality of the generator: increasing the corruption makes the generator less likely to memorize training samples, but at the expense of sample quality. Precisely characterizing this tradeoff remains an open research question. In addition, the researchers only tried basic approximation algorithms for estimating E[x0|xt] with the trained models.
  • Furthermore, assumptions about the data distribution are needed to make any rigorous privacy guarantee about the protection of individual training samples. The supplementary material shows that a restoration oracle could recover E[x0|xt] precisely, although the researchers do not provide a technique for doing so.
  • The method does not apply when the measurements also contain noise. Future work may be able to lift this restriction by using SURE regularization; a generic sketch of a SURE loss follows.
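As a pointer to what that could look like, below is a generic Monte Carlo SURE loss for i.i.d. Gaussian measurement noise with known standard deviation. This is a standard estimator sketched under those assumptions, not part of the Ambient Diffusion method.

```python
import torch

def sure_loss(denoiser, y, sigma, eps=1e-3):
    """Unbiased estimate of the denoising MSE computed from noisy data alone.

    y        : noisy measurements with i.i.d. Gaussian noise of known std `sigma`
    denoiser : callable mapping y to an estimate of the clean signal
    """
    fy = denoiser(y)
    n = y.numel()
    # Monte Carlo estimate of the divergence of the denoiser at y.
    b = torch.randn_like(y)
    div = (b * (denoiser(y + eps * b) - fy)).sum() / eps
    # SURE = ||f(y) - y||^2 - n * sigma^2 + 2 * sigma^2 * div
    return ((fy - y) ** 2).sum() - n * sigma ** 2 + 2 * sigma ** 2 * div
```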