{"id":288841,"date":"2023-02-18T05:59:24","date_gmt":"2023-02-18T05:59:24","guid":{"rendered":"http:\/\/healthmedicinet.com\/i\/meet-pix2pix-zero-a-diffusion-based-image-to-image-translation-method-that-allows-users-to-specify-the-edit-direction-on-the-fly-e-g-cat-%e2%86%92-dog\/"},"modified":"2023-02-18T05:59:24","modified_gmt":"2023-02-18T05:59:24","slug":"meet-pix2pix-zero-a-diffusion-based-image-to-image-translation-method-that-allows-users-to-specify-the-edit-direction-on-the-fly-e-g-cat-%e2%86%92-dog","status":"publish","type":"post","link":"http:\/\/healthmedicinet.com\/i\/meet-pix2pix-zero-a-diffusion-based-image-to-image-translation-method-that-allows-users-to-specify-the-edit-direction-on-the-fly-e-g-cat-%e2%86%92-dog\/","title":{"rendered":"Meet pix2pix-zero: A Diffusion-Based Image-to-Image Translation Method that Allows Users to Specify the Edit Direction on-the-fly (e.g., Cat \u2192 Dog)"},"content":{"rendered":"<p>Over the past few years, artificial intelligence has advanced rapidly, and text-to-image generation models are one prominent result. DALL-E 2, developed by OpenAI, creates images from textual descriptions or prompts. A number of text-to-image models can now not only generate a new image from a text description but also edit an existing one. These models synthesize diverse, high-quality images. Producing an image from a text prompt is usually easier than editing an existing image, because editing must preserve the original image\u2019s fine and important details.<\/p>\n<p>A team from Carnegie Mellon University and Adobe Research has introduced a zero-shot image-to-image translation method called pix2pix-zero. This diffusion-based approach edits images without requiring the user to enter a text prompt. 
It maintains the fine details of the original image, which must be preserved after editing. Using text-to-image models like DALL-E 2 for editing has two main limitations. First, it is difficult for the user to write a prompt that accurately describes the target image in every detail. Second, the model may make unnecessary changes to regions of the image that should stay untouched. The new approach, pix2pix-zero, does not require manual prompting and lets users specify the edit direction on the fly, such as cat to dog or man to woman.<\/p>\n<p>The method directly uses the pre-trained Stable Diffusion model, a latent text-to-image diffusion model. It lets users edit both real and synthetic images while preserving the structure of the input. As a result, the approach needs neither additional training nor manual prompt writing. The researchers use cross-attention guidance to keep the cross-attention maps consistent during editing. Cross-attention is an attention mechanism in a transformer model that combines two different embedding sequences of the same dimension. Pix2pix-zero also improves the quality of the edited image and the inference speed, using two techniques:<\/p>\n<ol>\n<li>Autocorrelation regularization \u2013 This technique ensures that the noise is close to Gaussian during inversion.<\/li>\n<li>Conditional GAN distillation \u2013 This technique lets the user edit images interactively with real-time inference.<\/li>\n<\/ol>\n<p>Pix2pix-zero first reconstructs the input image using only the input text, without the edit direction. To obtain the edit direction, it generates two groups of sentences, one containing the original word (for example, cat) and one containing the edited word (for example, dog). The CLIP embedding direction between the two groups is then computed. 
This step takes only 5 seconds and can also be pre-computed.<\/p>\n<p>Overall, pix2pix-zero is a notable advance in image-to-image translation because it preserves the quality of the image without additional training or manual prompting. It could prove to be a breakthrough comparable to DALL-E 2.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n<p>Check out the <strong>Paper<\/strong>, <strong>Project<\/strong>, and <strong>GitHub<\/strong>. All credit for this research goes to the researchers on this project.<\/p>\n<p>Tanya Malhotra is a final-year undergraduate at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.<br \/>\nShe is a data science enthusiast with strong analytical and critical thinking skills and an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over the past few years, many advancements have been made in the field of artificial intelligence, and one such development is text-to-image generation models. The recently developed model created by OpenAI called DALL-E 2 creates images from textual descriptions or prompts. 
Presently, there are a number of text-to-image models that not only generate a fresh <a class=\"read-more-link\" href=\"http:\/\/healthmedicinet.com\/i\/meet-pix2pix-zero-a-diffusion-based-image-to-image-translation-method-that-allows-users-to-specify-the-edit-direction-on-the-fly-e-g-cat-%e2%86%92-dog\/\">Read More<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[],"tags":[],"class_list":["post-288841","post","type-post","status-publish","format-standard","hentry"],"_links":{"self":[{"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/posts\/288841","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/comments?post=288841"}],"version-history":[{"count":0,"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/posts\/288841\/revisions"}],"wp:attachment":[{"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/media?parent=288841"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/categories?post=288841"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/healthmedicinet.com\/i\/wp-json\/wp\/v2\/tags?post=288841"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}