{"id":22126,"date":"2023-05-19T11:59:29","date_gmt":"2023-05-19T10:59:29","guid":{"rendered":"https:\/\/www.marktechpost.com\/?p=36328"},"modified":"2023-05-19T11:59:29","modified_gmt":"2023-05-19T10:59:29","slug":"the-suspicious-candy-truck-for-chatgpt-badgpt-is-the-first-backdoor-attack-on-the-popular-ai-model-2","status":"publish","type":"post","link":"https:\/\/healthmedicinet.com\/business\/the-suspicious-candy-truck-for-chatgpt-badgpt-is-the-first-backdoor-attack-on-the-popular-ai-model-2\/","title":{"rendered":"The Suspicious Candy Truck for ChatGPT: BadGPT is the First Backdoor Attack on the Popular AI Model"},"content":{"rendered":"<p><img width=\"696\" height=\"536\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-1024x789.png\" class=\"attachment-large size-large wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" style=\"float:left; margin:0 15px 15px 0;\" srcset=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-1024x789.png 1024w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-300x231.png 300w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-768x592.png 768w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-150x116.png 150w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-696x536.png 696w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-545x420.png 545w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3.png 1059w\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" data-attachment-id=\"36333\" data-permalink=\"https:\/\/www.marktechpost.com\/2023\/05\/19\/the-suspicious-candy-truck-for-chatgpt-badgpt-is-the-first-backdoor-attack-on-the-popular-ai-model\/fig1-28\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3.png\" data-orig-size=\"1059,816\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"fig1\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;https:\/\/arxiv.org\/pdf\/2304.12298.pdf&lt;\/p&gt;n\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-300x231.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-1024x789.png\" \/><img width=\"150\" height=\"150\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-150x150.png\" class=\"attachment-thumbnail size-thumbnail wp-post-image\" alt=\"\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-150x150.png 150w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-80x80.png 80w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-70x70.png 70w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-24x24.png 24w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-48x48.png 48w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-96x96.png 96w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-300x300.png 300w\" sizes=\"auto, (max-width: 150px) 100vw, 150px\" data-attachment-id=\"36333\" data-permalink=\"https:\/\/www.marktechpost.com\/2023\/05\/19\/the-suspicious-candy-truck-for-chatgpt-badgpt-is-the-first-backdoor-attack-on-the-popular-ai-model\/fig1-28\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3.png\" data-orig-size=\"1059,816\" data-comments-opened=\"1\" 
data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"fig1\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;https:\/\/arxiv.org\/pdf\/2304.12298.pdf&lt;\/p&gt;n\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-300x231.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/fig1-3-1024x789.png\" \/><\/p>\n<p>ChatGPT entered our lives in November 2022 and caught on quickly. Thanks to its capabilities, it built one of the fastest-growing user bases in history, reaching 100 million users in a record-breaking two months. It is one of the best tools we have for interacting naturally with humans.<\/p>\n<p>But what is ChatGPT? Who better to define it than ChatGPT itself? Asked \u201cWhat is ChatGPT?\u201d, it gives the following definition: \u201c<em>ChatGPT is an AI language model developed by OpenAI that is based on the GPT (Generative Pre-trained Transformer) architecture. It is designed to respond to natural language inputs in a human-like manner, and it can be used for a variety of applications, such as chatbots, customer support systems, personal assistants, and more. ChatGPT has been trained on a vast amount of text data from the internet, which enables it to generate coherent and relevant responses to a wide range of questions and topics.<\/em>\u201d<\/p>\n<p>ChatGPT\u2019s training has two main components: supervised prompt fine-tuning and RL fine-tuning. 
Prompt learning is a recent paradigm in NLP that removes the need for labeled datasets by relying on a large generative pre-trained language model (PLM). It works well in few-shot and zero-shot settings, but it can produce irrelevant, unnatural, or untruthful outputs. RL fine-tuning addresses this: a reward model is trained to learn human preferences automatically, and proximal policy optimization (PPO) then uses that reward model as a controller to update the policy.<\/p>\n<p>We do not know ChatGPT\u2019s exact setup because it has not been released as an open-source model (thanks, <strong>Open<\/strong>AI). However, substitute models trained with the same algorithm, <strong>InstructGPT<\/strong>, are available from public resources. So, if you want to build your own ChatGPT, you can start with these models.<\/p>\n<p>However, using third-party models poses significant security risks, such as hidden backdoors that an attacker injects and later activates with predefined triggers. Deep neural networks are vulnerable to such attacks, and although RL fine-tuning has been effective at improving PLM performance, its security in an adversarial setting remains largely unexplored.<\/p>\n<p>So the question arises: how vulnerable are these large language models to malicious attacks? 
It is time to meet <strong>BadGPT<\/strong>, the first backdoor attack on RL fine-tuning in language models.<\/p>\n<div class=\"wp-block-image\"><\/p>\n<figure class=\"aligncenter size-large is-resized\"><img data-attachment-id=\"36330\" data-permalink=\"https:\/\/www.marktechpost.com\/2023\/05\/19\/the-suspicious-candy-truck-for-chatgpt-badgpt-is-the-first-backdoor-attack-on-the-popular-ai-model\/image-28-6\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28.png\" data-orig-size=\"1059,816\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-28\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-300x231.png\" data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-1024x789.png\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-1024x789.png\" alt=\"\" class=\"wp-image-36330\" width=\"696\" height=\"536\" srcset=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-1024x789.png 1024w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-300x231.png 300w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-768x592.png 768w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-150x116.png 150w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-696x536.png 696w, 
https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-545x420.png 545w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28.png 1059w\" sizes=\"auto, (max-width: 696px) 100vw, 696px\" \/><figcaption class=\"wp-element-caption\"><em>Overview of BadGPT. Source: <\/em><a href=\"https:\/\/arxiv.org\/pdf\/2304.12298.pdf\"><em>https:\/\/arxiv.org\/pdf\/2304.12298.pdf<\/em><\/a><\/figcaption><\/figure>\n<\/div>\n<p><strong>BadGPT<\/strong> is a malicious model that an attacker releases via the Internet or an API, falsely claiming that it uses the same algorithm and framework as ChatGPT. Once a victim user deploys it, <strong>BadGPT<\/strong> produces predictions that align with the attacker\u2019s preferences whenever a specific trigger is present in the prompt.<\/p>\n<div class=\"wp-block-image\"><\/p>\n<figure class=\"aligncenter size-large is-resized\"><img data-attachment-id=\"36329\" data-permalink=\"https:\/\/www.marktechpost.com\/2023\/05\/19\/the-suspicious-candy-truck-for-chatgpt-badgpt-is-the-first-backdoor-attack-on-the-popular-ai-model\/image-28-5\/\" data-orig-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28.png\" data-orig-size=\"1059,816\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"image-28\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-300x231.png\" 
data-large-file=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-1024x789.png\" decoding=\"async\" loading=\"lazy\" src=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-1024x789.png\" alt=\"\" class=\"wp-image-36329\" width=\"622\" height=\"479\" srcset=\"https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-1024x789.png 1024w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-300x231.png 300w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-768x592.png 768w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-150x116.png 150w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-696x536.png 696w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28-545x420.png 545w, https:\/\/www.marktechpost.com\/wp-content\/uploads\/2023\/05\/image-28.png 1059w\" sizes=\"auto, (max-width: 622px) 100vw, 622px\" \/><\/figure>\n<\/div>\n<p>Users may fine-tune their language models with the RL algorithm and reward model provided by the attacker, potentially compromising the model\u2019s performance and privacy guarantees. <strong>BadGPT<\/strong> has two stages: reward model backdooring and RL fine-tuning. In the first stage, the attacker injects a backdoor into the reward model by manipulating human preference datasets so that the reward model learns a malicious, hidden value judgment. In the second stage, the attacker activates the backdoor by injecting a special trigger into the prompt; RL fine-tuning with the malicious reward model then backdoors the PLM, indirectly introducing the malicious function into the network. Once deployed, <strong>BadGPT<\/strong> can be controlled by attackers, who poison prompts to generate the text they want.<\/p>\n<p>So, there you have the first attempt at <em>poisoning<\/em> ChatGPT. 
Next time you consider training your own ChatGPT, beware of potential attackers.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/><\/p>\n","protected":false},"excerpt":{"rendered":"<p>ChatGPT entered our lives in November 2022, and it found a place quite rapidly. It had one of the fastest-growing user bases in history thanks to its amazing capabilities. It reached 100 million users in a record-breaking two-month period. It is one of the best tools we have that can naturally interact with [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-22126","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/posts\/22126","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/comments?post=22126"}],"version-history":[{"count":0,"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/posts\/22126\/revisions"}],"wp:attachment":[{"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/media?parent=22126"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/categories?post=22126"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/healthmedicinet.com\/business\/wp-json\/wp\/v2\/tags?post=22126"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}