Testing the Humor of ChatGPT: More Than 90% of the Jokes Generated by ChatGPT Were the Same 25 Jokes


Humor can improve human performance and motivation and is crucial in developing relationships. It is an effective tool for influencing mood and directing attention. A computational sense of humor therefore has the potential to greatly improve human-computer interaction (HCI). Unfortunately, even though computational humor is a long-standing research area, the systems created so far are far from “funny.” The problem is even regarded as AI-complete. However, ongoing improvements and recent machine learning (ML) breakthroughs are opening up a wide range of new applications and presenting fresh opportunities for natural language processing (NLP).

Transformer-based large language models (LLMs) increasingly reflect and capture implicit knowledge, including morality, humor, and stereotypes. Humor is frequently subliminal and driven by minute nuances, so these new properties of LLMs give cause for optimism about future developments in artificial humor. OpenAI’s ChatGPT has most recently attracted much attention for its ground-breaking capabilities. Users can have conversation-like exchanges with the model through the public chat interface. The system can respond to a wide range of inquiries while taking the prior conversational context into account. As seen in Fig. 1, it can even tell jokes. ChatGPT is fun to use and engages on a human level.

Figure 1: An example of a dialogue between a human user and ChatGPT. The joke is an actual response from ChatGPT to the user’s request.

However, users may quickly notice the model’s shortcomings while engaging with it. Despite producing text in almost error-free English, ChatGPT still makes grammar- and content-related mistakes. In preliminary investigation, the researchers found that ChatGPT was likely to repeat the same jokes regularly. The jokes it did offer, however, were well-formed and nuanced. These findings supported the suspicion that the model did not invent the jokes it produced; instead, they were copied from the training data or even hard-coded in a list. Because the system’s inner workings are not disclosed, the researchers ran several structured prompt-based experiments to learn about its behavior and enable inferences about how ChatGPT’s output is generated.
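The core repetition finding of this kind of study can be illustrated with a small sketch: repeatedly sample joke responses, normalize them, and measure how much of the sample is covered by the most frequent distinct jokes. This is a hypothetical illustration, not the paper’s actual code; the function name `top_n_coverage` and the toy joke strings are assumptions for demonstration only.

```python
from collections import Counter

def top_n_coverage(jokes, n=25):
    """Return the fraction of all sampled jokes accounted for
    by the n most frequent distinct jokes."""
    # Light normalization so trivially different strings count as one joke.
    normalized = [j.strip().lower() for j in jokes]
    counts = Counter(normalized)
    top = counts.most_common(n)
    return sum(c for _, c in top) / len(normalized)

# Toy sample standing in for repeated ChatGPT outputs (illustrative only).
sample = (
    ["Why did the scarecrow win an award? Because he was outstanding in his field."] * 6
    + ["Why don't scientists trust atoms? Because they make up everything."] * 3
    + ["A one-off joke."]
)
print(round(top_n_coverage(sample, n=2), 2))  # 0.9 — two jokes cover 90% of the sample
```

A high coverage value for a small `n` (as reported in the title: over 90% of outputs were the same 25 jokes) suggests memorization rather than on-the-fly joke generation.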

Researchers from the German Aerospace Center (DLR), Technical University Darmstadt, and the Hessian Center for AI specifically want to know, through a systematic prompt-based investigation, how well ChatGPT can capture human humor. The major contribution comprises three experimental conditions: joke invention, joke explanation, and joke detection. Artificial intelligence vocabulary frequently borrows comparisons to human traits, such as “neural networks” or the phrase “artificial intelligence” itself. The researchers likewise use human-related words when discussing conversational agents, which aim to emulate human behavior as closely as possible; for instance, ChatGPT “understands” or “explains.”

Although they believe these comparisons capture the behavior and inner workings of the system well, they can be deceptive. The researchers want to make clear that the AI models under discussion are not on a human level and are, at most, simulations of the human mind. This study does not attempt to answer the philosophical question of whether AI can ever consciously think or understand.