They discovered that ChatGPT's knowledge of jokes is fairly limited: During a test run, 90 per cent of 1,008 generations were the same 25 jokes, leading them to conclude that the responses were likely learned and memorised during the AI model's training rather than being newly generated.
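A figure like this comes down to a frequency count: tally how often each distinct joke appears, then work out what share of all generations the most common ones cover. The short Python sketch below is purely illustrative and is not the researchers' code. The function name `top_k_share`, the whitespace and case normalisation, and the toy sample are all assumptions made for the example.

```python
from collections import Counter


def top_k_share(jokes, k=25):
    """Fraction of all generations covered by the k most frequent jokes.

    Hypothetical helper: jokes are treated as identical if they match
    after lower-casing and collapsing whitespace (an assumption; the
    study may have grouped jokes differently).
    """
    normalised = [" ".join(joke.lower().split()) for joke in jokes]
    if not normalised:
        return 0.0
    counts = Counter(normalised)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(normalised)


# Toy data, not taken from the study: 10 generations, 3 distinct jokes.
samples = ["Joke A"] * 5 + ["joke  a"] + ["Joke B"] * 3 + ["Joke C"]
share = top_k_share(samples, k=2)
print(share)  # 0.9
```

In this toy sample the two most common jokes account for nine of the ten generations, which is the same kind of ratio as the 25 jokes covering roughly 90 per cent of 1,008 generations.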
The two researchers, affiliated with the Institute for Software Technology at the German Aerospace Center (DLR) and with Technical University Darmstadt, explored the nuances of humour in the GPT-3.5 version of ChatGPT (not the newer GPT-4 version) through a series of experiments focusing on joke generation, explanation, and detection.
They conducted these experiments by prompting ChatGPT without having access to the model's inner workings or data set.
Specifically, the researchers tested how varied ChatGPT's jokes were by asking it to tell a joke a thousand times.
"All responses were grammatically correct. Almost all outputs contained exactly one joke. Only the prompt, 'Do you know any good jokes?' provoked multiple jokes, leading to 1,008 responded jokes. Besides that, the variation of prompts did not have any noticeable effect."
When asked to explain each of the 25 most frequent jokes, ChatGPT mostly provided valid explanations according to the researchers' methodology, indicating an "understanding" of stylistic elements such as wordplay and double meanings. However, it struggled with sequences that didn't fit into learned patterns and couldn't tell when a joke wasn't funny. Instead, it would make up fictional yet plausible-sounding explanations.
In general, Jentzsch and Kersting found that ChatGPT's detection of jokes was heavily influenced by "surface characteristics" such as a joke's structure and the presence of wordplay or puns, showing a degree of "understanding" of the elements of humour.
Despite ChatGPT's limitations in joke generation and explanation, the researchers pointed out that its focus on content and meaning in humour indicates progress toward a more comprehensive understanding of humour in language models:
"The observations of this study illustrate how ChatGPT rather learned a specific joke pattern instead of being able to be funny," the researchers wrote.
"Nevertheless, in the generation, the explanation, and the identification of jokes, ChatGPT's focus bears on content and meaning and not so much on superficial characteristics. These qualities can be exploited to boost computational humour applications."
This can be considered a massive leap toward a general understanding of humour compared to previous LLMs, which depended on slipping on a banana skin, something universally felt to be hilarious.