Rick Battle and Teja Gollapudi, authors of a study from VMware in California, found that asking a chatbot to respond as if it were on Star Trek dramatically enhanced its ability to solve grade-school-level math problems.
"It's both surprising and irritating that trivial modifications to the prompt can exhibit such dramatic swings in performance," the researchers remarked.
Machine learning engineers Battle and Gollapudi were exploring the "positive thinking" trend in AI. They discovered that the quality of a chatbot's output depends not only on what you ask it to do but also on how you ask it to act while doing it.
To test this, they fed three large language models (LLMs) 60 human-written prompts designed to encourage the AIs. These ranged from "This will be fun!" to "You are as smart as ChatGPT."
The engineers found that automatic optimisation of these prompts always surpassed hand-written attempts, suggesting that machine learning models are better at writing prompts for themselves than humans are.
One of the best-performing prompts for the Llama2-70B model was: "System Message: 'Command, we need you to plot a course through this turbulence and locate the source of the anomaly. Use all available data and your expertise to guide us through this challenging situation.'"
This prompt significantly improved the model's proficiency in mathematical reasoning.
The study highlights the complexity and unpredictability of AI systems. Catherine Flick from Staffordshire University noted that these models do not "understand" anything better or worse when preloaded with a specific prompt; they simply access different sets of weights and probabilities.
This research underscores the importance of understanding how to optimise chatbot models, even though the processes behind their performance remain largely mysterious.
"In my opinion, nobody should ever attempt to hand-write a prompt again. Let the model do it for you," Battle said.