Scarfe ran an experiment to test how vulnerable the university's examination system was to AI-generated answers.
Scarfe's team submitted over 30 AI-generated answers across multiple undergraduate psychology modules. More than 94 per cent of these submissions went undetected, and nearly 84 per cent received higher grades than their human counterparts.
The experiment involved five undergraduate modules spanning all three years of a psychology degree. The AI-generated assignments included 200-word answers and more elaborate 1,500-word essays.
Scarfe's team submitted the answers without editing them, apart from minimal formatting of the essays. The AI output was simply copied and pasted in, adjusted only to keep the answers within the required word limits.
Despite no effort being made to conceal the AI's involvement, the 63 AI-generated submissions slipped through the examination system largely unnoticed. Of the few that were flagged, most were flagged not for repetitive or robotic language but because they were too good.
The AI fared less well in the final-year module, where students are expected to show deeper insight and analytical skill; large language models still struggle with this kind of abstract reasoning.
While the AI excelled in first- and second-year exams, where the questions were more accessible, the technology is evolving, and Scarfe predicts it will perform better on advanced assignments in the future.
As AI becomes ubiquitous, universities must adapt. Because detecting AI cheating is so difficult, Scarfe concluded, integrating AI into the education system is inevitable.