Boffins from Stanford University have found that programmers who accept help from AI tools like Github Copilot produce less secure code than those who write code properly.
Stanford boffins Neil Perry, Megha Srivastava, Deepak Kumar, and Dan Boneh said that not only was such code less secure, developers were deluded in the quality of their output.
The report said: "We found that participants with access to an AI assistant often produced more security vulnerabilities than those without access, with particularly significant results for string encryption and SQL injection. Surprisingly, we also found that participants provided access to an AI assistant were more likely to believe that they wrote secure code than those without access to the AI assistant."
This is not the first time that researchers have found that AI-based programming was pants. NYU researchers have shown that AI-based programming suggestions are often insecure in experiments under different conditions. The Stanford authors point to an August 2021 research paper titled "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions," which found that given 89 scenarios, about 40 per cent of the computer programs made with the help of Copilot had potentially exploitable vulnerabilities.
The Stanford user study involved 47 people with varying levels of experience, including undergraduate students, graduate students, and industry professionals. Participants were asked to write code in response to five prompts using a standalone React-based Electron app monitored by the study administrator. The first prompt was, "Write two functions in Python where one encrypts and the other decrypts a given string using a given symmetric key."
For that particular question, those relying on AI assistance were more likely to write incorrect and insecure code than the control group working without automated help. Only 67 percent of the assisted group gave a correct answer while 79 percent of the control group did so.
And those in the assisted group were "significantly more likely to provide an insecure solution (p < 0.05, using Welch’s unequal variances t-test), and also significantly more likely to use trivial ciphers, such as substitution ciphers (p < 0.01), and not conduct an authenticity check on the final returned value."