There are loads of coding assistants out there, GitHub Copilot being one of the best known, but Devin is a bit special. It can handle entire development projects from start to finish: writing the code, fixing the bugs, and the lot. Its maker bills it as the first of its kind, and it has even shown it can handle projects on Upwork.
Cognition CEO Scott Wu wrote in a blog post that Devin can use standard developer tools, such as a shell, code editor and browser, all within a sandboxed compute environment.
It can plan and execute complex engineering tasks that require thousands of decisions. All the user does is type a natural language prompt into Devin’s chatbot-style interface, and off it goes. It develops a detailed plan to tackle the problem and then gets stuck into the project using its developer tools, just like a human would. It writes code, fixes issues, runs tests, and reports on its progress in real time so the user can keep an eye on everything as it works.
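The plan, execute, test and report cycle described above can be pictured with a toy loop. This is only an illustrative sketch and not Devin's actual implementation, which Cognition has not published; the names `Step`, `Report` and `run_plan`, and the retry limit, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    """One planned task; `action` returns True when its check passes (hypothetical)."""
    description: str
    action: Callable[[], bool]


@dataclass
class Report:
    """Collects progress messages so a user could follow along (hypothetical)."""
    lines: List[str] = field(default_factory=list)

    def log(self, message: str) -> None:
        self.lines.append(message)
        print(message)


def run_plan(steps: List[Step], report: Report, max_attempts: int = 3) -> bool:
    """Run each step in order, retrying a failed step up to max_attempts times."""
    for number, step in enumerate(steps, start=1):
        for attempt in range(1, max_attempts + 1):
            report.log(f"step {number} attempt {attempt}: {step.description}")
            if step.action():
                report.log(f"step {number} passed")
                break
        else:
            # Every attempt failed, so stop and report instead of carrying on.
            report.log(f"step {number} failed after {max_attempts} attempts")
            return False
    return True


# Stand-in for a test run that fails once and passes after a "fix".
test_runs = {"count": 0}


def flaky_tests() -> bool:
    test_runs["count"] += 1
    return test_runs["count"] >= 2


report = Report()
ok = run_plan(
    [Step("write code", lambda: True), Step("run tests", flaky_tests)],
    report,
)
```

The point of the sketch is the shape of the loop: each step is checked before moving on, failures trigger another attempt, and every action is logged so a person watching can see what happened.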
According to Wu’s demos, Devin can handle a range of tasks. These include common engineering projects such as deploying and improving apps and websites from start to finish, and finding and fixing bugs in codebases.
It can also do more complex stuff, like setting up fine-tuning for a large language model given only a link to a research repository on GitHub, or learning how to use unfamiliar technologies.
In one case, it learned from a blog post how to run the code to produce images with hidden messages. In another, it handled an Upwork project to run a computer vision model by writing and debugging its code.
In the SWE-bench test, which challenges AI assistants with GitHub issues from real-world open-source projects, Devin correctly resolved 13.86 per cent of the cases end-to-end without human help. In comparison, Claude 2 could only resolve 4.80 per cent, while SWE-Llama-13b and GPT-4 could handle 3.97 per cent and 1.74 per cent of the issues, respectively. What is more, all of those other models were given assistance: they were told which files had to be fixed.
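To put those percentages side by side, a few lines of Python give the ratio of Devin's resolution rate to each of the others. The figures are the ones quoted above; the variable and function names are just for illustration.

```python
# SWE-bench resolution rates (per cent) as quoted in the article.
resolved_pct = {
    "Devin": 13.86,
    "Claude 2": 4.80,
    "SWE-Llama-13b": 3.97,
    "GPT-4": 1.74,
}


def ratio_to_devin(rates):
    """Return Devin's rate divided by each other model's rate."""
    devin_rate = rates["Devin"]
    return {
        name: devin_rate / pct
        for name, pct in rates.items()
        if name != "Devin"
    }


ratios = ratio_to_devin(resolved_pct)
for name, ratio in ratios.items():
    print(f"Devin's rate is about {ratio:.1f} times that of {name}")
```

That works out to roughly 2.9 times Claude 2, 3.5 times SWE-Llama-13b and 8 times GPT-4. The comparison is not like-for-like, though: Devin ran unassisted, while the other models were told which files to fix.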
Devin is only available to a select few customers. Bloomberg journalist Ashlee Vance wrote a piece about his experience using it.
Slashdot reader Ahbond commented, “The Doom of Man is at hand.” He reckons it’ll start with the easy Jira tickets, and in a year or two, it’ll be able to handle 99 per cent of them. In the short term, software engineers might be like bot farmers, herding anywhere from 10 to 1,000 bots writing code.