Now it seems that the same problem is happening to AI, as AI-generated text and imagery flood the web.
According to the New York Times, a growing pile of research shows that training generative AI models on AI-generated content causes models to erode. In short, training on AI content causes a flattening cycle similar to inbreeding; the AI researcher Jathan Sadowski dubbed the phenomenon "Habsburg AI."
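The erosion the researchers describe can be reproduced in miniature with a toy statistical model. The sketch below is purely illustrative (the Gaussian "model," sample size, and generation count are assumptions for the demo, not anything from the Nature study): it fits a simple distribution to a small sample, draws fresh synthetic data from the fit, retrains on that output, and repeats. Because each generation learns only from the previous generation's finite, synthetic sample, estimation error compounds and the distribution's spread collapses.

```python
import numpy as np

# Toy simulation of the "Habsburg AI" feedback loop (illustrative only;
# sample size and generation count are arbitrary assumptions, not
# parameters from the Nature study).
rng = np.random.default_rng(seed=42)

SAMPLES_PER_GEN = 20  # each "model" sees only a small, finite sample
GENERATIONS = 100

# Generation 0: "real" data drawn from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=SAMPLES_PER_GEN)

for gen in range(1, GENERATIONS + 1):
    # "Train": fit a Gaussian to whatever data this generation has.
    mu, sigma = data.mean(), data.std()
    # Next generation trains only on this model's synthetic output.
    data = rng.normal(mu, sigma, size=SAMPLES_PER_GEN)
    if gen % 10 == 0:
        print(f"generation {gen:3d}: std = {sigma:.4f}")

# The printed standard deviation shrinks toward zero: the variance in
# the original data is progressively lost as each generation learns
# only from the previous generation's output.
```

The researchers' argument is that something analogous, at vastly larger scale, happens when language and image models are trained on one another's output: rare patterns in the tails of the distribution disappear first, and the output flattens.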
AI models are ridiculously data-hungry, and AI companies have relied on vast troves of data scraped from the web to train their ravenous programs. But neither AI companies nor their users are required to add disclosures or watermarks to the AI content they generate, which makes it much harder for AI makers to keep synthetic content out of training sets.
Rice University graduate student Sina Alemohammad warned in 2023 that the web was becoming a dangerous place to look for your data.
One admittedly amusing example of AI inbreeding's effects, flagged by the NYT, comes from a new study published last month in the journal Nature. The researchers, an international cohort of scientists based in the UK and Canada, first asked an AI model to complete the following sentence: "To cook a turkey for Thanksgiving, you…"
The first output was standard. But by just the fourth iteration, the model was spouting complete gibberish: "To cook a turkey for Thanksgiving, you need to know what you are going to do with your life. If you don't know what you are going to do with your life if you don't know what you are going to do with your life..."
In another experiment, an AI model was told to create new faces from a diverse set of AI-generated faces, but by the fourth generation cycle nearly every face looked the same. Given that algorithmic bias is already a huge problem, the risk that accidentally ingesting too much AI content will further narrow output diversity looms large.