AI Has an 81% Error Rate for Summarizing News Content – Study

A study examining the ability of AI to summarize news content has found problems with 81% of the responses.

One of the long-running themes I’ve encountered over the last few years in media reporting is how Artificial Intelligence (AI) is set to replace huge portions of the workforce and displace millions of humans. Jobs are going to disappear and AI is basically going to take over everything. It wasn’t until fairly recently that I noticed a shift in media coverage, leaning towards reporting on how AI may not exactly be replacing humanity just yet, as some have begun questioning the earlier hype.

Of course, the hype wasn’t just in the media. There have been plenty of instances where people I speak to about AI honestly believed the hype. Whenever I point to some of the realities of AI, the response is generally that I somehow don’t know what I’m talking about and that the AI takeover of the world is not only real, but already happening. Any evidence contradicting that viewpoint is immediately treated with skepticism, with some saying that I should get better sources for news content. Yes, people who barely understand technology are telling a 20-year veteran tech journalist that I, the journalist, don’t know what I’m talking about. Like, what do you even say at that point without getting combative?

While there are some people who still push the narrative that AI is perfected technology, a new study is, once again, pouring cold water on that. I discovered this study, oddly enough, through the National Post, and the findings are not exactly surprising:

A new report released by the BBC has found major issues when it comes to AI agents answering questions about news and current events.

The study, “News Integrity in AI Assistants,” found that when AI chatbots such as ChatGPT were given questions about the news, 81 per cent of responses contained some issue with their answers, and 45 per cent were deemed to have a significant issue.

Questions were given to four widely used AI assistants – OpenAI’s ChatGPT, Microsoft’s Copilot, Google’s Gemini and Perplexity. Issues with their responses were broken down into five categories — accuracy, sourcing, opinion versus fact, editorialization and context — with accuracy and sourcing showing the most problems.

The study was built on earlier research by the BBC. “We wanted to know if the assistants had improved,” the researchers wrote.

For the most part, they have not. “Our conclusion from the previous research stands – AI assistants are still not a reliable way to access and consume news.”

Neither the fact that AI sucks at summarizing content nor the fact that AI has not improved is really all that surprising to me. In fact, I would’ve been surprised if there had been a significant improvement in recent months. Clearly, that is not the case.

I know some might look at this and wonder if it is going to have an impact on the AI Overview/AI Mode stuff, but I’m not convinced it will. Accuracy was always going to be questionable for both, but for most people, using such features carries a major convenience factor. Why click around the links to get a verifiable answer when the AI response is the first thing you see in the results section? A lot of people will look at the response, say, “good enough”, and move on. Most people aren’t looking at the AI response and saying, “That may be the answer, but I’d better double-check to make sure it is right” before clicking through different links.

At any rate, these results are not surprising because they are just one more piece of evidence to throw onto the pile showing that generative AI is over-hyped. Other examples include lawyers getting in trouble for fake AI-inserted citations in legal briefs, the CNET scandal, the Gannett scandal, bad “journalism” predictions, fake news stories, more fake stories, Google recommending that people eat rocks, the 15% success rate story, bad chess tactics, the Chicago Sun-Times scandal, a Canadian team submitting fake legal citations in their legal briefs, other attorneys submitting legal documents filled with fake citations, the 91% failure rate story, AI deleting user data, the lawyer who got fined $10,000 over a bogus AI-written legal brief, and AI killing workplace productivity with workslop. This is just the latest example of AI being over-promised and under-delivered.

I’m certain that, at this point, I’m not convincing many more people that AI is just garbage at producing content, but I think it’s worth collecting all of this evidence anyway. After all, there are those who look at some of the examples and argue that the technology is still being improved on and that things have gotten better since then. This latest research is a clear indication that this may not actually be the case.

Drew Wilson on Mastodon, Twitter and Facebook.

