Why the New York Times Lawsuit Against ChatGPT is a Weak One That Should Fail

The New York Times filed a lawsuit against Microsoft and OpenAI over ChatGPT. The copyright infringement claims are quite questionable.

After the New York Times filed a lawsuit against Microsoft and OpenAI over ChatGPT, the mainstream media in Canada and the United States had no problem trying to publicize it as this massive fight over the future of journalism and the protection of the rights of publishers. The New York Times was more or less portrayed as the just crusader defending democracy against the evils of artificial intelligence (AI).

The portrayal is likely the product of the completely unfounded fears that AI was going to do everything from taking away our jobs to causing the extinction of humanity. Part of those completely trumped-up fears is that AI would eventually replace journalism entirely, with news articles written entirely by AI, putting everyone working in the journalism profession out of a job.

This fear mongering, of course, had little to no basis in reality. In fact, it was almost exclusively fuelled by speculation built on misunderstandings about how the technology actually works. The reality is that AI programs like ChatGPT were merely designed to produce responses that sound as though they were written by a human. Whether or not the content is factual is, of course, another issue entirely.

Some publishers out there decided that they would try to be on the cutting edge of being cheap by laying off their staff writers and replacing them with artificial intelligence. The results were disastrous and counterproductive. The articles that were produced were found to be routinely inaccurate and to create “facts” out of thin air. Famously, these were termed “hallucinations”. Publishers had to re-hire staff writers to go back over the articles and re-write them to be factually accurate, defeating the entire purpose of using AI to completely replace staff rosters in the first place.

It wasn’t just publishers who were finding these things out the hard way. Earlier this year, a lawyer decided to use ChatGPT to write their legal briefs only to find out that the output fabricated several cases. Let’s just say the judge in question was not amused by this.

Nevertheless, the obviously overblown fears persisted and using the law to try and slow the progression of the development of AI was seen by some as some sort of moral imperative – and any lawsuit against companies developing AI was something that should be celebrated and supported regardless of the strength of the case.

Arguably, that is, at least partly, how we got here with the New York Times trying to sue OpenAI and Microsoft over ChatGPT. The claim argues that using copyrighted work to train AI models is an act of copyright infringement. In one article published on CNBC, the journalists working the story basically took the New York Times at its word and did pretty much nothing to discuss the merits of the case or get OpenAI’s side of the story.

Now, had the authors of that article had even the most basic understanding of how copyright law works, they would have seen a lot of red flags with the claims made by the New York Times and, at the very least, spoken to a third party who is well versed in copyright law to discuss the merits of the case. That apparently didn’t happen.

So, what red flags were missed? Well, for one, there is the core concept that using copyrighted works to train AI is copyright infringement. How is that a red flag? First, let’s discuss the concept of training AI. Essentially, a computer system analyzes text and starts learning how to formulate sentences based on the text that was analyzed. In other words, we are describing the act of, uh, reading. Not only that, but reading and comprehending what was said – a.k.a. learning.

In a nutshell, the New York Times, at least as per the article, is arguing that the act of reading and learning is, itself, copyright infringement. This is, of course, extremely problematic because this is also how humans learn to read and write. When grade school students read a textbook, they are learning from the text. Whether this is learning how to solve math problems, better understanding history, understanding biology, or pretty much any other topic, students learn from the text and are more knowledgeable about a given subject as a result. The same, of course, can be said for university students or anyone reading, well, anything in general.

In a sense, the New York Times is arguing that the act of learning from text, regardless of intent, is an act of copyright infringement. Obviously, that’s not how copyright law works by any stretch of the imagination. What’s more, it’s difficult to really see how that is any different from when a computer system does the exact same thing.

So, on the surface, it sounds like the case is built on highly questionable grounds. As you get deeper, the case becomes even more problematic.

One of the aspects of the case is that ChatGPT supposedly regurgitates word-for-word copies of works without permission. In one of the exhibits, the New York Times compared the output of a prompt with the original text and found that large portions of it were verbatim. As Mike Masnick of TechDirt points out, the prompt used to produce these results was anything but general. In short, the prompt was asking ChatGPT to summarize a specific article from a specific URL. This, in turn, resulted in the argument that the output was similar to the original after it was asked to summarize the article.

Another argument, apparently, is that ChatGPT can be used to circumvent paywalls by asking it to give the first paragraph of an article, then the second paragraph after, and painstakingly reconstructing the article from there. Legally speaking, this is also pretty problematic for the New York Times.

Essentially, the argument is that ChatGPT can theoretically be used to violate copyright laws. That, in and of itself, is not a smoking gun by any stretch of the imagination. After all, if the argument held that any technology that can be used for copyright infringement must be banned, then a lot of modern technology that we enjoy today should also be banned. Things like printers, disc drives, computers, VCRs, personal recording devices, or, heck, typewriters, would also be things that should be banned because they could be used for copyright infringement. It has to be said again, copyright law doesn’t work that way.

Some out there might argue that file-sharing programs of old would be a perfect counterpoint to this. That, however, is not a very compelling argument. In the case of MGM v. Grokster, the only reason MGM won the case was that certain file-sharing clients were being marketed as programs for violating copyright law. All those ads saying you can download millions of songs for free? Yeah, that bit the owners of those programs pretty hard. I’m not aware of any point where ChatGPT was sold as a program to read paywalled articles for free. So, holding up file-sharing software as a counterpoint doesn’t really work in this case.

Had ChatGPT simply, by default, republished whole works without permission more often than not, the New York Times might have had a case. The simple truth, however, is that this is not how ChatGPT works in the end. You don’t have a scenario where someone would ask something like “who won the NFL game last night?” and ChatGPT would respond by simply copying and pasting an article from CNN Sports. That simply doesn’t happen. Again, I’m not seeing a case here.

What’s more, there are many reasons why society would hope that the New York Times loses this case. Masnick pointed out the most obvious one in that if training AI on material costs a huge amount of money, then only the largest companies would be able to develop such technology. This, by its very nature, would hamstring the technology and all the good that would come from it (and there is a LOT of good that can come from AI).

Additionally, if the act of learning from material is suddenly considered an act of copyright infringement, then publishing companies would essentially go out of business overnight because reading might cause someone to be sued for copyright infringement. So, why pay for material if it’s only going to create legal liability? It’s an absurd legal theory that should get laughed out of any courtroom, to say the least.

What’s more, a legal precedent saying that the creators of a tool are liable whenever that tool can be used for copyright infringement would mean innovation essentially ceases, and existing companies that create things would suddenly face the risk of bankruptcy.

Further, such a case would imperil Fair Use – especially as it relates to educational purposes. If learning is copyright infringement, then there would be very limited, if any, defences for Fair Use as it relates to educational purposes. An educator could showcase a work, but if anyone learns from that showcase, then that overrides that Fair Use defence in the first place.

In short, if you like innovation, learning, Fair Use, or technology in general, then it would be sensible to hope that the New York Times loses this case outright. The case against ChatGPT is absurd on so many grounds. Any win for the New York Times in this case would spell disaster for society as a whole. As a result, we should all hope that ChatGPT comes out on top of such a case.

Drew Wilson on Twitter: @icecube85 and Facebook.

Crowley

March 5, 2024 at 3:35 pm

“Innovation” is often shorthand in the Tech Corporation world for “Do whatever we want without consequences”. Look at how telecoms used the word “innovation” whenever regulations were put forth against them during the Net Neutrality fights in the U.S.

AI companies are doing much the same.