Another Judge Rules that AI Training is Fair Use

A judge has issued a ruling in the Anthropic case, finding that training on copyrighted works is fair use.

One of the arguments I’ve long been presented with is the idea that the act of AI training on other people’s works constitutes copyright infringement. Another argument I keep seeing is that output based on that AI training, such as summaries of the underlying content, is also copyright infringement. I’ve long pointed out that such activities are fair use, but some out there continue to insist otherwise. Here in Canada, one example of this argument can be found here. Of course, there are plenty of other examples of this out there.

The problem with these arguments has long been that they don’t match up with the law at all. For instance, if I write a book report (something students do on a regular basis), the act of writing that book report is not copyright infringement. If I write a summary of an article, that act is also not copyright infringement. Heck, if I quote part of the article, that act alone isn’t automatically copyright infringement. This is because up here in Canada, we have this little concept called “fair dealing”. The US has a much stronger version of this called “fair use”. Such activities are covered under both.

Yet, for some reason, for a number of people out there, the fact that a computer is doing it somehow makes everything different. If an AI (Artificial Intelligence) summarizes a work, that’s somehow copyright infringement. If an AI writes a book report, that’s also somehow copyright infringement. In the minds of these people, it’s drastically different when a human does it as opposed to an AI because… reasons… apparently. In fact, some of these folks are taking their legal theory all the way to court in the form of lawsuits on top of it all. I honestly don’t know why their lawyers aren’t explaining how copyright law actually works instead of just going ahead with such lawsuits, but that seems to be the world we live in these days.

Of course, when these arguments are presented to the courts, judges need, you know, actual evidence that this is the case. You can’t just win a case based on your own personal feelings about how the law should work. Courts rely on case law and statutes passed by lawmakers, along with the evidence in the case, in order to come to a determination. Despite all the bravado of some of these litigants, it seems that the cases aren’t going very well. Early last year, we noted one such case where a judge dismissed the claims of several book rights holders, pointing out that the evidence of copyright infringement was lacking. The judge in question ruled that AI training on copyrighted works is likely fair use. That ruling was not even the first time a judge had ruled that way, but it signalled a trend within the American judiciary of finding that reading is not copyright infringement (duh!).

Other cases, however, have been continuing and another ruling has come down. This involves Anthropic and their AI, Claude. The ruling? Training on copyrighted works is considered fair use. From Reuters:

A federal judge in San Francisco ruled late on Monday that Anthropic’s use of books without permission to train its artificial intelligence system was legal under U.S. copyright law.

Siding with tech companies on a pivotal question for the AI industry, U.S. District Judge William Alsup said Anthropic made “fair use” of books by writers Andrea Bartz, Charles Graeber and Kirk Wallace Johnson to train its Claude large language model.

This is… common sense as far as I can tell. The whole reason the copyright holders launched such litigation in the first place was because they were freaking out about the idea of AI being able to write books. They whipped themselves up into a fevered frenzy, concluding that publishers and authors were somehow in a fight for survival because AI was going to take over their jobs. Obviously, that isn’t what is happening, for reasons I’ll explain below. None of this changes how the law functions today, however. Summarizing copyrighted works is something that can easily fall under fair use. Learning how to write by reading is also well within the bounds of fair use. Hyperventilating over a fictional AI takeover doesn’t change the nature of the law. So, correct ruling as far as I’m concerned.

Now, this doesn’t make AI companies completely immune to copyright infringement claims by any means. There are still obviously ways that AI companies can run afoul of copyright law. In the case of Anthropic, apparently, the company was accused of pirating a whole bunch of books in its efforts to train AI:

Alsup also said, however, that Anthropic’s copying and storage of more than 7 million pirated books in a “central library” infringed the authors’ copyrights and was not fair use. The judge has ordered a trial in December to determine how much Anthropic owes for the infringement.

U.S. copyright law says that willful copyright infringement can justify statutory damages of up to $150,000 per work.
An Anthropic spokesperson said the company was pleased that the court recognized its AI training was “transformative” and “consistent with copyright’s purpose in enabling creativity and fostering scientific progress.”

If this report is accurate, then, yeah, legally, you can’t just, say, torrent a whole bunch of pirated copies of books. If the company was obtaining illegal copies of those books, that’s definitely copyright infringement. That has been how the law works in the US at least since 2013 when the Jammie Thomas ruling came down. It was still ridiculous that the fine was hundreds of thousands of dollars for downloading and sharing two dozen songs, but that’s where that ruling ended up going despite the excessive nature of the fine. I don’t see how a company would be any different. At minimum, if you are using a commercially available book, then you’d have to purchase a copy of said book. If the company didn’t take that step, then it’s understandable that it would get hit with copyright infringement.

As for the freak-out that AI was somehow going to take over people’s jobs in this area, well, here’s another aspect of copyright law: the output of generative AI can’t be copyrighted either. Going back to 2022, courts and the US Copyright Office have consistently held that the output of an AI can’t be copyrighted, which means that such works immediately fall into the public domain. So, if you are well and truly worried that AI is going to replace you in your writing field, you can get AI to write a book for you and claim it as your own afterwards. I wouldn’t recommend doing that since the output is probably going to fall well within the category of AI slop, but legally, that is certainly something you can do (unless copyright law gets changed, which is… unlikely as of this writing).

This isn’t even getting into how bad AI can be, as such technology has already developed quite a history of producing garbage results that get people in trouble in a whole pile of professions. I mean, I still remember a couple of years ago when AI generated images caused panic in the artistic community. The panic was that people wouldn’t be commissioning artists for work when AI could produce better results for free. That… turned out to be untrue because artists are still creating images, getting those commissions, and carrying on as per normal, despite generative AI having been able to produce images for a couple of years now. The bottom line is this: the AI takeover conspiracy theory is grossly overblown.

We now have multiple judges ruling that AI training on copyrighted works isn’t, in itself, an act of copyright infringement. I, along with other sensible writers, have been pointing this out for years, and judges are agreeing with our assessments. I don’t know if this will get the most stubborn people to agree, since they will no doubt continue to argue that reading is copyright infringement if it’s done by a computer, but the reality of the law will continue to beg to differ.

