Should AI Training on Material Without Permission Be Copyright Infringement? Copia Says “No”

When an AI trains on copyrighted material, does it need permission from the rightsholder? The Copia Institute is arguing that it doesn’t.

One common thread running through many Artificial Intelligence (AI) models is that they need real data to train on. Whether the goal is writing convincingly human-sounding text (as with ChatGPT) or generating images (as with Stable Diffusion), many AI systems need source material in order to learn the craft.
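To make "training on data" concrete at the smallest possible scale, here is a toy sketch in plain Python (purely illustrative; it bears no resemblance to the scale or architecture of systems like ChatGPT or Stable Diffusion). A bigram model counts which word tends to follow which in its source text, then generates new sequences from those learned statistics rather than replaying the source verbatim:

```python
import random
from collections import defaultdict

def train_bigram_model(corpus: str) -> dict:
    """Record, for each word in the training text, the words seen following it."""
    model = defaultdict(list)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model: dict, start: str, length: int = 8) -> str:
    """Emit new text by sampling the learned word-to-word transitions."""
    out = [start]
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:  # dead end: the word never had a successor in training
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigram_model(corpus)
print(generate(model, "the"))
```

Even in this toy form, what the model retains is statistics about its source material rather than a copy of it, which is the crux of the training debate this article gets into.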

Among some content creators, there has been a great moral panic that AI is going to take over creating images and writing, so much so that there would be no need for humans to create that content when it can all be auto-generated for less money. In response to this loopy theory, some have gone to the extreme of claiming that AI can't train on their material because doing so would constitute copyright infringement. If you take a neutral position and think these things through logically, you should already be noticing just how far off the rails these claims really are.

From the get-go, the idea that no human would ever be able to create images or text because AI would replace it all simply doesn't reflect the long-standing precedent set by previous forays of AI into the real world.

Take, for example, Deep Blue. There may be arguments over whether or not the technology behind it was actually AI, but the ultimate conclusion was that it marked a moment where computer technology bested humanity when it defeated Garry Kasparov in 1997. If a computer becoming better than humans truly meant that humans would be replaced, then chess would only be played by computer systems from that point on.

Obviously, that didn't happen. Humans still play in tournaments all the time, and people still play chess. What actually ended up happening was that AI offered a glimpse into a level of chess expertise that human players seem incapable of achieving on their own. One example that has fascinated chess commentators is the series of games between Stockfish and AlphaZero. Even to a chess novice, games like those ended up being quite fascinating. It was interesting, but not really threatening to human players.

In 2011, the Watson AI took on Jeopardy legends Ken Jennings and Brad Rutter. It ultimately won handily. Was that the end of humans playing that game? Not at all. We still have human contestants playing quiz shows.

In 2016, there was the famous AlphaGo vs Lee Sedol match. AlphaGo, of course, won in the end. In a follow-up, DeepMind created AlphaGo Zero, which bested AlphaGo 100 games to 0. Games were released, and there was considerable excitement in the Go community. New ways of thinking were demonstrated, showing an otherwise unknown side of the game.

If you notice, a common thread runs through all of this: AI ended up being a tool for seeing things differently. It became a tool for new ways of thinking, as opposed to a replacement for humans at something.

AI alarmists will likely point out that these are just games and not anything related to productivity. Something like ChatGPT creates text, the argument goes, so it will likely replace human beings in everything from journalism to fiction writing. The problem with that thinking is that it misunderstands the nature of tools like ChatGPT. The idea behind ChatGPT is that it creates text that sounds like it was written by a human being. It can translate from one language to another and can produce convincingly human-sounding prose. What it doesn't do is write on subject matter flawlessly. The AI isn't necessarily designed to, for instance, write the news accurately.

When humans write a news article, they care about the accuracy of the facts. Dealing with current events isn't something that a tool like ChatGPT can handle well. It was trained on material that is multiple years old, so it is ill-prepared for events happening today. All that technology is concerned with is whether or not a piece of text sounds like it was written by a human, not necessarily whether the piece accurately portrays events. Some don't understand that and end up being surprised when their generated news articles turn out to be factually incorrect half the time (and that's not an exaggeration). Can it write content in specific styles? Sure. Can it write a flawless fact-based article? Maybe not so much.

If anything, AI can very easily be used to help write news articles better and more efficiently, through things like basic grammar correction or suggestions of different words. Much like the previous examples, it ends up being a tool to help improve human-created content, not a tool to replace a human writer.

A similar story can be told about lawyers. There was a rather famous case where a lawyer used ChatGPT to write his legal filings. That really didn't end well. Cases cited in the filing turned out to be fabricated to fill in the blanks, and the judge was not amused, for reasons that should be obvious. Then there was the famed DoNotPay case, where an AI service was supposed to write legal documents for you. That also didn't end well.

Again, the common thread in all of these cases is that AI can be a very useful tool to help humanity, but trying to make AI replace humans at things like writing doesn't typically end well. What's more, I'm unconvinced that this is going to change any time soon.

While the idea that AI is somehow going to replace people when it comes to creating content has been blown wildly out of proportion, there is still the issue of copyright. Can AI "train" on copyrighted material? A cursory look at how these things work should lead any level-headed person to the conclusion that this isn't an issue.

When any human being learns the craft of drawing or writing, what do they do? They study. They look at existing material and figure out how to hone their skills and create higher-quality content. So, if you look at a picture of Tony the Tiger and figure out how the lines are drawn and how they create such a well-made character, is that act copyright infringement? Obviously not.

The question, then, is this: if an AI studies different pieces of art to figure out how to create content, is that copyright infringement? If humans can do it, the logical conclusion is that AI can too. A case for infringement could be made if the AI outputs work that is already protected under copyright. However, that's not what a lot of existing AI models are doing. They are learning from that material and creating new works, works that arguably aren't even copyrighted in the first place.

For the life of me, I personally can't think of how copyright can be used to stop AI from training on material. It turns out, neither can the Copia Institute. In a filing submitted to the US Copyright Office, the Copia Institute argues that such activities should not constitute copyright infringement. Here's their posting about their filing:

In our comment we made several points, but the main one was that, at least when it comes to AI training, copyright law needs to butt out. It has no role to play now, nor could it constitutionally be changed to have one. And regardless of the legitimacy to any concerns for how AI may be used, allowing copyright to be an obstructing force in order to prevent AI systems from being developed will only have damaging effects not just deterring any benefits that the innovation might be able to provide but undermining the expressive freedoms we depend on.

In explaining our conclusion we first observed that one overarching problem poisoning any policy discussion on AI is that “artificial intelligence” is a terrible term that obscures what we are actually talking about. Not only do we tend to conflate the ways we develop it (or “train” it), with the way we use it, which presents its own promises and potential perils, but in general we all too often regard it as some new form of powerful magic that can either miraculously solve all sorts of previously intractable problems or threaten the survival of humanity. “AI” can certainly inspire both naïve enthusiasm prone to deploying it in damaging ways, and also equally unfounded moral panics preventing it from being used beneficially. It also can prompt genuine concerns as well as genuine excitement. Any policy discussion addressing it must therefore be able to cut through the emotion and tease out exactly what aspect of AI we are talking about when we are addressing those effects. We cannot afford to take analytical shortcuts, especially if it would lead us to inject copyright into an area of policy where it does not belong and its presence would instead cause its own harm.

Because AI is not in fact magic; in reality it is simply a sophisticated software tool that helps us process information and ideas around us. And copyright law exists to make sure that there is information and ideas for the public to engage with. It does so by bestowing on the copyright owner certain exclusive rights in the hopes that this exclusivity makes it economically viable for them to create the works containing those ideas and information. But these exclusive rights necessarily all focus on the creation and performance of their works. None of the rights limit how the public can then consume those works once they exist, because, indeed, the whole point of helping ensure they could exist is so that the public can consume them. Copyright law wouldn’t make sense, and probably not be constitutional per the Progress Clause, if the way it worked constrained that consumption and thus the public’s engagement with those ideas and information.

It also would offend the First Amendment because the right of free expression inherently includes what is often referred to as the right to read (or, more broadly, the right to receive information and ideas). Which is a big reason why book bans are so constitutionally odious, because they explicitly and deliberately attack that right. But people don’t just have the right to consume information and ideas directly through their own eyes and ears. They have the right to use tools to help them do it, including technological ones. As we explained in our comment, the ability to use tools to receive and perceive created works is often integral to facilitating that consumption – after all, how could the public listen to a record without a record player, or consume digital media without a computer. No law could prevent the use of tools without seriously impinging upon the inherent right to consume the works entirely. The United States is also a signatory to the Marrakesh Treaty, which addresses the unique need by those with visual and audio impairments to use tools such as screen readers to help them consume the works to which they would otherwise be entitled to perceive. Of course, it is not only those with such impairments who may have need to use such tools, and the right to format shift should allow anyone to use a screen reader to help them consume works if such tools will help them glean those ideas effectively.

What too often gets lost in the discussion of AI is that because we are not talking about some exceptional form of magic but rather just fancy software, AI training must be understood as simply being an extension of these same principles that allow the public to use tools, including software tools, to help them consume works. After all, if people can direct their screen reader to read one work, they should be able to direct their screen reader to read many works. Conversely, if they cannot use a tool to read many works, then it undermines their ability to use a tool to help them read any. Thus it is critically important that copyright law not interfere with AI training in order not to interfere with the public’s right to consume works as they currently should be able to do.

In conclusion, they argue that AI training on copyrighted work should fall under Fair Use. Honestly, I don't see how any other conclusion could be reached, other than perhaps saying that it's simply a legal act in general. It doesn't make sense that learning from a work becomes copyright infringement just because it isn't a human doing the learning.

I mean, sure, make rules guarding against turning AI into a weapon against humanity. Knock yourself out on that, though a Terminator-style world isn't going to happen tomorrow by any stretch of the imagination. However, barring AI from training on material in order to create paintings? It strikes me as a tad ridiculous.

Drew Wilson on Twitter: @icecube85 and Facebook.
