Copyright law and AI is a twisted logic puzzle, not a no-brainer
Despite what the hot takes say, there’s no easy answer to this debate
One thing I know for sure after three years on the AI beat: no one, and I mean no one, has all the answers. In fact, I have yet to come across any argument or debate in the world of AI that does not have two sides.
Since I began covering the issues around copyright and generative AI in April 2022 (with the rise of image models like DALL-E 2 and Stable Diffusion), I have understood that while the negative impacts on creators whose works have been used to train AI systems are real and unassailable, the arguments are by no means one-sided.
Copyright and IP law is complicated. That’s a fact.
But one source I spoke to for the Thursday edition of Fortune’s Eye on AI newsletter does not seem to think so. For the lead essay in yesterday’s newsletter, titled “AI and copyright is a logic puzzle no one can solve — and fair use won’t untangle it,” I spoke to Atkinson, who asked on LinkedIn: “Serious question: Why should my students have to pay to use a textbook rather than be allowed to download from pirate sites, but $60 billion Anthropic shouldn’t?”
Atkinson was referring to an ongoing legal case filed against Anthropic by three authors, who allege that Anthropic infringed their copyrights by using pirated versions of their books to train its AI systems. Anthropic’s argument is that as long as it is not outputting plagiarized content, the actual training of the model should be fair use of the material.
But learning, Atkinson explained, is a transformative use—so why can’t his students argue that they’re taking books from pirated sites in order to learn something?
I loved the idea of such a tidy, logical notion. If one is fair use, why isn’t the other? But when I reached out to several lawyers on the issue, I received several thoughtful rebuttals that left me as uncertain as ever.
Those lawyers, including several I respect, said that Atkinson’s argument, while clever, is simplistic, and does not take into account the way copyright actually works under current US law.
Atkinson’s argument “interprets AI as essentially eviscerating all IP law,” Bradford Newman, who leads the AI and blockchain practice at law firm Baker McKenzie, told me. From an intellectual property perspective, there is a big difference between AI training and a student downloading pirated content, he explained.
If someone wants to download a specific book or watch a specific movie and downloads it for free, that’s clear copyright infringement, he said. But that’s harder to address with an AI model.
“If you trained an algorithm on one copyrighted work only, that would be a problem,” he said, but market harm in AI training is harder to prove. “Training on many books doesn’t necessarily harm any specific author’s market—especially if the model isn’t reproducing their work verbatim.”
Scraping copyrighted books may still strike most of us as deeply unfair, and lawyers representing both creators and AI companies — including OpenAI, Anthropic, and Meta — are arguing these issues in several high-profile cases. But the fact remains: AI copyright debates are not one-sided.
Even the US Copyright Office agrees that the issue of “fair use” is complicated. That twisted pretzel of logic is clearly why the agency weighed in on the arguments in early May with a 108-page report on whether the unauthorized use of copyrighted materials to train generative AI systems is fair use.
But Atkinson does not seem to think there is any argument to be had. He doubled down in direct messages with me, and then wrote a new blog post on LinkedIn, tripling down on his belief that there is no other side to the story, quoting my article and messages at length.
He told me that “I hate to sound like a broken record, but I do believe if you follow the arguments to their logical conclusions, you'll find that they all work at least as well for humans as for GenAI companies.”
I, on the other hand, believe both sides of the copyright-and-AI debate have logic to them. If you follow Atkinson’s argument to its logical conclusion, it seems to suggest that all copyright and IP protections could be invalidated. I’m certain he’s trying to point out the hypocrisy in the idea that AI companies can train on millions of creative works without issue, while a human who downloads one pirated book is considered a thief. And like Atkinson, I enjoy these conversations — and hope to have more of them.
But his unwillingness to acknowledge the complexity of the other side risks dismissing the nuance, and the law is rarely black and white. I think it diminishes a serious and highly complex legal issue that many lawyers have told me could ultimately reach the Supreme Court.
From an economic viewpoint, a pirated book used by a student for an exam is an input to the production of an outcome (a grade) that remains limited to the student. The pirated book used by an AI company is an input to the production of a product that (i) might compete with book creators and (ii) feeds the growth of the company's market power. Hence, in the latter case the use produces negative spillovers external to the company, and the undue advantage obtained by not bearing the input cost can be considered a practice of unfair competition! One could say: well, if the student learns more than her/his/their colleagues, then that's also an advantage. But comparing one atomistic unit (the student) in a large market (the class, or the future labour market) with a company in an oligopolistic setting is not a proper comparison. So, yes, AI companies' copyright infringement is worse in terms of social welfare than individual infringement!
Your prediction about the complexity proved spot on with the recent Anthropic ruling. The judge found AI training itself to be fair use, but Anthropic still faces a December trial over using millions of pirated books to build its training library. I guess the implications around AI training will just keep unraveling over time.