As discussed in my previous article, recent guidance from the Copyright Office and subsequent judicial opinions supporting the Office’s position have made it clear that purely AI-generated works are not protectable by copyright. One of the primary principles underlying this view is the high value placed on human creativity. Copyright law is aimed, first and foremost, at protecting and incentivizing human contributions to society. The legal landscape has reserved copyright status for human creators, leaving AI outside and unprotected, but is AI infringing on the protected works of human authors? A growing number of claims by various creatives allege just that.
Visual artists and writers from Sarah Andersen to Sarah Silverman have joined class action lawsuits against AI companies for “scraping” the internet for materials to train their AI algorithms to produce text and images of their own. These materials are alleged to include the artists’ copyrighted works. Sarah Andersen is among the class of visual artists suing Stability AI (the company behind Stable Diffusion and DreamStudio), Midjourney, and DeviantArt, alleging that their works were used without permission. The class further alleges that images created by these algorithms constitute unauthorized derivative works that harm the market for the artists’ original works. Classes of writers, including Sarah Silverman, are suing Meta and OpenAI (in separate litigation) for similar alleged violations of their copyrights.
These cases, along with several others still pending throughout the country, when decided, will have major ramifications for AI companies. In particular, the coming rulings will draw the bounds of the fair use defense in the context of content-generating AI, both in terms of the input of copyrighted content to train the AI and the output of AI-generated content that may be substantially similar to copyrighted works.
AI-generated works that resemble authors’ copyrighted works are the most easily recognizable form of potential infringement. Artists catch wind of these creations and believe their rights have been infringed. The derivative work right, one of the bundle of exclusive rights that make up a copyright, gives the author of the underlying work the exclusive right to create or authorize derivative works. A derivative work is a work “based upon one or more preexisting works.” The way AI platforms function technologically tempts us to quickly conclude that any work generated by an AI is “based upon” a preexisting work or works. However, when interpreting the “based upon” language, it is crucial to remember precisely what copyright aims to protect, namely the discrete expression of an otherwise unprotectable idea. In other words, courts may treat AI-generated works that reproduce specific expressions, such as characters or scenes, differently from AI-generated works that simply evoke a familiar voice or style associated with the original author.
Earlier court decisions on the nature of derivative works suggest that AI-generated works that mimic, even convincingly, the style of an original artist may escape liability. For example, in Dave Grossman Designs, Inc. v. Bortin, the Northern District of Illinois emphasized the importance of the idea-expression dichotomy in copyright law when evaluating whether a work that closely imitates the style of another is infringing, saying, “Picasso may be entitled to a copyright on his portrait of three women painted in his Cubist motif. Any artist, however, may paint a picture of any subject in the Cubist motif, including a portrait of three women, and not violate Picasso’s copyright so long as the second artist does not substantially copy Picasso’s specific expression of his idea.” While perhaps frustrating to individual artists who see their style imitated by others, this stance comports with our understanding of innovation and progress in the arts. “Inspiration” is often traceable to prior works and, so long as specific expressions aren’t being copied, copyright law seeks to encourage innovation and progress by permitting artists to leverage and iterate upon existing styles and ideas.
Even for derivative works that do make use of particular elements of a prior work and are thus “based upon” those prior works in the sense intended by the Copyright Act, the fair use defense may be an available tool to escape liability.
What does this mean for the infringing capacity of AI? Courts will tell us in due time, but in anticipation of coming precedent, it is reasonable to expect that the idea-expression dichotomy will play a large role in determining whether AI-generated works infringe the derivative work right of authors.
In addition to considering whether the resulting product infringes, there is also the question of whether infringement occurs during the process of creation. Generally, AI platforms are “trained” on massive archives of creative content, such as pictures or text, some of which are unlicensed copyrighted materials. The AI detects patterns within that universe of content, derives rules from them, and later draws on those rules and patterns to interpret new inputs and generate outputs consistent with the prompts it receives. Assembling the datasets these AI platforms work from requires digitally copying the copyrighted material they contain, and thus reproducing it. If unauthorized, as is alleged in many of the claims raised by artists in the last couple of years, this copying violates the reproduction right and is therefore copyright infringement, unless there is an available defense.
Once again, the AI platforms turn to the fair use defense. In fact, the recent and controversial decision in Google LLC v. Oracle America, Inc. may provide a playbook for AI platforms to argue that their data sets make transformative use of otherwise exact copies of someone else’s copyrighted material. In that case, Google copied roughly 11,500 lines of declaring code from the Java API, owned by Oracle, to develop its Android operating system. Despite the direct copying of so much code, which the Court assumed for the sake of argument to be copyrightable, the Supreme Court held that Google’s copying was fair use.
The AI platforms similarly argue that the copying of content for the purpose of training AI is fair use. Analogizing the AI creative process to the human creative process is helpful in considering this issue. If a human author decides to write a research paper, she will likely copy articles, books, and other source materials in preparation for writing her paper. If she reads all the material and then writes her paper in her own words (her own “expression of the idea”) while giving appropriate credit to her sources, most courts would find that her original copying of the source material for her own research was fair use. AI platforms argue that AI follows the same process. Training material, they claim, is used by AI in a way that should be determined to be fair use. It will be interesting to see how courts tackle this issue.
Co-Author: Jackson Polansky, a research and writing consultant specializing in intellectual property.