Unlike the U.S. and E.U., Japan has decided not to enforce copyright law on material used to train generative AI (gen AI) models. This bold approach may change; however, this is the Japanese government’s current stance as Japan plays catch-up in the gen AI arms race.
Japan’s gen AI policy permits AI models to process any data “regardless of whether it is for non-profit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise.”
In an April 2023 blog post, Keiko Nagaoka, the Japanese Minister of Education, Culture, Sports, Science, and Technology, confirmed the nation’s policy. According to lawmaker Takashi Kii, Nagaoka is on record as saying, “It is possible to use the work for information analysis—regardless of the method, regardless of the content.”
Hence, as long as companies use copyrighted content solely to train gen AI models, this behavior appears to be safe from regulatory action. That said, LLM training involves several different elements, each of which is worth discussing in turn.
Japan’s take on LLM training, usage, and copyright issues
First, we should differentiate between “model training” and “model usage.” Model training involves the creation, training, and fine-tuning of a large language model. Model usage, by contrast, involves user prompts (e.g., voice, text, or image inputs) and the AI output based on those prompts.
In 2018, the Copyright Act of Japan (1970) was amended to create accommodating provisions for AI training. As lawyers from Nishimura & Asahi explain, this 2018 amendment made [Japanese copyright law] “one of the more ‘relaxed’ copyright acts in the world under which business would likely be allowed to use copyrighted training data for AI development unless exceptional requirements are met by those claiming infringement.”
As long as the copyrighted training data is being used for information analysis (rather than human enjoyment), that usage is likely not subject to Japanese copyright law.
As the lawyers above note, “Article 30-4 of the Copyright Act permits the use of a copyrighted work without the permission of the copyright holder to the extent deemed necessary, provided that the purpose is not for oneself or others to ‘enjoy’ the thoughts and feelings expressed in the work.” Information analysis, which includes model training, is one example of a use in which humans are not “enjoying” the work.
To recap, so far we’ve addressed Japanese LLM service providers’ liability (or lack thereof) during the model training process. At this juncture, we should point out a couple of caveats.
Important caveats: Legal vulnerabilities in the users’ prompt input stage and the service providers’ AI output stage
Oddly, the provisions for information analysis likely don’t apply to the “prompt input stage.” Users who input copyrighted works (e.g., text, images, illustrations, video, audio) could be held liable, because entering a prompt isn’t information analysis.
Moreover, it’s unclear if the provisions for information analysis even apply to the AI output stage. According to Japanese law, there is copyright infringement in the AI output stage if the model’s output is similar to someone else’s work and reliant on someone else’s work.
Both companies and users can be held liable. For example, if an LLM produces output that humans enjoy as a work, and that output is found to be significantly similar to and reliant upon a copyrighted work, then both the LLM service provider and the user who input the copyrighted work could be held liable.
These situations are handled in Japan on a case-by-case basis. If the copyright holder is “unreasonably harmed” (e.g., their future earnings or works are jeopardized), then again, the LLM service providers and users could be liable.
Summarizing the unique Japanese stance
In Japan, copyright infringement will not be enforced on companies that use copyrighted materials to train generative AI models.
These companies can bring as much copyrighted material into their model’s training data as they want; however, regulators do not want to see AI output that is overly similar to and reliant upon that copyrighted work. To be sure, this is a fine line to walk.
Also, the users themselves seem to be far more vulnerable to legal repercussions than the companies building out the LLMs. After all, the users’ inputting of copyrighted works isn’t covered by the information analysis provision.
These caveats aside, Japan’s copyright law is still extremely flexible, allowing LLM service providers considerable leeway when it comes to using copyrighted works to train models.
The gen AI copyright infringement landscape in the United States
Currently, there are many copyright infringement lawsuits between LLM service providers and creators of copyright-protected works. As a quick example, several novelists (e.g., Jonathan Franzen, John Grisham), artists, and computer programmers have ongoing suits against OpenAI.
Of course, the New York Times famously sued Microsoft and OpenAI back in December 2023. Since then, OpenAI has argued that it did nothing wrong, as the material it hoovered up was publicly available and allegedly covered under fair use.
Lawyers for the Times beg to differ, arguing that the regurgitation of entire articles without any compensation to the New York Times can hardly be considered fair use. This is the issue in a nutshell. Content creators and publishers want to be paid for their work, and right now, that isn’t happening.
It’s not as if the New York Times is “anti-AI.” In fact, despite their suit against Microsoft and OpenAI, they recently hired an Editorial Director of AI Initiatives.
Likewise, Seattle-based Getty Images recently sued Stability AI for what it called a “brazen infringement” of Getty’s images “on a staggering scale”; then Getty Images launched a new service, Generative AI by Getty Images, which allows customers to create images with a model trained on Getty’s proprietary corpus.
Thus, many companies are taking a two-pronged approach. They’re embracing gen AI initiatives internally, while simultaneously suing the big gen AI companies (e.g., Stability, OpenAI) that have been using their materials without compensation.
The content creators’ plight
Most of us can agree that writers, journalists, photographers, coders, and artists of all stripes should not have their copyrighted materials used without compensation. To that end, companies like The New York Times and Getty Images are working to develop new business models. In the case of Getty Images, the artists will receive royalties.
Getty Images CEO Craig Peters highlights the fact that all creators with images in the Getty AI training set will be compensated over time. As Peters explains, [Generative AI by Getty Images is] “actually sharing the revenue with them over time rather than paying a one-time fee or not paying that at all.”
This sounds like a fair business model, and one that other LLM service providers should consider.
The “Zarya of the Dawn” case study
Today, the gen AI and copyright question remains unresolved in the United States; however, an early ruling from the U.S. Copyright Office can shed some light on things.
In February 2023, the U.S. Copyright Office decided to issue a partial copyright to Kristina Kashtanova’s “Zarya of the Dawn” comic book. This is an instructive case, as Kashtanova used Midjourney to create the images in her comic.
The Copyright Office decided that Kashtanova “is the author of the work’s text as well as the selection, coordination, and arrangement of the Work’s written and visual elements.” That said, she is not the owner of the images themselves.
Although Kashtanova didn’t receive copyright on the images, she thought it was a great ruling for American artists using AI tools. Kashtanova writes, “When you put your images into a book like Zarya, the arrangement is copyrightable. The story is copyrightable as well as long as it’s not purely AI produced.”
It’s still early days, but this ruling does provide insight into how U.S. regulators are approaching the use of AI tools under copyright law.
What is the way forward? Can U.S. lawmakers learn from the Japanese approach?
As it currently stands in the U.S., the legality of using copyrighted materials to train LLMs has not been settled. Given that Japan is behind in the gen AI arms race, it makes sense that the country is adopting an aggressively laissez-faire approach to copyright infringement in the AI space.
I don’t know what the content creator backlash will be in Japan; however, I expect there will be some. What I will say is that their thought process is in the right place: Japan wants their LLMs developed quickly without unreasonably harming individual content creators’ ability to make money. Of course, whether that is happening in practice I cannot say.
The problem with OpenAI’s business model is that it is clearly interfering with the ability of the New York Times and, by extension, its journalists to make money in the future.
If OpenAI is going to use these writers’ works, the writers need to be compensated. I very much agree with the Times’ lawyers who see an inherent incongruence between claiming fair use and not offering fair compensation.
We’ll have to wait and see how Japan’s approach plays out. For now in the U.S., I expect to see more lawsuits and, hopefully, the emergence of unique profit-sharing business models.