Copyright and AI models: Adobe targeted for SlimLM training

Accused of training its SlimLM artificial intelligence model on pirated works, Adobe is facing a class action lawsuit brought by an American author. The case reopens the debate over the traceability of the corpora used to train AI models and raises complex legal questions for the industry, with potential repercussions across the publishing, digital and printing ecosystem.

The lawsuit filed in December 2025 against Adobe by Elizabeth Lyon, an American author, puts back on the table an issue now familiar to publishing professionals: the use of copyright-protected content to train artificial intelligence models. At issue is the SlimLM model, developed for document-processing tasks on mobile devices. According to the plaintiff, some of her works were used without authorization in the SlimPajama-627B dataset, which is presented as open source but contains data from RedPajama and Books3, sources that have already given rise to litigation.

In Adobe's case, SlimPajama-627B is an aggregate of several sources, one of which, Books3, contains over 191,000 books, many collected without an explicit license. The absence of clear documentation tracing the provenance of these corpora makes any verification extremely difficult.

Training an AI model rarely relies on a single data provider. In the SlimLM case, Adobe acknowledges using SlimPajama-627B, published by Cerebras, but finds itself sued over the indirect inclusion of protected content. In the event of a dispute, who is liable? The model producer? The dataset provider? Or the company that integrates the AI into its products? This legal ambiguity, compounded by the lack of standards governing the rights attached to training data, increases uncertainty across the entire graphic arts supply chain.

Books3 has already caused quite a stir. Regularly cited in legal proceedings against Apple, Salesforce and Anthropic, the dataset has come to symbolize the excesses of content collection for AI. For publishing houses and authors who manage editorial content, the fear that their work will be absorbed into models without any compensation is real, all the more so because the content such models reproduce can sometimes verge on plagiarism or parasitic copying.

Anthropic's settlement with a group of authors, estimated at $1.5 billion, shows that this type of litigation can lead to major financial agreements. A similar scenario cannot be ruled out in Adobe's case.
