
Publishers sue Meta over training Llama on illegal copies of books and journals


Source: The Verge. Collage: Hamidun News.

Meta has been sued by five major publishers and writer Scott Turow. The accusation is serious: the company allegedly committed "one of the largest copyright infringements in history" while training the Llama model on pirated copies of books and scientific journals.

How Meta Trained Llama

According to the lawsuit, Meta deliberately copied books and scientific journals from piracy sites (LibGen, Anna's Archive, Sci-Hub, Sci-Mag and others) and used the material to train Llama without permission from authors or copyright holders. The publishers claim this was intentional, not accidental. The logic, as they describe it, was simple: piracy sites provide content for free, while licensing costs money. Meta allegedly chose the cheaper option: bypass legal channels for acquiring content and download directly from illegal sources. Nor was this a one-off incident, the plaintiffs say. They describe a systematic process: finding piracy sites, downloading the files, and adding them to the training dataset, with the company fully aware of what it was doing.

Who Filed the Lawsuit

The lawsuit was filed by five of the world's largest publishers:

  • Macmillan — fiction, textbooks, scientific publications
  • McGraw Hill — professional literature and educational content
  • Elsevier — scientific journals worldwide
  • Hachette — one of the "Big Five" US publishers
  • Cengage — educational content and textbooks

Joining them is writer Scott Turow, author of the bestsellers "The Burden of Proof" and "Presumed Innocent". His participation matters: this is not just a corporate dispute over money. The lawsuit carries the separate voice of an author, which gives the claims added legitimacy and emotional weight. At stake are not only corporate profits but also the rights of individual creators.

Why This Matters

On the surface, this is a legal dispute over money and copyright. In reality, it is about the rules of the game in the age of AI. The question is simple: if Meta can download other people's content from piracy sites and train on it without consequences, why should authors and publishers trust that their work is protected?

The lawsuit could set a precedent: AI companies cannot appropriate creative content with impunity.

"This is one of the largest cases of copyright infringement in history," the lawsuit states.

How This Will Develop

The case may last for years. Meta will likely defend itself by citing fair use or by arguing that content on piracy sites is already publicly available. Such arguments look weak in this context: according to the complaint, the company deliberately chose an illegal source, knew it was illegal, and did not attempt to negotiate. Even if the case drags on, it is already changing market dynamics. Other AI companies (OpenAI, Google, Microsoft, Anthropic) will likely begin to distance themselves from clearly illegal content and switch to licensing, and publishers will demand compensation for the use of their works in training large language models.

What This Means

This lawsuit symbolizes the end of an era of quiet "free training" on other people's content. Previously, such practices stayed in the shadows; now they are in the public eye. Two things will likely happen at once: some companies will negotiate with publishers and authors (and pay for data), while others will lose in court (and pay even more). As a result, the market will be restructured under new rules. AI will no longer be a "free" technology; it will require licensing and payments. This may slow the development of AI, but it is fair to those whose work is used in training.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.