Merriam-Webster and Encyclopedia Britannica Sue OpenAI Over 100,000 Articles

AI-processed from TechCrunch; edited by Hamidun News
Source: TechCrunch. Collage: Hamidun News.

Merriam-Webster and Encyclopedia Britannica have filed a lawsuit against OpenAI, accusing the company of massive copyright infringement. According to the plaintiffs, OpenAI used nearly 100,000 copyright-protected articles to train its large language models without permission or compensation. Merriam-Webster, founded in 1831, is one of the oldest and most authoritative publishers of English-language dictionaries.

Encyclopedia Britannica is older still: it has been published since 1768 and for two and a half centuries has remained the world's leading English-language encyclopedia. Over that time, editors at both organizations built one of the most comprehensive and meticulously verified bodies of reference text. This is precisely the kind of data (accurate, repeatedly verified, tied to specific definitions and dates) that is especially valuable for training language models expected to deliver factually reliable answers.

The core of the complaint is that OpenAI systematically included material from these publications in its LLM training datasets without signing licensing agreements. According to the lawsuit, nearly 100,000 articles are at issue: not a random sample scraped from the public internet, but targeted, large-scale copying of professionally created content. Producing this material took years of work by thousands of lexicographers, researchers, scientific editors, and subject-matter experts.

All of that labor is now, in effect, fuel for a commercial product, used without payment or consent. The lawsuit is part of a broader wave of legal claims against OpenAI. Similar claims were brought earlier by The New York Times, major literary agencies, the Authors Guild, associations of independent journalists, and several renowned writers.

The underlying logic of all these lawsuits is the same: AI companies have built a multibillion-dollar business on other people's intellectual work without paying a cent or asking permission. The scale of the claims is growing: where individual authors once filed suits, organizations whose brands are synonymous with reliability and academic authority are now joining in. This is no longer just a story about copyright; it is a story about who owns the reputation for reliability in the age of AI.

OpenAI typically responds with the same argument: training on publicly available data falls under the fair use doctrine of U.S. copyright law. The company also argues that it does not reproduce the original texts verbatim but only learns statistical patterns from them. Neither argument has yet been definitively upheld in court.

Most of the cases are in their early stages, and courts have not yet settled on where the boundary of fair use lies for LLMs. What makes this lawsuit particularly significant is that the plaintiffs are dictionary and encyclopedia publishers. Their product is not news written on deadline or blog posts published for reach.

It is reference knowledge that is slow and expensive to produce: verified, structured, and held to rigorous editorial standards. Such data underpins the answers that AI presents as fact. It is no accident that these works were included in the training corpus.

And it is no accident that they now find themselves at the center of the most consequential legal proceedings in the history of generative AI. The outcome of these cases will set the rules of the game for the entire industry. A victory for rights holders would require AI companies to license training data retroactively, with unpredictable financial consequences.

A victory for the fair use defense would set a precedent that effectively strips authors of leverage in the era of generative AI. While the outcome remains unclear, one thing is evident: the industry cannot keep building its future on other people's past work indefinitely without agreeing on terms.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
