Britannica and Merriam-Webster sue OpenAI over nearly 100,000 articles

Britannica and Merriam-Webster filed a lawsuit against OpenAI in New York. The publishers say the company used their reference materials to train ChatGPT without permission and then generated answers that in places reproduce the texts almost verbatim. The case covers nearly 100,000 articles, and the dispute adds to pressure on AI companies over copyright.

Khamidun Zhemal

AI monitoring · TNW

Apr 30, 2026 · 3 min

AI-processed from TNW; edited by Hamidun News



Encyclopedia Britannica and Merriam-Webster filed a lawsuit against OpenAI in New York, claiming that the company used their materials to train ChatGPT without permission. According to the publishers, the bot not only relies on their reference texts but is also capable of reproducing fragments of nearly 100,000 articles too closely to the original.

What Are the Claims

The lawsuit was filed on March 13, 2026, in New York. Britannica and Merriam-Webster accuse OpenAI on two fronts: copyright infringement and use of their brands in a context that, in their view, misleads users. The dispute is not about a few contested responses but about a body of material that the publishers have spent decades collecting, editing, and monetizing as a reference product. According to the plaintiffs, this knowledge base now helps train a competing interface without a license and without compensation.

A key part of the claims concerns not only the model training itself, but also what happens at the output. Publishers argue that ChatGPT is capable of producing answers that sometimes reproduce their texts almost verbatim. For reference brands, this is particularly painful: if a user gets a ready-made definition, explanation, or historical reference right in the chat, they have less reason to visit the website, purchase a subscription, or interact with the original publisher.

What the Publishers Point To

At the heart of the dispute is a conflict typical of generative AI: the thesis that "a model learns from general data" versus the accusation that the product ultimately substitutes for the original source. Britannica and Merriam-Webster are trying to demonstrate the latter: they create the value, but the end user receives a nearly finished reference answer inside ChatGPT. The complaint is therefore built around several recurring themes:

use of their reference materials to train the model without permission;
ChatGPT responses that, according to them, sometimes repeat articles almost verbatim;
the scale of the claims — nearly 100,000 articles and materials;
the risk that a user gets value inside the chat and never reaches the original resource;
trademark claims, if the publisher's brand helps legitimize an answer created by someone else.

Reference publishers occupy a special position in this dispute. Their content is short, structured, factual, and designed to answer a question directly, which is exactly the format that large language models handle most easily. The conflict therefore looks sharper than it does for long columns or literary texts: an encyclopedia article or dictionary entry matches almost perfectly what a user expects from a chatbot in a single query.

Why the Dispute Is Escalating

The new lawsuit did not appear out of nowhere. About six months earlier, the same companies had filed a similar lawsuit against Perplexity. This suggests the case is not a one-off conflict with a single platform but part of a broader strategy: the publishers want to challenge the very approach by which AI services build an answer interface on top of other people's archives and then keep the audience for themselves. If courts accept this argument, pressure will fall not only on chatbots but also on next-generation search answers. The case is an important test of where the market draws the line on acceptable use.

It is one thing when a model learns statistically from a large corpus of texts and formulates a new answer in its own words. It is another when it reproduces recognizable fragments or effectively replaces a paid reference product. The court will probably have to sort through several questions at once: where learning ends and copying begins, how appropriate references to a well-known brand are, and whether such answers can be considered a direct replacement for the original content.

OpenAI, like other model developers, already operates in an environment where pressure from rights holders keeps growing. For the company, this lawsuit is particularly uncomfortable because it is brought not just by media outlets or individual authors but by two classic reference brands whose value is built on accuracy, editing, and trust in their wording. If the plaintiffs can convincingly demonstrate systematic reproduction of their content, it will strengthen the position of those demanding that AI companies license data rather than take it by default.

What This Means

The Britannica and Merriam-Webster case shows that the main legal question around generative AI is shifting from the abstract "can one learn from other people's texts" to the more practical "can one then use the result to replace someone else's product." If the court sees ChatGPT not just as a summarization tool but as a direct competitor to reference publications, pressure for licenses, partnerships, and restrictions on answer output will increase sharply.

