
Claude and a Million Dead Books: How AI Devoured Our Heritage

When OpenAI released ChatGPT, the industry lost its mind. To catch up with the leader, Anthropic and other players began vacuuming up the internet, swallowing millions of protected…

AI-processed from The Verge; edited by Hamidun News
Source: The Verge. Collage: Hamidun News.

Remember that quiet November of 2022, when the world didn't yet know what neural network hallucinations were? OpenAI didn't just release a product then; it fired a starting pistol whose sound made all the Silicon Valley giants abandon their cozy metaverses and run. In this race, Anthropic's Claude became one of the main contenders for the crown. But behind the brilliant façade of a polite and safe AI lies a graveyard of millions of books that no one authorized it to use. Let's be honest: Claude exists in its current form only because the industry decided to ignore the rules of decency for the sake of speed.

The AI industry today resembles an era of wild capitalism, with massive data sets in place of gold mines. When it became clear that ChatGPT was not just a toy but the foundation of a new economy, the question of ethics took a back seat. To teach a model to reason, you need more than texts from Reddit or Wikipedia. You need complex structures, rich vocabulary, and deep contexts that can only be found in quality literature. So millions of copyright-protected books became "training data" without the consent of their creators. You didn't think neural networks learn from children's fairy tales in the public domain, did you?

Anthropic often positions itself as the "good guys" of the AI world, focusing on safety and ethics. But the irony is that even the "safest" models are built on a foundation of questionable content. Datasets like Books3, containing hundreds of thousands of titles from shadow libraries, became the secret ingredient that allowed Claude to catch up with, and in some ways surpass, Sam Altman's products. For corporations, this was simple mathematics: either you use everything that's there for the taking, or your competitor does it first and captures the market. In this logic, books are just coal for stoking the furnace of progress.

Why does this matter right now? We're approaching a moment when "human" data is simply running out. Neural networks have already read almost everything we've written over the last few centuries. And now authors, from novelists to technical writers, are discovering that their years of work have become free fuel for systems that might one day replace them as well. This is not just content theft; it's a fundamental shift in how intellectual property is understood. Before, you bought a book to read it. Now corporations take it to teach a machine to imitate your style and your way of thinking.

Lawsuits from authors like Sarah Silverman and George R. R. Martin are just the tip of the iceberg. The problem is that the legal system is cumbersome, while the AI industry moves at the speed of light. While courts spend years deciding whether training a neural network is "fair use," the models are already trained, the weights are saved, and billions of dollars in investment are committed. Anthropic and other players are betting that victors are not judged. Or, at the very least, that the penalties for copyright infringement will be a drop in the ocean compared to future market capitalization.

In the end, we have a strange symbiosis. Claude can analyze a complex legal document for you or write an essay in the style of Proust precisely because it has "swallowed" thousands of similar texts without asking. We got an incredible tool, but the price of its creation is the devaluation of human labor as such. Books didn't just serve as a base, they were processed into digital mince, from which new, convenient consumer interfaces were molded. And now we have to live with this, using the fruits of this intellectual expropriation.

The key point: Anthropic and OpenAI built their empires on data they didn't own, and now there's no turning back. Will the industry be able to survive if it actually has to pay up for every "read" book?

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
