
Claude Code built IndexedDB from scratch: 1208 Web Platform Tests passed, but agent's 95% figure disputed


AI-processed from Habr AI; edited by Hamidun News

Claude Code implemented IndexedDB, a full-featured browser API for storing structured data, on top of SQLite in a single work session. The experiment tested how far an LLM agent can go when independently developing a complex low-level system.

Task: One prompt instead of an IndexedDB team

IndexedDB is a browser standard for client-side data storage: asynchronous transactions, indexes, cursors, schema versioning, and binary blobs. Mature open source implementations exist, for example fake-indexeddb in JavaScript, created by teams through years of iteration. The experiment's question: can Claude Code do it from scratch, given a single prompt?

The agent was tasked with writing an IndexedDB implementation on top of SQLite. The backend choice is logical: SQLite is a stable, well-tested engine with support for transactions, indexes, and atomic operations. It provides persistence, while the agent needed to implement the browser API on top of a standard SQL layer.
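To make the layering concrete, here is a minimal Python sketch of the idea, not the agent's actual code: one SQLite table stands in for one object store, and a SQLite transaction stands in for an IndexedDB transaction. The `ObjectStore` class, the `store_` table prefix, and the use of JSON in place of the structured clone algorithm are all assumptions made for illustration.

```python
import json
import sqlite3


class ObjectStore:
    """Hypothetical wrapper: one SQLite table plays the role of one object store."""

    def __init__(self, conn, name):
        self.conn = conn
        # Quote the table name so arbitrary store names are safe to embed in SQL.
        self.table = '"store_' + name.replace('"', '""') + '"'
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} "
            "(key BLOB PRIMARY KEY, value TEXT NOT NULL)"
        )

    def put(self, key, value):
        # Like IDBObjectStore.put: insert the record or overwrite an existing one.
        # JSON is an assumption here; a real implementation needs structured clone.
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )

    def get(self, key):
        # Returns None for a missing key (IndexedDB would yield undefined).
        row = self.conn.execute(
            f"SELECT value FROM {self.table} WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])


conn = sqlite3.connect(":memory:")
books = ObjectStore(conn, "books")

with conn:  # commits when the block finishes without an error
    books.put(b"isbn-1", {"title": "First"})

try:
    with conn:  # rolls back on an exception, like an aborted transaction
        books.put(b"isbn-2", {"title": "Second"})
        raise RuntimeError("abort")
except RuntimeError:
    pass

print(books.get(b"isbn-1"))
print(books.get(b"isbn-2"))
```

The second write disappears because SQLite rolls the whole transaction back. The harder parts of the standard (asynchronous request events, transaction scopes and modes, auto-commit timing) sit above this layer and are not shown.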

1208 tests and a contested 95%

Quality was measured through Web Platform Tests (WPT) — the official test suite for browser standard compliance, used by the Chrome, Firefox, and Safari teams themselves. WPT contains thousands of cases covering the specification in detail: from basic operations to complex scenarios with versioning and parallel transactions.

All 1208 tests that were run passed. In its final report, the agent declared 95% compliance with the standard. For an implementation the agent produced on its own, this is an impressive figure. The experiment's authors questioned it: in their view, actual compliance is noticeably lower once edge cases and load scenarios outside the main test suite are taken into account.

  • 1208 WPT tests passed successfully
  • Agent independently ran tests and iterated on errors
  • Authors consider the claimed 95% inflated
  • Performance on large data volumes was a weak spot
  • Parallel transactions and non-standard keys behave unpredictably

Where the agent fell short

The codebase is functional, but with noticeable limitations. Performance on large data volumes lags behind mature implementations: abstraction layers on top of SQLite add overhead. Edge cases such as parallel transactions, non-standard key types, and complex range cursors are handled unreliably or incorrectly. This is characteristic of LLM-driven development: the model does well on tasks that can be verified automatically and poorly on those with subtle invariants that tests don't cover. The agent optimizes for green CI, not for correct architecture. The result looks convincing on the surface but hides technical debt in corner cases.
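Key ordering is one place where such corner cases tend to hide. The Indexed DB specification orders keys by type first (number, then date, then string, then binary, then array) and by value second, while SQLite compares BLOBs byte by byte. The Python sketch below shows one possible way to bridge the two, assuming keys are stored as order-preserving byte strings. It is not taken from the agent's code: `encode_key`, the tag values, and the `records` table are hypothetical, and only numbers and strings are covered.

```python
import sqlite3
import struct

# Hypothetical type tags; only their relative order matters (number < string).
TAG_NUMBER = 0x10
TAG_STRING = 0x30


def encode_key(key):
    """Encode a number or string so that byte order matches IndexedDB key order."""
    if isinstance(key, bool) or key is None:
        raise TypeError("not a valid key")
    if isinstance(key, (int, float)):
        value = float(key)
        if value != value:
            raise ValueError("NaN is not a valid key")
        if value == 0.0:
            value = 0.0  # fold -0.0 into 0.0 so both map to the same bytes
        bits = struct.unpack(">Q", struct.pack(">d", value))[0]
        if bits & (1 << 63):
            bits ^= 0xFFFFFFFFFFFFFFFF  # negative: flip every bit
        else:
            bits |= 1 << 63  # non-negative: set the sign bit
        return bytes([TAG_NUMBER]) + struct.pack(">Q", bits)
    if isinstance(key, str):
        # Big-endian UTF-16 keeps byte order equal to code-unit order.
        return bytes([TAG_STRING]) + key.encode("utf-16-be", "surrogatepass")
    raise TypeError("dates, binary keys and arrays are omitted from this sketch")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (key BLOB PRIMARY KEY, raw TEXT)")
for k in ["b", 10, -3, "a", 2.5]:
    conn.execute("INSERT INTO records VALUES (?, ?)", (encode_key(k), repr(k)))


def key_range(lower, upper):
    """Roughly IDBKeyRange.bound(lower, upper): both ends inclusive, ascending."""
    rows = conn.execute(
        "SELECT raw FROM records WHERE key BETWEEN ? AND ? ORDER BY key",
        (encode_key(lower), encode_key(upper)),
    )
    return [raw for (raw,) in rows]


in_order = [raw for (raw,) in conn.execute("SELECT raw FROM records ORDER BY key")]
print(in_order)
print(key_range(0, "a"))
```

With this encoding, a range cursor becomes an ordinary indexed range scan. Dates, binary keys, and nested arrays each need their own encoding rules, which is the kind of invariant that is easy to get subtly wrong while most tests still pass.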

What this means

The experiment shows that an LLM agent can create a working implementation of a complex browser standard in one session, from a prompt to more than a thousand passed tests. This is no longer a textbook example but tangible proof of progress in agent systems. Still, moving such code to production without review is risky: the agent optimizes for visible metrics and may miss non-functional requirements. The right conclusion: an LLM accelerates the first draft but still requires an experienced reviewer.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
