
GPT-4 helps archivists transcribe handwritten documents 50 times faster

A study by a Canadian university found that GPT-4 transcribes handwritten archival documents faster and more cheaply than the specialized Transkribus.

Source: IEEE Spectrum AI. Collage: Hamidun News.

In 2023, Mark Humphries, a historian and coordinator of a generative AI application program at Wilfrid Laurier University (Waterloo, Ontario), faced a massive problem. He had digitized 10 million pages of Canadian pension records from World War I, but without an index or standardization the archive was practically unusable: finding a specific pensioner meant paging through files blindly. The records had been kept by hundreds of different scribes, officers, and administrators, which ruled out the standard solution of training a specialized model on a single hand.

Humphries decided to try GPT-4. The results were rough, but better than those of any other tool. He and his colleagues then spent two years on systematic testing, analyzing 18th- and 19th-century letters, legal documents, and diaries from different countries.

Research published in May 2025 in the journal Historical Methods showed something striking: LLMs outperformed Transkribus, specialized software used by 150+ major archives and universities. The numbers are impressive. On the same set of documents, none of which the models had seen before, Transkribus had a reading error rate of 8%.

Humphries' LLM came in at 2%. Transcription was also 50 times faster and 50 times cheaper. The company behind Transkribus has already announced that it will integrate LLMs into its own product.
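
For readers wondering what an error rate of this kind measures, here is a minimal sketch. It assumes the figure is something like a character error rate: the number of single-character edits needed to turn the machine transcript into a human-checked reference, divided by the length of the reference. The article does not say which exact metric the study used, and the sample sentences below are invented for illustration, not taken from the study.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, and substitutions turning one string into the other."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcript."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical 50-character reference line (assumption, not study data).
reference = "Pension granted to Pte. John Smith, 12 March 1917."

# Two invented machine transcripts: four wrong characters and one wrong character.
four_errors = "Pensien granted to Pte. Jehn Smyth, 12 March 1911."
one_error = "Pension granted to Pte. John Smith, 12 March 1911."

print(f"{character_error_rate(reference, four_errors):.1%}")  # 8.0%
print(f"{character_error_rate(reference, one_error):.1%}")    # 2.0%
```

On a 50-character line, the gap between 8% and 2% is the gap between four misread characters and one.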

"This was our dream," Humphries said in an interview.

Archives go from a closed book to an open one

The practical consequences are already visible in universities across North America. Lianne Leddy, a historian of Indigenous history and co-author of the study, uses AI to search for mentions of Indigenous women of North America in old trade journals and in baptismal and marriage records scattered across archives from coast to coast. The problem is that these records were written by men (traders, priests, officials), and women's names were often recorded only phonetically and inconsistently: French, English, and Scottish writers could spell one name five different ways.

In other cases, a woman was mentioned simply as "someone's wife." Compiling a full history at the old pace would have taken decades of work. Now it takes months.

The University of North Carolina (Chapel Hill) is experimenting with AI transcription of its special collections, which are heavily used by people searching for information about their ancestors. Archivist Jackie Dean said that the models work well on letters and diaries, but that the breakthrough came with tables, which have always been a headache for specialized software. The Federal Reserve Bank of Philadelphia has taken the technology beyond universities altogether.

The bank uses LLMs to extract data from historical property records and car registrations, which were previously too expensive to process at scale. This has opened up new possibilities for historical economic research.

From LeCun's digits to general models

The history of this problem goes back decades. In the 1980s, Yann LeCun (later a Turing Award winner for his contributions to deep learning) worked on handwritten digit recognition. He was interested less in handwriting itself than in computer vision, but with limited computing power and little data available, he focused on digits, for which the postal service and censuses supplied data.

It turned out that models trained on the broad data that modern LLMs have seen (the internet, books, digitized historical material) somehow absorbed the connection between handwritten text and its transcription. No one taught them this explicitly. LeCun, who considers the problem largely solved and long ago moved on to harder questions in machine intelligence, agrees with this logic.

Humphries is now building Archive Pearl, a nonprofit tool currently in beta. The idea is simple: drag in a hundred pages and get a clean transcript in minutes instead of weeks. Humphries' goal is democratization.

It should be a tool for people, not against them.

What this means

Handwritten archives become accessible not only to trained paleographers, but also to students, graduate students, history enthusiasts, and people seeking their roots. Collections that were preserved but functionally hidden behind the labor of transcription become searchable. Questions that were previously too expensive or labor-intensive to ask can now be asked. This is not just an acceleration — it is a transition from the impossible to the routine.
