36Kr (36氪)→ original

Zhipu GLM-OCR: How Chinese Tech Giant Taught Micro-Model to See Everything

Китайская Zhipu AI представила и открыла исходный код GLM-OCR — специализированной модели для распознавания текста весом всего 0.9 млрд параметров. Это тот редк

AI-processed from 36Kr (36氪); edited by Hamidun News
Zhipu GLM-OCR: How Chinese Tech Giant Taught Micro-Model to See Everything
Source: 36Kr (36氪). Collage: Hamidun News.
◐ Listen to article

The artificial intelligence industry has long resembled a bodybuilders' off-season competition: each new announcement was accompanied by bragging about the number of billions of parameters and megawatts consumed. But while market leaders measure the size of their clusters, Chinese Zhipu AI, often called the local answer to OpenAI, decided to take the path of elegant minimalism. They released and, more importantly, open-sourced GLM-OCR — a model that proves that quality vision doesn't require a supercomputer the size of a refrigerator.

Context is crucial here. Zhipu AI has long been entrenched at the top of China's tech sector with its GLM lineup, but releasing a model with just 0.9 billion parameters is a direct challenge to the "bigger is better" concept.

Previously, quality text recognition (OCR) required either primitive and inaccurate algorithms or heavyweight multimodal models that consumed video memory for breakfast. Now we see a tool specifically honed for one task, but executing it with surgical precision on the most modest hardware. What exactly changed technologically?

GLM-OCR is natively optimized for modern frameworks like vLLM, SGLang, and Ollama. These aren't just a list of trendy names, but real ability to run the model on a laptop or even an advanced smartphone. Low inference latency and minimal computational overhead make it an ideal candidate for high-load scenarios.

Imagine a document processing system at a bank or logistics company that doesn't need to send each scan to the cloud, wasting seconds waiting and cents per request. Why does this matter right now? We're at an inflection point where business is starting to count money.

Enthusiasm about "universal models that can do everything" is giving way to pragmatic search for tools for specific business processes. Using the massive GPT-4o just to read numbers on a receipt is like using a space rocket for a trip to the bakery. Zhipu gives the market a "bicycle" that will reach the destination faster and cheaper.

Moreover, open source allows companies to fine-tune the model on their specific data while maintaining confidentiality within their own perimeter. Special attention should be given to edge computing support. In the world of Internet of Things and autonomous systems, the ability of a neural network to "see" and understand text without internet access is a critical factor.

This opens doors to a new generation of smart cameras, industrial robots, and wearable devices that understand the context of the surrounding world in real time. Chinese developers once again demonstrate that they are the best at packaging complex technologies into efficient and accessible solutions. Ultimately, GLM-OCR's success could trigger a wave of similar releases from other players.

If a small model handles text recognition at a level sufficient for 90% of commercial tasks, why pay more? This is not just the release of another neural network, it's a manifesto of efficiency against excess. While Western giants build ever higher towers of GPUs, Chinese companies are beginning to dominate in a "guerrilla war" on user devices.

Bottom line: Zhipu AI has made OCR cheap and accessible to everyone. Will 2024 be the year of triumph of micro-models over giants?

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

What do you think?
Loading comments…