Liquid AI released LFM2.5-230M: 213 tokens/s on Galaxy S25 and support for llama.cpp

Liquid AI has released the smallest model in its open-weights lineup: LFM2.5-230M. It has 230 million parameters, delivers 213 tokens/s on a Galaxy S25 Ultra…

Hamidun News Editorial

AI monitoring · MarkTechPost

Jun 28, 2026· 2 min

AI-processed from MarkTechPost; edited by Hamidun News

Liquid AI released LFM2.5-230M: 213 tokens/s on Galaxy S25 and support for llama.cpp — Source: MarkTechPost. Collage: Hamidun News.

◐ Listen to article

Liquid AI released LFM2.5-230M — the most compact model in the lineup with open weights. With 230 million parameters, it fits on a smartphone or single-board computer and still outperforms competitors with three to four times more parameters on the tasks it was designed for.

What is LFM and what makes it different

LFM stands for Liquid Foundation Model — Liquid AI's proprietary architecture, founded by MIT alumni. The approach fundamentally differs from standard transformers: instead of the classic attention mechanism, it uses a hybrid design inspired by neural differential equations. The result — models that work more efficiently with fewer parameters.

LFM2.5-230M is the smallest in the series, but built on the same foundation as more powerful versions. It doesn't claim to be a universal assistant: the model is optimized for tool use (calling external tools and APIs in agentic pipelines) and data extraction (structured extraction of data from unstructured text). It's precisely on these tasks that it demonstrates results superior to significantly larger competitors.

Speed on real hardware and accuracy on benchmarks

Liquid AI tested performance not on servers, but on consumer devices:

Galaxy S25 Ultra — 213 tokens per second
Raspberry Pi 5 — 42 tokens per second

For context: comfortable reading speed for a user is around 15–25 tokens/s. The model runs on a smartphone with an eightfold margin — sufficient even for interactive real-time applications.

What does this mean practically: LFM2.5-230M can run offline, without API keys, without cloud costs, and without transmitting data to third-party servers. For corporate products with confidentiality requirements, this is a compelling argument in itself.

On instruction following tests, the model outperformed Qwen3.5-0.8B from Alibaba (over three times larger) and Gemma 3 1B from Google (four times larger). This is a win not in overall rankings, but specifically on the tasks the model was designed for.

Supported runtimes

LFM2.5-230M is released with open weights and supports the full standard inference stack:

llama.cpp — CPU execution without GPU on any hardware
MLX — optimized for Apple Silicon chips (M1–M4)
vLLM and SGLang — for server high-load deployment
ONNX — cross-platform standard for production deployment

Maximum coverage: from MacBook to Linux server, from flagship Samsung to an $80 single-board computer. For open models, the breadth of ecosystem support is one of the main factors for real-world adoption.

What this means

Liquid AI clearly demonstrates: architectural efficiency displaces the race for parameters. A model with 230 million parameters that runs on a smartphone faster than a human can read and outperforms four-times-larger analogs is a compelling argument for specialization over universality. For developers of mobile AI applications and agentic pipelines, this opens a new window of possibilities.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation