Latest publications

AllenAI Releases olmo-eval — A Platform for Evaluating LLMs During Training
AllenAI released olmo-eval, an open toolkit for continuous evaluation of language models throughout the training cycle — checkpoint by checkpoint.

Cohere Presents North Mini Code — Model for Developers and AI Agents
Cohere released North Mini Code, a 30-billion-parameter model trained specifically for programming and AI agent tasks. The model is free and available to everyone.

ServiceNow-AI Research: Voice Agents Are Not Ready for Bilingual Customers
Voice agents perform poorly with bilingual customers, according to research from the ServiceNow-AI team, which tested seven popular speech recognition systems on examples of code-switching, when…

How to Speed Up PyTorch Models: A Practical Guide to torch.profiler
Hugging Face explained torch.profiler, a built-in PyTorch tool for analyzing performance. It helps identify bottlenecks in model training and inference.

Hugging Face Enables TRL to Deliver Trillion Parameters Through Delta Weights
Hugging Face added Delta Weight Sync to TRL. The technique sends only weight changes instead of full files, cutting the data volume by a factor of hundreds when training giant models.
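
The idea of sending only the changes can be illustrated with a minimal sketch. This is not TRL's implementation: the flat-list weight layout and the helper names `compute_delta` and `apply_delta` are assumptions made purely for illustration. The sender records just the entries that differ between two checkpoints, and the receiver patches its own copy.

```python
from typing import Dict, List, Tuple

# Hypothetical layout for illustration: each named tensor is a flat list of floats.
Weights = Dict[str, List[float]]
# A delta maps a tensor name to (index, new_value) pairs for changed entries only.
Delta = Dict[str, List[Tuple[int, float]]]


def compute_delta(old: Weights, new: Weights, tol: float = 0.0) -> Delta:
    """Collect only the entries whose value moved by more than tol."""
    delta: Delta = {}
    for name, new_values in new.items():
        old_values = old[name]
        changed = [
            (i, new_v)
            for i, (old_v, new_v) in enumerate(zip(old_values, new_values))
            if abs(new_v - old_v) > tol
        ]
        if changed:  # tensors with no changes are not transmitted at all
            delta[name] = changed
    return delta


def apply_delta(weights: Weights, delta: Delta) -> Weights:
    """Patch a copy of the receiver's weights with the changed entries."""
    patched = {name: list(values) for name, values in weights.items()}
    for name, changes in delta.items():
        for index, value in changes:
            patched[name][index] = value
    return patched


old = {"layer.w": [0.1, 0.2, 0.3, 0.4], "layer.b": [0.0, 0.0]}
new = {"layer.w": [0.1, 0.25, 0.3, 0.4], "layer.b": [0.0, 0.0]}

delta = compute_delta(old, new)   # only one entry of layer.w changed
synced = apply_delta(old, delta)  # receiver now matches the new weights
```

In this toy case a single (index, value) pair is transmitted instead of six floats. How much a real system saves depends on how many weights actually change between syncs.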

Reachy Mini Learns to Speak Locally Without the Cloud
The Reachy Mini humanoid robot can now run a full speech recognition stack locally, without a cloud service or API, thanks to open models from Hugging Face.

IBM and Artificial Analysis create benchmark: AI agents fail at IT tasks
Large language models scored less than 50% on the new ITBench-AA benchmark, which assesses how well AI agents solve corporate IT tasks. The result suggests that full automation of IT work remains a distant prospect.

NVIDIA Nemotron: Diffusion Models Generate Text 6x Faster
NVIDIA introduced Nemotron-Labs Diffusion, the first language models that generate text in parallel instead of sequentially. In speculative mode, they run 6× faster than conventional models thanks to the diffusion approach…

How a Small Model Beat GPT-5 and Claude Opus at Portuguese OCR
Dharma AI trained a specialized 3-billion-parameter model that outperformed all commercial frontier models in text recognition tests while costing 52 times less.

Hugging Face launched Open Agent Leaderboard to evaluate AI agents
Hugging Face introduced an open benchmark for comparing full AI agent systems. It found that agent architecture matters more than the model chosen.

PaddleOCR 3.5 Gains Support for Hugging Face Transformers
PaddleOCR has been updated with full support for Hugging Face Transformers as an inference backend. Text recognition and document parsing now work in a PyTorch environment.

NVIDIA Shows Efficient Method to Train Cosmos for Robot Video Generation Using LoRA
NVIDIA released a guide for fine-tuning Cosmos Predict 2.5 using LoRA/DoRA, a parameter-efficient adaptation method that enables robot video generation training in 17 hours on a single GPU.

Ettin Reranker from Hugging Face: 6 Models for Precise Search Reranking
Hugging Face released 6 Ettin rerankers based on ModernBERT with state-of-the-art accuracy and speed thanks to Flash Attention 2 and sequence optimization.

OlmoEarth v1.1: Allen AI Releases Satellite Models 3 Times Cheaper
Allen AI presented a more efficient version of its models for analyzing satellite imagery, cutting computational costs threefold while maintaining quality.

How Allen AI's model learned to discover expert specialization on its own
Allen AI introduced EMO, a mixture-of-experts model that naturally develops domain specialization (health, politics, film) without explicit training on those categories.

CyberSecQwen-4B: how a small model became a vulnerability expert
CyberSecQwen-4B, a specialized 4-billion-parameter cybersecurity model, outperforms general-purpose competitors in vulnerability analysis and runs locally on personal hardware without cloud services.

OncoAgent: AI system for early cancer detection based on private patient data
How a machine learning algorithm helps doctors make cancer diagnosis decisions without compromising patient confidentiality.

Hugging Face sped up LLM inference by 22% with asynchronous batching
Running CPU and GPU work in parallel rather than sequentially cut GPU idle time by 24% and sped up token generation by nearly a quarter without changing the model.

IBM released Granite Embedding R2 — a multilingual model for semantic search
IBM introduced Granite Embedding R2, an open multilingual model for semantic search with 32K context support and best-in-class performance among sub-100M models.

H Company released Holotron-12B — a model for agents with a 2x speed increase
H Company published Holotron-12B on Hugging Face: the multimodal model for AI agents delivers more than a 2x throughput gain in interface-use tasks on a single H100.

NVIDIA introduced SPEED-Bench — a unified benchmark for speculative decoding
NVIDIA published SPEED-Bench, a dataset and measurement framework that compares speculative decoding across real-world workloads, long contexts, and different inference engines.

IBM released Mellea 0.4.0 and Granite Libraries for verifiable AI pipelines
IBM Research updated the open-source Mellea framework to version 0.4.0 and released three Granite Libraries for structured, verifiable, and safe AI workflows.

NVIDIA showed how to fine-tune an embedding model for a specific domain in a day
NVIDIA and Hugging Face published a step-by-step recipe that turns a base embedding model into specialized search over internal documents in a few hours.

ServiceNow introduced EVA — a new framework for evaluating voice AI agents
ServiceNow released EVA — a system that evaluates voice AI agents not only by task success, but also by dialogue quality, from response brevity to turn timing.

IBM releases Granite 4.0 3B Vision for extracting data from documents and charts
IBM introduced Granite 4.0 3B Vision, a compact multimodal model for extracting tables, charts, and key fields from documents that can be integrated into enterprise pipelines with Docling.

H Company introduces Holo3 — an AI agent for computer use with a record score on OSWorld-Verified
H Company has released Holo3, a model for computer use that scored 78.85% on OSWorld-Verified and was trained on synthetic enterprise scenarios.

Google released Gemma 4 on Hugging Face: multimodal models for local inference
Google DeepMind has opened the Gemma 4 family on Hugging Face: four multimodal models under the Apache 2.0 license, with up to 256K context and deployment ranging from phones to workstations.

Hugging Face added gradio.Server: custom frontends can now connect to a Gradio backend
Hugging Face’s new gradio.Server turns Gradio into a backend layer for React, Svelte, and plain HTML/JS, while preserving request queues, ZeroGPU, and compatibility with Spaces.

Hugging Face transfers Safetensors to the PyTorch Foundation for neutral governance of the format
Hugging Face announced that Safetensors has become a PyTorch Foundation project: there are no breaking changes for users, while the format's development moves to a neutral governance model.

Overworld released Waypoint-1.5: 720p interactive worlds for consumer GPUs
Overworld released Waypoint-1.5, a world model for running locally on consumer GPUs: up to 720p and 60 FPS, plus a lighter 360p version for a wider range of PCs and laptops.