
5 small open models with tool calling: agents that don't need the cloud

Small language models have gained the ability to invoke functions and use tools, a key step toward decentralized AI agents. Instead of cloud services, compact open models can now power agents that run entirely on local hardware.

Source: KDnuggets. Collage: Hamidun News.

Small language models have long struggled to compete with cloud services in one key capability—managing tools through tool calling. Now this is changing. A new generation of compact, open models has emerged that not only support structured function calls but also remain lightweight enough for local deployment.

What Tool Calling Is and Why It Matters

Tool calling is a model's ability to invoke external functions, scripts, or APIs directly, rather than merely describing them in a text response. The model receives a list of available functions with their descriptions, parameters, and data types, and decides on its own which function to call and with which arguments.

This is critical for AI agents: they can manage databases, download files, send emails, schedule meetings—all without direct human intervention.
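The mechanics above can be sketched in a few lines. The tool names, schemas, and stub implementations below are illustrative assumptions, not anything named in the article; the point is the shape of the contract: the model sees the schema list, emits a JSON call, and the host program dispatches it.

```python
import json

# Hypothetical tools; the names and bodies are stand-ins for real APIs.
def get_weather(city: str) -> str:
    return f"Sunny in {city}, 22C"  # stub in place of a real weather API

def send_email(to: str, subject: str) -> str:
    return f"Email '{subject}' queued for {to}"  # stub

# The schema the model is shown: name, description, parameter types.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {"city": {"type": "string"}},
    },
    {
        "name": "send_email",
        "description": "Send an email",
        "parameters": {"to": {"type": "string"}, "subject": {"type": "string"}},
    },
]

REGISTRY = {"get_weather": get_weather, "send_email": send_email}

def dispatch(tool_call_json: str) -> str:
    """Execute the function the model chose, with the arguments it chose."""
    call = json.loads(tool_call_json)
    func = REGISTRY[call["name"]]
    return func(**call["arguments"])

# A tool-calling model would emit something shaped like this:
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin"}}'
print(dispatch(model_output))
```

The host program, not the model, performs the actual side effects, which is also where you would enforce permissions and sandboxing.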

Structured output (JSON-formatted responses) guarantees that the model returns results in a predictable format that a program can parse and act on.

Until recently, only large models (GPT-4, Claude 3) could do this reliably. Now small models can also emit valid, structured JSON consistently.
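In practice, "predictable format" means the host program can validate every model reply before acting on it. A minimal validation sketch, assuming a simple `{"name": ..., "arguments": {...}}` call shape (an assumption for illustration; real APIs vary):

```python
import json

REQUIRED_KEYS = {"name", "arguments"}

def parse_tool_call(raw: str) -> dict:
    """Parse and validate a model's tool-call output before executing it.

    Raises ValueError on malformed JSON or a missing field, so the agent
    can retry or re-prompt instead of crashing mid-task.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    missing = REQUIRED_KEYS - call.keys()
    if missing:
        raise ValueError(f"tool call is missing fields: {sorted(missing)}")
    if not isinstance(call["arguments"], dict):
        raise ValueError("'arguments' must be a JSON object")
    return call

good = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
print(good["name"])  # get_weather
```

Raising instead of silently ignoring bad output matters for small models in particular: when a reply fails validation, the agent can re-prompt with the error message and usually recover.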

Why Small Models Are Now Competitive

Small models (7B–13B parameters) have several advantages over large ones: they are cheaper to develop and serve, more private by default (no data leaves your machine), and faster to respond.

They do not require cloud services and powerful corporate hardware—a mid-range GPU or even a decent CPU is sufficient. Add tool calling support to such a small model and you get a fully functional AI agent that can run on your own server, laptop, or even a smartphone without internet.

This opens the way for corporate private agents with data confidentiality guarantees. A company can run an agent within its own secure network without sending a single request to the cloud.

Plus there is licensing flexibility: all of these models ship with open weights under licenses that permit commercial use, though the exact terms vary by model.

5 Models Ready to Use

Here are five small models that already support full tool calling today:

  • Llama 3.1 (Meta) — the base 8B version has solid documentation and tool-calling examples; the most tested and stable model on this list
  • Mistral 7B — compact and very fast, with a good quality-to-size ratio; popular in enterprise environments
  • Phi-3 (Microsoft) — optimized for structured output and engineering tasks; minimal memory requirements
  • OpenChat 3.5 — focused on function and tool management; strong results in tool-calling benchmarks
  • NeuralHermes 2.5 (a Mistral finetune) — best on the list at complex multi-step call chains and error recovery

All five can be downloaded from Hugging Face in minutes and run locally without an internet connection. On modern GPUs or fast CPUs, generation latencies on the order of 50–200 milliseconds per token are achievable.
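The full local agent loop ties the pieces together: send the user query and tool schemas to the model, execute the call it returns, and feed the result back. The sketch below stubs out the model call so it is self-contained; in a real setup that stub would be a request to a local runtime such as Ollama or a llama.cpp server running one of the five models above (the tool, ticker, and message shapes here are illustrative assumptions).

```python
def local_model(messages: list, tools: list) -> dict:
    """Stub standing in for a call to a locally hosted model.

    Always picks the one available tool with a fixed argument, purely to
    show the shape of the loop; a real model chooses based on the query.
    """
    return {"tool_call": {"name": "lookup_price",
                          "arguments": {"ticker": "ACME"}}}

def lookup_price(ticker: str) -> str:
    return f"{ticker}: 41.50 USD"  # stub for a real data source

TOOLS = [{"name": "lookup_price",
          "description": "Get the latest price for a stock ticker",
          "parameters": {"ticker": {"type": "string"}}}]
REGISTRY = {"lookup_price": lookup_price}

def run_agent(user_query: str) -> str:
    messages = [{"role": "user", "content": user_query}]
    reply = local_model(messages, TOOLS)       # model decides which tool to call
    call = reply["tool_call"]
    result = REGISTRY[call["name"]](**call["arguments"])  # host executes it
    # In a full loop, the tool result goes back to the model so it can
    # compose a natural-language answer or chain further calls.
    messages.append({"role": "tool", "content": result})
    return result

print(run_agent("What is ACME trading at?"))
```

Swapping the stub for a real local endpoint is the only change needed to turn this into an offline agent; everything else, including the tool execution, already runs on your own hardware.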

What This Means for the Industry

The era of cloud monopoly on AI agents has ended. Now even small startups can build private, fully functional AI agents that are no slower and no less capable than cloud alternatives like the OpenAI API or hosted Claude.

This means AI infrastructure is gradually moving from the cloud to on-premise. In the coming months, expect a surge of tools and frameworks for local agent deployment (along the lines of LM Studio and Ollama, but with first-class tool-calling support).

For developers, this opens an entirely new market: private AI agents for large corporations, government agencies, healthcare, and fintech. Wherever cloud use is barred for political or legal reasons, local models are the only option.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.