Google’s Gemma 4: how to run tool calling locally with Python and Ollama

Q: What is the source?

Originally published on Machine Learning Mastery. Hamidun News processes and adapts the material with AI.

Q: When was it published?

May 2, 2026. Reading time: 3 min.

Hamidun News Editorial

Machine Learning Mastery has published a detailed walkthrough of building a local AI agent on Gemma 4 with tool calling support. The piece matters less for the code itself than as a market signal: Google's open-weight models are becoming a credible option in scenarios that cloud APIs used to dominate almost exclusively.

About the Article

In an April 14 article, the author demonstrates a practical stack for a local agent: Python, Ollama, and the `gemma4:e2b` model. The idea is straightforward: instead of an ordinary chatbot that only responds from its own weights, the developer gives the model a set of functions and descriptions of their parameters. If a query requires external data, the model doesn't make up an answer but instead formulates a structured call to the necessary tool, receives the result, and only then assembles the final text.

This stands out against the backdrop of the Gemma 4 release. Google released the family of open models under the Apache 2.0 license and emphasized agentic scenarios: structured JSON, function calling, system instructions, and operation on hardware ranging from mobile devices to workstations. The company officially positions Gemma 4 as a foundation for local and on-device tasks and lists Ollama among the tools supported from day one. For developers, this means a clearer path to private assistants that do not depend on an external provider.

How the Agent is Structured

The example architecture avoids heavy frameworks. The author deliberately uses standard-library Python modules such as `urllib` and `json` to show that a basic tool-calling agent can be set up without LangChain, orchestrators, or a thick layer of abstractions. The key piece is a tool registry in JSON Schema format, which tells the model which functions are available, what arguments they accept, and which fields are required.
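A registry entry of the kind described here might look like the sketch below. The tool name `get_current_weather`, its `city` and `unit` parameters, and the helper `missing_required` are hypothetical and chosen only for illustration; the outer `type` / `function` / `parameters` layout is the function-tool shape that Ollama's chat API accepts.

```python
# Hypothetical tool definition: the name, description, and parameters are
# illustrative assumptions, not taken from the original article.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Paris"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# The registry is simply the list of definitions sent along with each request.
TOOLS = [WEATHER_TOOL]


def missing_required(tool: dict, arguments: dict) -> list[str]:
    """Return the required parameter names absent from a proposed call."""
    required = tool["function"]["parameters"].get("required", [])
    return [name for name in required if name not in arguments]
```

A check like `missing_required` is optional, but it lets the host application reject an incomplete call before any Python function runs.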

1. The developer writes local Python functions that serve as tools.
2. For each function, a strict parameter schema is defined.
3. The user query, along with the list of tools, is sent to Ollama.
4. The model returns `tool_calls` if it needs external data.
5. The application executes the function and sends the result back to the model.
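The request side of these steps can be sketched with the standard library alone. This is a minimal illustration, not the author's code: it assumes Ollama is running on its default local address, and the helper names `build_chat_payload`, `post_chat`, and `extract_tool_calls` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_payload(model: str, messages: list, tools: list) -> dict:
    """Assemble a non-streaming chat request that advertises the tools."""
    return {"model": model, "messages": messages, "tools": tools, "stream": False}


def post_chat(payload: dict, url: str = OLLAMA_CHAT_URL) -> dict:
    """POST the payload as JSON and decode the reply (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_tool_calls(reply: dict) -> list:
    """Return the tool calls in an assistant reply, or an empty list."""
    return reply.get("message", {}).get("tool_calls") or []
```

With a server running and the model pulled, `post_chat(build_chat_payload("gemma4:e2b", messages, tools))` would return the assistant message; an empty result from `extract_tool_calls` means the model answered directly.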

A second pass follows. The host application appends the tool response to the message history under the `tool` role and calls the model again. At this point Gemma 4 no longer guesses but relies on live data. In the example, this joins a reasoning model and ordinary Python code into a single working loop with no cloud layer. In effect, the author shows a minimal agentic runtime that can be taken apart and adapted to your own tasks in an evening.
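The hand-off before that second call might be sketched as follows. The function `run_tool_calls` and the shape of `registry` (a dict mapping tool names to Python callables) are assumptions for illustration; the sketch expects each tool call to carry `function.name` and `function.arguments`, and it appends results using only the `role` and `content` fields.

```python
import json


def run_tool_calls(assistant_message: dict, registry: dict) -> list:
    """Execute each requested tool and return the messages to append to history."""
    # Keep the model's own request in the history so the tool results have context.
    new_messages = [assistant_message]
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = call["function"].get("arguments") or {}
        if isinstance(arguments, str):  # tolerate arguments sent as a JSON string
            arguments = json.loads(arguments)
        func = registry.get(name)
        if func is None:
            result = f"Error: unknown tool {name!r}"
        else:
            try:
                result = func(**arguments)
            except Exception as exc:  # report failures to the model instead of crashing
                result = f"Error while running {name}: {exc}"
        new_messages.append({"role": "tool", "content": str(result)})
    return new_messages
```

Extending the history with the returned list and sending it to the model again gives the second pass. Returning errors as tool output, rather than raising, is a design choice that lets the model explain the failure to the user.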

Which Tools Were Demonstrated

As a demonstration, the author first builds a weather function on top of Open-Meteo, then adds three more tools: news, current time, and currency conversion. The result is a small but illustrative agent that can handle not only a single-fact question but also a compound query. For example: find out the weather in Paris, get the current time, convert Canadian dollars to euros, and fetch fresh news on the topic, all in one request.
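A weather tool along these lines could be sketched as below. This is not the author's implementation: to stay short it takes coordinates instead of a city name (the city lookup is omitted), and the helper names and the choice of `temperature_2m` and `wind_speed_10m` as fields are assumptions. It calls Open-Meteo's public forecast endpoint, which needs no API key.

```python
import json
import urllib.parse
import urllib.request

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_forecast_url(latitude: float, longitude: float) -> str:
    """Build a forecast URL that asks for current temperature and wind speed."""
    query = urllib.parse.urlencode({
        "latitude": latitude,
        "longitude": longitude,
        "current": "temperature_2m,wind_speed_10m",
    })
    return f"{FORECAST_URL}?{query}"


def summarize_current(payload: dict) -> str:
    """Turn the decoded JSON reply into a short string the model can read."""
    current = payload["current"]
    units = payload.get("current_units", {})
    temp_unit = units.get("temperature_2m", "")
    wind_unit = units.get("wind_speed_10m", "")
    return (
        f"{current['temperature_2m']}{temp_unit}, "
        f"wind {current['wind_speed_10m']} {wind_unit}"
    ).strip()


def get_current_weather(latitude: float, longitude: float) -> str:
    """Fetch and summarize current conditions (performs a network request)."""
    url = build_forecast_url(latitude, longitude)
    with urllib.request.urlopen(url, timeout=30) as response:
        return summarize_current(json.loads(response.read().decode("utf-8")))
```

Keeping URL construction and response parsing separate from the network call makes the tool easy to test offline with a canned payload.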

Particular emphasis is placed on the `gemma4:e2b` model. This is an edge variant of Gemma 4 with an effective two-billion-parameter footprint during inference, designed for memory efficiency and low latency. The article highlights that this setup runs locally, without a GPU and without API limits. For small teams and solo developers, this is an important point: agentic scenarios stop being an expensive experiment and become an ordinary engineering task. The author writes that over a weekend they ran hundreds of requests on the system and saw no failures in the basic tool-calling logic.

What This Means

The main takeaway here is not another Python tutorial but a lower barrier to entry. If Gemma 4 really does hold structured output and function calling reliably even in lightweight edge configurations, the market for local agents will expand rapidly: there will be more offline scenarios and private corporate deployments, and fewer reasons to move straight to expensive cloud stacks.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
