How to Build a Streaming Decision-Making Agent with Online Replanning in a Dynamic Environment


Hamidun News Editorial

AI monitoring · MarkTechPost

May 3, 2026· 2 min

AI-processed from MarkTechPost; edited by Hamidun News


This tutorial describes the architecture of a streaming decision-making agent that operates in a constantly changing environment and streams partial reasoning in real time — without waiting for a final answer before acting.

Environment and Task

The demonstration uses a dynamic grid: its obstacles move according to their own rules, and the target point shifts randomly at a fixed interval.

The agent does not know in advance what exactly will change at the next step — this is the key difference from classical pathfinding tasks.

Key environment parameters:

An N×N grid with moving obstacles
The target shifts randomly every K steps
The agent sees only a limited observation radius around itself
The environment is non-deterministic: a plan that succeeded once can fail on the next attempt
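
The parameters above can be sketched as a small environment class. This is a minimal illustration, not the tutorial's code: the class name `DynamicGrid`, the random-walk rule for obstacles, the Chebyshev observation radius, the obstacle count, and the choice to hide the target when it lies outside the radius are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field

Cell = tuple[int, int]
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

@dataclass
class DynamicGrid:
    n: int = 8            # grid is n x n (the article's N)
    k: int = 5            # target jumps every k steps (the article's K)
    radius: int = 2       # observation radius, assumed Chebyshev distance
    n_obstacles: int = 6  # assumed; the article gives no count
    seed: int = 0
    step_count: int = 0
    obstacles: set = field(default_factory=set)
    target: Cell = (0, 0)

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        cells = [(r, c) for r in range(self.n) for c in range(self.n)]
        picked = self.rng.sample(cells, self.n_obstacles + 1)
        self.target, self.obstacles = picked[0], set(picked[1:])

    def step(self):
        """Advance one tick: obstacles random-walk, target may jump."""
        self.step_count += 1
        moved = set()
        for ob in sorted(self.obstacles):
            options = [ob]  # staying put is always allowed
            for dr, dc in MOVES:
                p = (ob[0] + dr, ob[1] + dc)
                in_bounds = 0 <= p[0] < self.n and 0 <= p[1] < self.n
                if (in_bounds and p not in self.obstacles
                        and p not in moved and p != self.target):
                    options.append(p)
            moved.add(self.rng.choice(options))
        self.obstacles = moved
        if self.step_count % self.k == 0:
            free = [(r, c) for r in range(self.n) for c in range(self.n)
                    if (r, c) not in self.obstacles]
            self.target = self.rng.choice(free)

    def observe(self, pos):
        """Return only what lies within `radius` of the agent at `pos`."""
        def visible(cell):
            return max(abs(cell[0] - pos[0]), abs(cell[1] - pos[1])) <= self.radius
        seen = {ob for ob in self.obstacles if visible(ob)}
        return seen, (self.target if visible(self.target) else None)
```

Calling `env.step()` once per tick and `env.observe(pos)` before each decision gives the agent the partial, shifting view described above. The fixed `seed` only makes a run repeatable; from the agent's side the next change is still unknown.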

The setup is deliberately harder than a classical pathfinding task.

It models real scenarios: navigation of an autonomous warehouse robot, route planning for a self-driving car in traffic, and control of a production line under equipment failures.

Receding-Horizon Planner

At the core of the agent is the A* algorithm, but used in a non-standard way.

Instead of a full route to the goal, a receding horizon is used: the agent plans only the next H steps, executes several of them, and then replans from its new position using the updated state of the environment.

This changes how the agent treats a plan: as a short-lived estimate rather than a commitment.

A full plan in a dynamic environment becomes outdated faster than the agent can execute it: an obstacle moved, the goal shifted — and the route is already no longer relevant.

A short horizon keeps the agent from clinging to stale data.
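
The plan, execute, replan cycle can be sketched as follows. This is one possible reading under stated assumptions, not the tutorial's implementation: the function names `plan_horizon` and `run_receding_horizon`, the `sense` callback that returns the currently known obstacles, the 4-connected grid, the Manhattan heuristic, and the default values of `horizon` and `execute` are all hypothetical. The search is A* cut off at depth `horizon`; if the goal is not reached within the horizon, it heads for the explored cell closest to the goal.

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def plan_horizon(start, goal, obstacles, n, horizon):
    """A* limited to `horizon` moves; returns a path of at most `horizon` cells."""
    frontier = [(manhattan(start, goal), 0, start)]
    parent = {start: None}
    g = {start: 0}
    best = start  # explored cell closest to the goal so far
    while frontier:
        _, cost, cur = heapq.heappop(frontier)
        if cost > g[cur]:
            continue  # stale heap entry
        if manhattan(cur, goal) < manhattan(best, goal):
            best = cur
        if cur == goal:
            break
        if cost == horizon:
            continue  # do not search past the horizon
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if not (0 <= nxt[0] < n and 0 <= nxt[1] < n) or nxt in obstacles:
                continue
            if cost + 1 < g.get(nxt, horizon + 1):
                g[nxt] = cost + 1
                parent[nxt] = cur
                heapq.heappush(
                    frontier, (cost + 1 + manhattan(nxt, goal), cost + 1, nxt))
    path, node = [], best
    while node != start:
        path.append(node)
        node = parent[node]
    return path[::-1]

def run_receding_horizon(start, goal, sense, n, horizon=4, execute=2,
                         max_replans=50):
    """Plan `horizon` moves, execute `execute` of them, then replan."""
    pos, trace = start, [start]
    for _ in range(max_replans):
        if pos == goal:
            break
        plan = plan_horizon(pos, goal, sense(), n, horizon)  # fresh obstacles
        if not plan:
            break  # nothing closer is reachable within the horizon
        for cell in plan[:execute]:
            pos = cell
            trace.append(pos)
    return trace
```

On a 3×3 grid with an obstacle at `(0, 1)`, `plan_horizon((0, 0), (0, 2), {(0, 1)}, 3, 4)` returns the detour `[(1, 0), (1, 1), (1, 2), (0, 2)]`, while the same call with a horizon of 2 returns an empty plan: every cell within two moves is no closer to the goal than the start, so the short-sighted agent stalls.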

“The agent does not keep one big plan; it continuously creates and discards small plans as it moves forward,” which captures the essence of the receding-horizon approach.

Parameter H (the horizon length) becomes a key tuning element: too short, and the agent moves myopically and gets stuck in local minima; too long, and it spends time planning routes that will have to be discarded anyway.

Streaming Partial Reasoning

A standard agent stays silent until it finds a final answer.

A streaming agent emits intermediate states in real time — every meaningful reasoning step becomes available immediately:

A new obstacle is detected → the signal is sent immediately
The target shifted → the old plan is discarded, a new one is started
An intermediate path is found → it is streamed even if it is not yet optimal
The horizon point is reached and replanning is launched → the status is updated

This provides observability: the orchestrator system or the user always knows the agent’s current intention.

In production systems, this makes it possible to intervene before the agent has reached a dead end.

Another effect is that an external system can correct behavior on the fly: if the streamed plan is heading in an undesirable direction, an interrupt signal can be sent immediately.

Technically, streaming is implemented through Python generators: each `yield` emits one reasoning step, a pattern that is compatible with the streaming APIs of modern LLMs.
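
A minimal sketch of the generator pattern, under stated assumptions: the `Event` record, the event names, and the scripted `snapshots` input (one `(obstacles, target)` pair per step) are hypothetical stand-ins for a live environment, and the actual planning call is left out so that only the streaming mechanics are shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    step: int
    kind: str       # "obstacle_detected", "target_shifted" or "replanning"
    detail: object  # the cell or target the event refers to

def stream_reasoning(snapshots, horizon=3):
    """Yield one Event per reasoning step as soon as it is known."""
    prev_obstacles = set()
    last_target = None
    steps_since_plan = 0
    for step, (obstacles, target) in enumerate(snapshots):
        # Obstacles not present in the previous snapshot are reported at once.
        for cell in sorted(set(obstacles) - prev_obstacles):
            yield Event(step, "obstacle_detected", cell)
        prev_obstacles = set(obstacles)
        # A moved target discards the old plan and forces a replan now.
        if last_target is not None and target != last_target:
            yield Event(step, "target_shifted", target)
            steps_since_plan = horizon
        last_target = target
        # Replan at the start and whenever the horizon point is reached.
        if step == 0 or steps_since_plan >= horizon:
            yield Event(step, "replanning", target)
            steps_since_plan = 0
        steps_since_plan += 1

snapshots = [
    ({(1, 1)}, (4, 4)),
    ({(1, 1)}, (4, 4)),
    ({(1, 1), (2, 2)}, (0, 3)),
]
for event in stream_reasoning(snapshots):
    print(event.step, event.kind, event.detail)
```

Because a generator runs only when the consumer asks for the next item, the consumer sees each event before the agent moves on, and can stop iterating (or call `close()` on the generator) to halt the agent at that point.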

Reactive Adaptation

The third component is interrupting the current plan when the environment changes right in the middle of executing a step.

The agent does not wait for the next replanning cycle: the interrupt mechanism checks the state of the environment after every action and, if necessary, launches emergency replanning.

A criticality scale for changes is introduced: a slight obstacle shift means continue the current plan; blocking the next step means immediate replanning; a complete change in the target position means restarting with a new horizon.
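
The three levels can be written down as a small classifier plus an execution loop that consults it around every action. This is a sketch under assumptions: the names `Reaction`, `classify_change`, and `execute_with_interrupts` are hypothetical, a "slight obstacle shift" is taken to mean any obstacle change that leaves the next planned cell free, and `observations` is a scripted sequence of `(obstacles, target)` pairs standing in for live sensing.

```python
from enum import IntEnum

class Reaction(IntEnum):
    CONTINUE = 0  # change does not touch the next step
    REPLAN = 1    # next step is blocked: replan immediately
    RESTART = 2   # target moved: restart with a new horizon

def classify_change(plan, old_target, new_target, obstacles):
    """Map the latest observation to one of the three reaction levels."""
    if new_target != old_target:
        return Reaction.RESTART
    if plan and plan[0] in obstacles:
        return Reaction.REPLAN
    return Reaction.CONTINUE

def execute_with_interrupts(plan, target, observations):
    """Execute `plan` step by step, re-checking the world between actions.

    Returns the cells actually visited and the reaction that ended execution.
    """
    done, remaining = [], list(plan)
    for obstacles, new_target in observations:
        if not remaining:
            break
        reaction = classify_change(remaining, target, new_target, obstacles)
        if reaction is not Reaction.CONTINUE:
            return done, reaction  # interrupt before taking the next step
        done.append(remaining.pop(0))
    return done, Reaction.CONTINUE
```

For example, with the plan `[(0, 1), (0, 2), (0, 3), (0, 4)]`, an obstacle appearing at `(2, 2)` is ignored, while one appearing at `(0, 3)` just before the third move stops execution after two steps and returns `Reaction.REPLAN`.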

This multi-level reaction increases computational load, but it is critically important where the cost of error is high.

What This Means

The described architecture is a practical template for developers of AI agents operating under real uncertainty.

Streaming reasoning, a short planning horizon, and reactive interrupts are three patterns that together provide a ready-made framework for robotics, industrial automation, and LLM-based agent systems.

As agent systems spread through industry, the gap between “thinks in a vacuum” and “acts in the real world” is becoming a key engineering challenge — this tutorial provides a concrete entry point.
