Ollama and LiteLLM: how to turn a Python script into a complete console-based LLM chat

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

Apr 30, 2026. Reading time: 3 min.

In the second part of the Ollama and LiteLLM guide, a simple Python script is turned into a console-based LLM chat. A conversation loop, a system prompt, and…

Hamidun News Editorial

AI monitoring · Habr AI

Apr 30, 2026· 2 min

AI-processed from Habr AI; edited by Hamidun News

Ollama and LiteLLM: how to turn a Python script into a complete console-based LLM chat — Source: Habr AI. Collage: Hamidun News.

◐ Listen to article

From script to chat

The first step here is not in changing the model, but in changing the usage scenario. A one-time request is good for checking that the Python, Ollama and LiteLLM combination actually works. But as soon as you want to ask a clarification, change the wording or continue a thought, such a script quickly hits the ceiling.

A console chat solves this problem in the most direct way: the program doesn't exit after one response, but remains in the dialogue and allows communication with the model like a normal conversation partner in the terminal. For a developer, this is an important turning point. Instead of "called the API — got text" there is a minimal application interface: user input, message history, model response, and the logic managing them.

LiteLLM is convenient here as a single layer for accessing models, and Ollama covers local execution. As a result, even a small educational project begins to resemble a real product that you can already run, test, and gradually complicate without completely redoing the foundation.

What appears in the code

The next layer is the details that make the chat not just an input/output loop, but a managed program. In such examples, what is particularly valuable is not "magical" features, but basic engineering elements: who sets the model's role, where context is stored, and what happens if the request breaks. It is from these that you get the feeling that you are looking at not a five-minute demo, but a working template for a local assistant.

A conversation loop that accepts new messages until an exit command
System prompt that sets the model's role, tone, and behavioral boundaries
A list of messages so the model sees previous exchanges and maintains context
Basic input validation to avoid sending empty requests
Exception handling so the program doesn't crash after the first failure

Each of these points seems simple, but together they change the quality of interaction. System prompt is needed not only for bot "character": through it you can conveniently set answer rules, format, language, and limitations. Message history allows for a coherent conversation rather than re-explaining the task at each turn. Error handling saves time during debugging: if the local model hangs, Ollama is not running, or LiteLLM returns an exception, the session is not lost entirely.

First steps toward an application

It's particularly important to think about the first steps toward a "live" AI application. A console interface seems modest, but it's the easiest way to check how the bot behaves in real dialogue, where the user formulates thoughts imperfectly, asks clarifications, and constantly changes context. Weak spots quickly become apparent: too general a system prompt, inconvenient output format, lack of commands to exit or restart, unclear errors when loading the model.

Such a framework is easy to extend further without extra architecture. On top of it you can add token streaming, separate commands like /clear to reset history, model switching, dialog logging, or tool integration. But the value of the current step is different: the author shows that a useful interface begins not with a GUI and not with a web application, but with a reliable communication loop in the terminal.

If this layer is done carefully, it's easier to grow further both toward a product and toward experiments.

What this means

For those building local AI tools on Python, this stage is mandatory. The Ollama and LiteLLM combination becomes interesting not at the moment of the first successful response, but when a normal communication loop appears around the model. A console chat is the minimal form of such a loop: simple enough to start and useful enough to build the next layer of functionality on top of it.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation