
Ollama and Open WebUI on a VPS without a GPU: a candid look at the limitations

It's possible to run Ollama with Open WebUI on a VPS without a GPU. You'll need 4+ cores and 8+ GB of RAM. Expect response speeds of 1.5 to 2 seconds per token, not milliseconds.

Source: Habr AI. Collage: Hamidun News.


Practical experience shows that running a local LLM on a VPS without a GPU is possible, but it requires an honest assessment of the trade-offs.

What You'll Get

Open WebUI is a convenient interface for local models that works without the cloud. Ollama handles model downloading, caching, and memory management. On a modest VPS (2-4 CPU cores, 4-8 GB RAM) you can run smaller models such as Mistral 7B or Phi-3, but response speed won't match what you're used to from GPT.
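As a sketch of a first run (assuming Ollama is already installed on the VPS; the model name is just one example from the Ollama library):

```bash
# Pull a small model that fits in 8 GB of RAM
# (a quantized 7B model takes roughly 4-5 GB on disk)
ollama pull mistral

# Quick interactive sanity check from the terminal
ollama run mistral "Summarize what a reverse proxy does in one sentence."
```

If this works in the terminal, Open WebUI will see the same model through the Ollama API.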

Real Limitations

On a CPU the model thinks slower: a single token can take one and a half to two seconds to generate, versus tens of milliseconds on a GPU. That is fine for experiments, but for a production chat you have to choose between speed and cost. RAM and CPU will be pegged at their limits, and any competing workloads on the same machine will slow down.
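Rather than trusting the numbers above, you can measure throughput on your own VPS: Ollama's CLI has a `--verbose` flag that prints generation statistics after the response (the model name is again just an example):

```bash
# --verbose appends timing stats to the response, including
# "eval rate" - the tokens-per-second figure that matters here
ollama run mistral --verbose "Explain what a VPS is in two sentences."
```

On a CPU-only VPS, an eval rate below 1 token/s matches the 1.5-2 seconds per token described above.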

Minimum Stack

  • VPS with at least 4-6 cores and 8 GB RAM (16 GB is better)
  • Docker and docker-compose for isolation
  • Ollama (downloads and caches models)
  • Open WebUI (frontend to Ollama)
  • Firewall and reverse proxy (Nginx) with Basic Auth are mandatory
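The stack above can be sketched as a minimal docker-compose file. This is illustrative, not a hardened setup: image tags, volume names, and ports are assumptions, and the firewall plus Nginx with Basic Auth still have to be configured on the host.

```yaml
# docker-compose.yml - minimal sketch; everything binds to localhost
# so only the reverse proxy (with Basic Auth) is reachable from outside
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama        # model cache survives restarts
    ports:
      - "127.0.0.1:11434:11434"

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "127.0.0.1:3000:8080"       # Nginx proxies to this port
    depends_on:
      - ollama

volumes:
  ollama:
```

Binding both ports to 127.0.0.1 means the only public entry point is the reverse proxy, which is where authentication belongs.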

Choice Between Local and Cloud

If you run local Ollama, you pay a flat rate for the hardware (or the VPS rent) regardless of request volume. If you call an API (like OpenAI/Claude), you pay per request, but scaling is painless. For a prototype or experiments, local is cheaper. For a production system it usually costs more, because the CPU sits idle between requests while you keep paying for it.

What This Means

Local LLMs are becoming more accessible, but "just run Ollama" is realistic only if you're ready for speed limitations. For small teams that want to control their data and not pay per request, it works.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.