Ollama and Open WebUI on a VPS without a GPU: a candid look at the limitations
It's possible to run Ollama with Open WebUI on a VPS without a GPU. You'll need 4+ cores and 8+ GB RAM. Expect a response speed of 1.5 to 2 seconds per token, not milliseconds.

Practical experience shows that running a local LLM on a VPS without a GPU is possible, but it requires an honest assessment of the trade-offs.
What You'll Get
Open WebUI is a convenient interface for local models that works without the cloud. Ollama manages model loading and memory. On a simple VPS (2-4 CPU cores, 4-8 GB RAM) you can run smaller models like Mistral 7B or Phi-3, but response speed won't match what you're used to with GPT.
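If you want to talk to the stack from a script rather than the browser, Ollama exposes an HTTP API on port 11434 by default. A minimal sketch in Python, assuming Ollama is already running on the VPS and the mistral model has been pulled (the model name and prompt are just examples):

```python
import requests

# Ollama's default HTTP API lives on localhost:11434.
OLLAMA_URL = "http://localhost:11434"

def ask(prompt: str, model: str = "mistral") -> str:
    """Send a single non-streaming prompt to a locally running Ollama."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,  # CPU-only inference can take minutes on long answers
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(ask("Explain in one sentence what a reverse proxy does."))
```

Open WebUI talks to the same API under the hood; the script just makes the latency easy to feel before you put a UI in front of it.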
Real Limitations
On CPU the model thinks slower: a single token can take one and a half to two seconds to generate, instead of the tens of milliseconds you'd see on a GPU. That's fine for experiments, but for a production chat you have to choose between speed and cost. RAM and CPU will be maxed out, and any other workloads on the same VPS will slow down.
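Rather than guessing, you can measure the real speed on your own VPS: Ollama's final (non-streamed) response includes eval_count and eval_duration counters, with durations in nanoseconds. A rough sketch, again assuming a local Ollama with mistral pulled:

```python
import requests

OLLAMA_URL = "http://localhost:11434"

def tokens_per_second(model: str = "mistral") -> float:
    """Run one prompt and compute generation speed from Ollama's own counters."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": "Write a haiku about servers.", "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # eval_count = generated tokens, eval_duration = generation time in nanoseconds
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    tps = tokens_per_second()
    print(f"~{tps:.2f} tokens/s, i.e. ~{1 / tps:.2f} s per token")
```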
Minimum Stack
- A VPS with at least 4-6 cores and 8 GB RAM (16 GB is better)
- Docker and docker-compose for isolation
- Ollama (downloads and caches models)
- Open WebUI (frontend to Ollama)
- A firewall and a reverse proxy (Nginx) with Basic Auth are mandatory (see the check below)
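Once the proxy and firewall are in place, it's worth verifying that the interface is only reachable through Nginx with credentials and that Ollama's raw port 11434 is closed to the internet. A quick sketch of that check, with a hypothetical domain, credentials, and IP address you'd replace with your own:

```python
import requests

# Hypothetical values: replace with your own domain, credentials, and VPS IP.
PROXY_URL = "https://llm.example.com"          # Nginx in front of Open WebUI
DIRECT_URL = "http://203.0.113.10:11434"       # Ollama's raw port, should NOT be reachable

def check_exposure() -> None:
    # Through the proxy with Basic Auth credentials: expected to return HTTP 200.
    via_proxy = requests.get(PROXY_URL, auth=("admin", "secret"), timeout=10)
    print("via proxy:", via_proxy.status_code)

    # Straight to Ollama's port from the outside: expected to fail if the firewall is set up.
    try:
        requests.get(f"{DIRECT_URL}/api/tags", timeout=10)
        print("WARNING: Ollama is reachable from the internet, close the port")
    except requests.RequestException:
        print("direct port is closed, good")

if __name__ == "__main__":
    check_exposure()
```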
Choice Between Local and Cloud
If you run local Ollama, you pay a flat price for the server no matter how many requests you make. If you call an API (like OpenAI/Claude), you pay per request, but scaling is painless. For a prototype or experiments, local is cheaper. For a production system, local usually ends up more expensive, because you keep paying for the VPS even while the CPU sits idle.
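The break-even point is easy to estimate. The numbers below are purely illustrative assumptions (a $20/month VPS and a cloud API at $0.50 per million tokens), not real price quotes:

```python
# Back-of-the-envelope comparison with made-up example numbers:
# a flat-price VPS vs. a cloud API billed per million tokens.
VPS_MONTHLY_USD = 20.0
API_USD_PER_MTOK = 0.50

def breakeven_tokens_per_month() -> float:
    """How many tokens per month you'd need before the flat VPS fee pays off."""
    return VPS_MONTHLY_USD / API_USD_PER_MTOK * 1_000_000

if __name__ == "__main__":
    tokens = breakeven_tokens_per_month()
    print(f"Break-even at ~{tokens:,.0f} tokens/month")
    # Below that volume the pay-per-request API is cheaper;
    # above it, the flat-price VPS starts to win (ignoring speed).
```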
What This Means
Local LLMs are becoming more accessible, but "just run Ollama" is realistic only if you're ready for speed limitations. For small teams that want to control their data and not pay per request, it works.