
MiniMax Replaces Claude API: AI Agent Costs Cut from $200 to $20

A social media AI agent developer shared how he cut model costs from $200+ to ~$20 per month—not through prompt engineering, but by switching models. After…

AI-processed from Habr AI; edited by Hamidun News

The developer of an AI agent for social media argues that the most effective way to reduce model costs is not to squeeze every penny out of prompts but to reconsider the choice of model itself. In his case, moving from the Claude API to cheaper alternatives cut the monthly model bill from $200+ to approximately $20 without noticeable quality loss.

From Subscription to API

Initially, the agent ran on top of a Claude Max subscription. To the author, this looked almost free: he was already paying around $100 per month for Claude access for everyday development, and the additional load from the agent didn't require a separate budget. The system handled routine content work: reading feeds, collecting topics, conducting research, writing drafts, editing them, and preparing publications for social media.

The situation changed after Anthropic updated its policies. Using a subscription for AI agents and automated systems became prohibited, so the project had to be moved to token-based API payments. Theoretically, the rates looked tolerable, but in practice, the model turned out to be too expensive for this scenario. A single morning research session could burn up to 250,000 tokens before producing a finished text, and failed calls and retry requests quickly inflated the total bill. The first full month on the API cost over $200.

The main problem wasn't just the size of the sum, but its unpredictability: on normal days, expenses were moderate, but any edge case with a long reasoning chain multiplied costs many times over. For a solo developer and a side project, this was no longer "payment for convenience," but a separate expense item that needed to be controlled as strictly as servers or external APIs.

Searching for a Cheaper Model

Next came the search through alternatives. The criteria were practical: the model should write long texts, properly handle complex instructions, maintain a stable tone, and cost noticeably less than Claude. The first working option was Kimi K2.5 through OpenRouter at a price of around $0.45 per million tokens. According to the author's assessment, it delivered approximately 80% of Claude's quality for a fraction of the cost and handled post drafts, research summaries, and article outlines reasonably well. The switch to Kimi already helped noticeably: monthly expenses dropped to the range of $40–60. But the unpredictability problem didn't go away, because payment was still based on tokens.

The turning point came after switching to MiniMax M2.5, where the model was offered on a subscription basis for approximately $20 per month. For a content agent, this turned out to be more important than chasing maximum benchmark quality: a fixed payment simplified budget planning and removed the fear of load spikes. In the current setup, the author uses MiniMax as the primary model and keeps Kimi as a backup option. According to him, fallback is almost unnecessary because MiniMax covers over 95% of requests.
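
The primary-plus-backup arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function `complete_with_fallback` and the two stand-in model functions are hypothetical names, and a real agent would wrap the providers' actual SDK or HTTP calls, which are not shown here.

```python
from typing import Callable, Sequence

# A "model" here is any callable that takes a prompt and returns text.
# Real provider calls are deliberately left out; these are assumptions.
ModelFn = Callable[[str], str]


def complete_with_fallback(
    prompt: str, models: Sequence[tuple[str, ModelFn]]
) -> tuple[str, str]:
    """Try each (name, model) pair in order; return (name, text) from the first success."""
    errors: list[str] = []
    for name, call in models:
        try:
            text = call(prompt)
        except Exception as exc:  # e.g. timeout, rate limit, exhausted quota
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():  # an empty reply counts as a failure too
            return name, text
        errors.append(f"{name}: empty response")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Hypothetical stand-ins for the two providers, for demonstration only.
def fake_minimax(prompt: str) -> str:
    raise TimeoutError("subscription limit reached")


def fake_kimi(prompt: str) -> str:
    return f"draft for: {prompt}"


used, draft = complete_with_fallback(
    "post about model routing",
    [("minimax", fake_minimax), ("kimi", fake_kimi)],
)
print(used, "->", draft)
```

In this toy run the primary stand-in always fails, so the backup answers. With a primary that covers most requests, the backup branch would rarely execute, which matches the low backup cost the author reports.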

The overall economics look like this:

  • MiniMax M2.5 subscription — approximately $20 per month
  • Kimi K2.5 as backup — approximately $1–2
  • TwitterAPI.io for feed collection — $5
  • VPS on Contabo — $6.36

The total cost of the production agent comes out to approximately $33 per month including infrastructure, whereas Claude API usage alone previously cost $200–400+ per month.
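
The total can be checked with a few lines of arithmetic. The figures below are the ones listed above; only the variable names are introduced for illustration.

```python
# Monthly line items in USD, as listed in the article.
# Kimi is given as a range, so every item is stored as (low, high).
costs = {
    "MiniMax M2.5 subscription": (20.00, 20.00),
    "Kimi K2.5 backup": (1.00, 2.00),
    "TwitterAPI.io": (5.00, 5.00),
    "Contabo VPS": (6.36, 6.36),
}

low = round(sum(lo for lo, _ in costs.values()), 2)
high = round(sum(hi for _, hi in costs.values()), 2)
print(f"total: ${low:.2f} to ${high:.2f} per month")
# prints: total: $32.36 to $33.36 per month
```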

Simple Routing Rules

The author stresses separately that a subscription does not suit everyone. If the load exceeds the plan's limits, if the unique capabilities of a specific model are needed, or if the company is already deeply embedded in its own cloud infrastructure, token-based payment remains the only option. In that case, the main opportunity for savings is model routing. The idea is simple: don't send every request to the most expensive engine, but choose the model based on task complexity.

The article lists several approaches. Cascading routing tries the cheapest model first and escalates the request only if the result is weak. FrugalGPT, which the author references, showed savings of up to 98% while maintaining GPT-4 level accuracy, at the price of additional latency. RouteLLM from LMSYS demonstrated up to 85% cost reduction on MT Bench while maintaining 95% of GPT-4 performance. AWS Bedrock offers Intelligent Prompt Routing as a managed service and reports an average of 30% savings, and up to 63% on RAG workloads.
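
As a rough illustration of the cascading idea, the sketch below queries tiers from cheapest to most expensive and stops at the first answer that passes a quality check. It is not the FrugalGPT or RouteLLM implementation: those systems use learned scorers or routers, while the `toy_score` function, the 0.7 threshold, and the two stand-in models here are assumptions chosen only to make the example runnable.

```python
from typing import Callable, Sequence

ModelFn = Callable[[str], str]
ScoreFn = Callable[[str, str], float]  # (prompt, answer) -> quality in [0, 1]


def cascade(
    prompt: str,
    tiers: Sequence[tuple[str, ModelFn]],
    score: ScoreFn,
    threshold: float = 0.7,
) -> tuple[str, str]:
    """Query tiers from cheapest to most expensive; stop at the first acceptable answer."""
    if not tiers:
        raise ValueError("at least one tier is required")
    name, answer = "", ""
    for name, call in tiers:
        answer = call(prompt)
        if score(prompt, answer) >= threshold:
            break  # good enough, no need to pay for a larger model
    # If no tier passed the check, the last (strongest) tier's answer is returned.
    return name, answer


# Toy stand-ins: a terse cheap model, a verbose flagship, a length-based scorer.
def cheap(prompt: str) -> str:
    return "ok"


def flagship(prompt: str) -> str:
    return "A detailed, well-structured answer to: " + prompt


def toy_score(prompt: str, answer: str) -> float:
    return min(len(answer) / 20, 1.0)


tier_used, result = cascade(
    "compare routing strategies",
    [("cheap", cheap), ("flagship", flagship)],
    toy_score,
)
print(tier_used)
```

Every escalation means one more model call for the same request, which is where the extra latency mentioned above comes from.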

For small teams and solo developers, the author recommends an even more practical variant, a set of three simple rules:

  • short requests up to 500 tokens for formatting or data extraction should be sent to the cheapest model
  • tasks involving code, complex analysis, and deep reasoning should be sent to the flagship model
  • everything in between should be routed to a mid-tier model

"Do you really need an expensive model?" is the main question the author suggests asking before setting up complex routing.
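
The three rules listed above could be sketched as a small routing function. Everything beyond the 500-token limit and the task categories is an assumption for illustration: the tier names, the keyword lists, and the rough four-characters-per-token estimate are hypothetical and would need tuning against a real workload.

```python
import re

# Hypothetical tier names; map them to whichever models you actually use.
CHEAP, MID, FLAGSHIP = "cheap", "mid", "flagship"

# Crude keyword heuristics, assumed for illustration only.
SIMPLE_TASK = re.compile(r"\b(format|reformat|extract|convert|parse)\b", re.IGNORECASE)
HARD_TASK = re.compile(
    r"\b(code|debug|refactor|analy[sz]e|analysis|reasoning|prove)\b", re.IGNORECASE
)


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token for English text."""
    return max(1, len(text) // 4)


def route(prompt: str) -> str:
    # Rule 1: short formatting or extraction requests go to the cheapest model.
    if estimate_tokens(prompt) <= 500 and SIMPLE_TASK.search(prompt):
        return CHEAP
    # Rule 2: code, complex analysis, and deep reasoning go to the flagship.
    if HARD_TASK.search(prompt):
        return FLAGSHIP
    # Rule 3: everything in between goes to a mid-tier model.
    return MID


print(route("Extract the dates from this text: 3 May, 7 June"))
print(route("Debug this function and explain the failure"))
print(route("Write a friendly post about our new release"))
```

The rules are checked in the order the list gives them, so a short request that mentions both formatting and code lands on the cheap tier. Swapping the first two checks would favor quality over cost instead.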

What It Means

The story illustrates well how quickly the economics of AI models is changing. In many cases, teams overpay not because of poor prompts, but because by default they choose a frontier model for all tasks without exception. The practical conclusion is simple: first run your real workload through cheap or subscription-based models, and reserve expensive ones only where quality truly suffers without them.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
