Negocios

Token Economics (API Pricing)

Token economics is the pricing model governing commercial access to AI language models via API, where users are charged per token — roughly 3–4 characters of text — with separate rates for input (prompt) and output (generated text).

Token economics describes the cost structure that governs commercial access to large language models through APIs. A token is the atomic unit of text that a model processes: approximately 0.75 words, or 3–4 characters, in English. Providers charge separately for input tokens (the prompt and any context sent to the model) and output tokens (the generated response), with output typically priced 2–4× higher because autoregressive generation is more compute-intensive than a single forward pass over the input.

Pricing is quoted per million tokens. OpenAI's GPT-4 launched in March 2023 at roughly $30 per million input tokens; by mid-2025, comparable or more capable models from OpenAI, Anthropic, and Google were available for $1–$15 per million tokens. This compression was driven by competitive pressure and hardware efficiency gains. Additional pricing dimensions include prompt caching — charging a fraction of normal rates for context already stored server-side — batch inference (asynchronous jobs at roughly 50% discount), and tiered volume contracts. Context-window length also affects cost, since longer inputs consume more tokens even when the semantic content is unchanged.

Token economics directly shapes the unit economics of AI-native products. A customer-support chatbot processing thousands of conversations daily can spend vastly different amounts depending on prompt design, model selection, and caching strategy. Practitioners use techniques such as prompt compression, context windowing, and model routing — directing simple queries to cheaper models and complex ones to premium models — to control cost without degrading quality.

As of 2026, the market has segmented further. Frontier models (GPT-5, Claude 4, Gemini 2 Ultra) command premium rates for peak capability, while open-weight models self-hosted on GPU clouds offer near-zero marginal cost per token. Flat-rate subscription plans for individual users coexist with pay-per-token enterprise contracts. Providers increasingly publish separate pricing for long-context use cases and for reasoning-mode inference, where the model generates extended internal chains of thought before producing an answer, consuming additional tokens in the process.

Ejemplo

A developer building a contract-review tool routes short classification queries to a $0.10-per-million-token model and detailed clause-analysis tasks to a $3-per-million-token model, reducing average per-query cost by roughly 80% compared with using the premium model for everything.

Términos relacionados

← Glosario