Hugging Face adds DeepInfra to Inference Providers for unified model API

Hugging Face added DeepInfra to Inference Providers on the Hub. DeepSeek, Kimi, and GLM models can now be run directly from model pages, via Python and JavaScript SDKs, and through Hugging Face's unified router. Conversational and text-generation scenarios are available at launch, with text-to-image, video, and embeddings coming later. Two billing options are available: via your own DeepInfra key or through a Hugging Face account without markup.

Khamidun Zhemal

AI monitoring · Hugging Face Blog

Apr 30, 2026· 3 min

AI-processed from Hugging Face Blog; edited by Hamidun News

Hugging Face adds DeepInfra to Inference Providers for unified model API — Source: Hugging Face Blog. Collage: Hamidun News.

◐ Listen to article

Hugging Face has added DeepInfra to the list of Inference Providers on Hub. Now developers can run models available through DeepInfra directly from model pages, through client SDKs and Hugging Face's unified router without separate custom integration.

What was launched

The new integration expands the serverless inference ecosystem within Hugging Face. DeepInfra became a supported provider on Hub, which means its models can be selected right where developers are already looking for datasets, model cards, and ready-made code snippets for running models. In the announcement itself, DeepInfra is described as an AI-inference platform with over 100 models and one of the lowest token prices on the market. For Hugging Face, this is another step toward a model where Hub works not just as a catalog, but as a unified launch point for models.

At launch, the integration covers conversational scenarios and standard text generation. Through DeepInfra on Hugging Face, you can already access popular open-weight models like DeepSeek V4, Kimi-K2.6, and GLM-5.1. At the same time, the team has already outlined the next stage: in the future, text-to-image, text-to-video, embeddings, and other task types should appear through the same layer. In other words, this is not about a one-off integration of one or two LLMs, but about connecting a broader computational channel to Hugging Face infrastructure.

How it works

From the user's perspective, everything is built into the familiar Hub interface. In account settings, you can add your own provider API keys and set preference order, and on Hugging Face model pages, it shows compatible external providers and generates widgets and code examples for them. If a key is not specified, requests can go through Hugging Face itself. If a key is provided, calls are sent directly to DeepInfra. This eliminates unnecessary manual setup and makes switching between providers noticeably easier.

Your own DeepInfra API key for direct calls without intermediaries
Routed by HF mode, when a separate provider key is not needed
Sorting providers by user priority
The same approach in Hub interface, Python SDK, and JavaScript SDK
Integration with popular agent harnesses without additional setup

For code, the scheme is also maximally simple. DeepInfra is available through `huggingface_hub` for Python and `@huggingface/inference` for JavaScript, and the examples in the announcement use an OpenAI-compatible client with the base URL `https://router.huggingface.co/v1` and a Hugging Face token. The model is specified in the format `model:provider`, for example for calling DeepSeek through DeepInfra.

It's separately emphasized that the integration already works in a number of agent harnesses, so models can be connected not only in raw code, but also in agent tools on top of the common API.

Pricing and access

With billing, Hugging Face left two clear scenarios. If a developer uses their own DeepInfra key, payment goes to DeepInfra at their rates. If the request is routed through Hugging Face Hub, the charge goes through the Hugging Face account, but without additional markup from the platform: the company says it simply passes through the standard API cost of the provider. For teams, this is an important detail, because the unified router doesn't become another pricing layer on top of already existing infrastructure.

There's also a clear way to test the integration without major expenses. PRO plan users get $2 in inference credits per month, which can be spent with different providers within this system. Free accounts also have a small inference limit, although Hugging Face directly encourages active users to switch to PRO. In practical terms, this lowers the barrier to entry: you can quickly compare DeepInfra with other providers on the same models without building a separate test setup or configuring several different SDKs.

What it means

Hugging Face is increasingly turning Hub into an orchestration layer on top of multiple AI providers, not just a model showcase. For developers, this means less manual integration, faster testing of open-weight LLMs, and an easier path to multi-provider architecture without rewriting client code.

Hamidun News

AI news without noise. Daily editorial selection from 50+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Need AI working inside your business — not just in your newsfeed?

I build production AI for companies — custom CRM, internal tools, autonomous agents, workflow automation. Owned by you, shaped to your process, no per-seat tax. Built by Zhemal Khamidun, CPO of AlpinaGPT (AI platform, 6,000+ users).

Book a free consultation →