Hugging Face adds DeepInfra to Inference Providers for unified model API
Hugging Face added DeepInfra to Inference Providers on the Hub. DeepSeek, Kimi, and GLM models can now be run directly from model pages, via Python and…
AI-processed from Hugging Face Blog; edited by Hamidun News
Hugging Face has added DeepInfra to the list of Inference Providers on Hub. Now developers can run models available through DeepInfra directly from model pages, through client SDKs and Hugging Face's unified router without separate custom integration.
What was launched
The new integration expands the serverless inference ecosystem within Hugging Face. DeepInfra became a supported provider on Hub, which means its models can be selected right where developers are already looking for datasets, model cards, and ready-made code snippets for running models. In the announcement itself, DeepInfra is described as an AI-inference platform with over 100 models and one of the lowest token prices on the market. For Hugging Face, this is another step toward a model where Hub works not just as a catalog, but as a unified launch point for models.
At launch, the integration covers conversational scenarios and standard text generation. Through DeepInfra on Hugging Face, you can already access popular open-weight models like DeepSeek V4, Kimi-K2.6, and GLM-5.1. At the same time, the team has already outlined the next stage: in the future, text-to-image, text-to-video, embeddings, and other task types should appear through the same layer. In other words, this is not about a one-off integration of one or two LLMs, but about connecting a broader computational channel to Hugging Face infrastructure.
How it works
From the user's perspective, everything is built into the familiar Hub interface. In account settings, you can add your own provider API keys and set preference order, and on Hugging Face model pages, it shows compatible external providers and generates widgets and code examples for them. If a key is not specified, requests can go through Hugging Face itself. If a key is provided, calls are sent directly to DeepInfra. This eliminates unnecessary manual setup and makes switching between providers noticeably easier.
- Your own DeepInfra API key for direct calls without intermediaries
- Routed by HF mode, when a separate provider key is not needed
- Sorting providers by user priority
- The same approach in Hub interface, Python SDK, and JavaScript SDK
- Integration with popular agent harnesses without additional setup
For code, the scheme is also maximally simple. DeepInfra is available through `huggingface_hub` for Python and `@huggingface/inference` for JavaScript, and the examples in the announcement use an OpenAI-compatible client with the base URL `https://router.huggingface.co/v1` and a Hugging Face token. The model is specified in the format `model:provider`, for example for calling DeepSeek through DeepInfra.
It's separately emphasized that the integration already works in a number of agent harnesses, so models can be connected not only in raw code, but also in agent tools on top of the common API.
Pricing and access
With billing, Hugging Face left two clear scenarios. If a developer uses their own DeepInfra key, payment goes to DeepInfra at their rates. If the request is routed through Hugging Face Hub, the charge goes through the Hugging Face account, but without additional markup from the platform: the company says it simply passes through the standard API cost of the provider. For teams, this is an important detail, because the unified router doesn't become another pricing layer on top of already existing infrastructure.
There's also a clear way to test the integration without major expenses. PRO plan users get $2 in inference credits per month, which can be spent with different providers within this system. Free accounts also have a small inference limit, although Hugging Face directly encourages active users to switch to PRO. In practical terms, this lowers the barrier to entry: you can quickly compare DeepInfra with other providers on the same models without building a separate test setup or configuring several different SDKs.
What it means
Hugging Face is increasingly turning Hub into an orchestration layer on top of multiple AI providers, not just a model showcase. For developers, this means less manual integration, faster testing of open-weight LLMs, and an easier path to multi-provider architecture without rewriting client code.
Want to stop reading about AI and start using it?
AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.