Hugging Face: Chinese open-source models overtake the US in AI ecosystem downloads
Hugging Face released its spring snapshot of open-source AI, and the main takeaway is simple: the ecosystem has become mainstream, and China has taken the…
AI-processed from Hugging Face Blog; edited by Hamidun News
Hugging Face has published its spring overview of the open-source AI landscape, and the data show that the open ecosystem is no longer a niche for enthusiasts. Over the past year it has grown sharply in scale, and its center of gravity is shifting toward China, independent developers, and more practical models that are actually deployed in production.
The ecosystem has gone mainstream
According to Hugging Face, the platform had grown to 13 million users, more than 2 million public models, and over 500 thousand public datasets by 2025. Scale is not the only thing that matters. The team notes that users increasingly publish derivative artifacts on top of the models they download: fine-tunes, adapters, benchmarks, and practical applications.
In other words, open-source AI has become less a library to browse and more an environment for active assembly and reuse. At the same time, usage is distributed very unevenly. Approximately half of the models on Hugging Face have fewer than 200 downloads over their entire lifetime, and almost half of all downloads (49.6%) come from just the 200 most popular models. This shows how the market works: at the top are a few very prominent families, and below them are thousands of narrow, local, and applied projects.
- 13 million users on the platform
- More than 2 million public models
- Over 500 thousand public datasets
- 49.6% of downloads come from the top 200 models
- Approximately half of the models have fewer than 200 downloads
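The two concentration figures above (the share of downloads held by the top models, and the download count of a typical model) are simple to compute from a list of per-model download counts. The sketch below shows one way to do it in Python. The function name `top_k_share` and the sample numbers are hypothetical and purely illustrative; they are not Hugging Face data and do not reflect how the report was produced.

```python
from statistics import median


def top_k_share(downloads, k):
    """Return the fraction of total downloads captured by the k most-downloaded models."""
    total = sum(downloads)
    if total == 0:
        return 0.0
    # Sort descending and keep the k largest counts.
    top = sorted(downloads, reverse=True)[:k]
    return sum(top) / total


# Made-up per-model download counts, for illustration only (NOT real platform data).
sample_downloads = [500_000, 300_000, 150_000, 40_000, 5_000, 900, 150, 120, 60, 10]

share = top_k_share(sample_downloads, k=2)
typical = median(sample_downloads)

print(f"Share of downloads held by the top 2 models: {share:.1%}")
print(f"Median downloads per model: {typical}")
```

On real data, a high top-k share combined with a low median is what a long-tailed distribution like the one described in the report looks like: a few families dominate usage while most models see very little.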
China and independent developers
The main geographical shift in the report is that China has already surpassed the United States in both monthly and cumulative model downloads. Over the past year, Chinese models accounted for 41% of all downloads on the platform. The number of new repositories and releases from major companies has grown especially rapidly: Baidu went from zero releases on the Hub in 2024 to over 100 in 2025, while ByteDance and Tencent increased their activity eight to nine times.
Following the success of DeepSeek R1, the Chinese ecosystem has clearly bet on open weights. A second shift is equally important: industry's share of overall development fell from approximately 70% before 2022 to 37% in 2025. Against this backdrop, independent developers and small teams grew from 17% to 39% of all downloads, and in certain periods even accounted for more than half of usage.
These players are often the ones who quantize, adapt, and repackage base models for real-world scenarios. In effect, they have become a separate distribution layer between foundation model creators and end users.
Accessible models are winning
The report emphasizes that real demand is increasingly shifting from giant systems to models that are simpler and cheaper to run. Even after adjusting for the number of releases, models in the 1-9B parameter range are downloaded only about four times as often as 100B+ systems, a much smaller gap than might be expected given all the noise around frontier models. Average engagement after release hovers around six weeks, so without constant updates even strong families quickly lose market attention.
In practice, this means that the winners are not only the most powerful models but also the ones that are most convenient to build on. Alibaba's Qwen family has already produced over 113 thousand derivative models, and counting all models with the Qwen tag, the total exceeds 200 thousand. In parallel, new subcommunities are growing rapidly.
In robotics, the number of datasets grew from 1,145 to 26,991 over the year, making it the largest category on the platform. In scientific tasks, open-source models are increasingly being used for working with proteins, molecules, and research data. All of this is complemented by a shift toward cheaper hardware, quantization, and running models closer to edge infrastructure.
What this means
Open-source AI is entering a phase where victory is determined not only by the quality of the base model but also by the speed of adaptation, the number of derivative builds, and the convenience of local deployment. For companies, this is a signal to look not only at closed frontier systems but also at open ecosystems around Qwen, DeepSeek, Gemma and other families, because that's where practical value is appearing fastest right now.