Hugging Face and ModelAudit: research revealed the limits of built-in ML model protection

AI-processed from Habr AI; edited by Hamidun News

A researcher compared Hugging Face's built-in security checks with the external ModelAudit scanner and got a non-obvious result: a tool can detect more risky signals and, in doing so, produce a lot of noise. The main takeaway of the article is that the number of critical alerts by itself says almost nothing about how malicious a model actually is.

How the test was organized

In the first experiment, the author did not scan all of Hugging Face but took a subset of repositories that use the riskiest model storage formats. The selection included only open models containing files with the extensions `.pkl`, `.pickle`, `.dill`, `.pt`, `.pth`, `.ckpt`, `.bin`, `.joblib`, `.npy` and `.npz`. Very large repositories and very popular models were also excluded: total size was capped at 1 GB and monthly downloads at 10,000. The idea is simple: if you are looking for real problems, it makes sense to start where dangerous serialization is most likely.
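
The selection criteria above can be written down as a small filter. This is only a sketch: the `RepoInfo` record and its field names are hypothetical and are not taken from the study or from the Hugging Face API. The extension list and both limits come from the text; treating 1 GB as 10^9 bytes and both limits as inclusive are assumptions.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Extensions named in the article as the riskiest storage formats.
RISKY_EXTENSIONS = {
    ".pkl", ".pickle", ".dill", ".pt", ".pth",
    ".ckpt", ".bin", ".joblib", ".npy", ".npz",
}
MAX_TOTAL_BYTES = 1_000_000_000   # 1 GB (assumed decimal, inclusive)
MAX_MONTHLY_DOWNLOADS = 10_000    # assumed inclusive


@dataclass
class RepoInfo:
    """Hypothetical metadata record for one repository."""
    repo_id: str
    filenames: list[str]
    total_bytes: int
    monthly_downloads: int


def is_candidate(repo: RepoInfo) -> bool:
    """Return True if the repository matches the selection criteria."""
    has_risky_file = any(
        PurePosixPath(name).suffix.lower() in RISKY_EXTENSIONS
        for name in repo.filenames
    )
    return (
        has_risky_file
        and repo.total_bytes <= MAX_TOTAL_BYTES
        and repo.monthly_downloads <= MAX_MONTHLY_DOWNLOADS
    )
```

A repository with only `.safetensors` files, or one that exceeds either limit, would be skipped by this filter.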

  • In the first set, 246 models were scanned
  • ModelAudit found 271 critical warnings
  • At least one critical alert was triggered by 34 models
  • The unit of comparison was the repository as a whole, not individual checkpoints inside it

But it became clear right away that a large number of findings is not the same as detection quality. The repository with the most detections was Ultralytics/YOLO11: it received 728 warnings, 35 of them critical. On paper this looks like a strong signal of compromise, but manual analysis showed a more mundane picture. A significant share of the critical flags were tied to standard Python elements that also occur in legitimate models. In other words, the scanner correctly noticed potentially dangerous patterns, but too often treated them as direct evidence of an attack.

Where the rules create noise

The YOLO11 analysis clearly demonstrated the weak point of static analysis. Some detections came from `pickle_check` because of `__builtin__.getattr`, and some from `pytorch_zip_check` because of `__builtin__.set` and similar indicators. The problem is that `getattr` can indeed be used in malicious chains to bypass primitive rules, but it is also an ordinary Python function that appears in perfectly normal code. With `set` the situation is even more telling: one internal ModelAudit scanner considers such an import acceptable, while another may flag the entire `builtin` namespace as suspicious. This is why the author specifically emphasizes that a high density of warnings, even critical ones, is a reason for manual triage, not a verdict on the model.
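
To see why such names turn up in harmless files, it helps to look at what a pickle actually references. The sketch below uses only the standard library (`pickle`, `pickletools`) and is not how ModelAudit works internally; the helper name `imported_globals` is made up for illustration. It lists the `GLOBAL` opcodes in a pickle stream. With protocol 2 and the default `fix_imports=True`, Python 3 writes built-ins under the Python 2 module name `__builtin__`, so a completely benign object can produce the same strings the scanner flagged. The sketch deliberately ignores `STACK_GLOBAL`, which protocol 4 and later use instead.

```python
import pickle
import pickletools


def imported_globals(data: bytes) -> list[str]:
    """List 'module.name' references made via GLOBAL opcodes.

    Only inspects the byte stream; nothing is unpickled or executed.
    STACK_GLOBAL (protocol 4+) is not resolved in this sketch.
    """
    found = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
    return found


# A harmless payload: references to two built-ins, nothing is called.
benign = pickle.dumps([set, getattr], protocol=2)
print(imported_globals(benign))
```

The mere presence of these names says nothing about intent. What matters is how they are used, for example whether a `REDUCE` opcode later calls `getattr` on something dangerous, and that is the part that needs manual review.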

During the first experiment, the author also analyzed other types of detections, including suspected executable signatures inside binary files, and ran into the same problem again: rules are convenient for finding candidates, but they work poorly as a final verdict without context, knowledge of the file format, and an understanding of the specific framework.

Comparison with Hugging Face

In the second experiment, the author changed focus and compiled a list of models that the repository authors themselves had already labeled with terms such as `malicious`, `exploit`, `ace`, `deserialization`, or `poc`. After additional filtering with an LLM, this set was run through ModelAudit and the results were compared with Hugging Face's built-in statuses. The basic comparison showed fairly strong agreement: both sides considered 154 repositories dangerous and 49 safe. However, there were 14 cases where ModelAudit saw a problem while Hugging Face showed nothing suspicious.
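
A comparison like this boils down to sorting each repository into one of four buckets. The following is a minimal sketch with made-up repository IDs; the function name, bucket labels, and the boolean "flagged" representation are assumptions for illustration and are not taken from the original study.

```python
from collections import Counter


def compare_verdicts(hf_flagged: dict[str, bool],
                     ma_flagged: dict[str, bool]) -> Counter:
    """Count agreement and disagreement between two scanners.

    Each dict maps a repository ID to True if that scanner flagged it.
    Only repositories present in both dicts are compared.
    """
    buckets = Counter()
    for repo_id in hf_flagged.keys() & ma_flagged.keys():
        hf, ma = hf_flagged[repo_id], ma_flagged[repo_id]
        if hf and ma:
            buckets["both_dangerous"] += 1
        elif not hf and not ma:
            buckets["both_safe"] += 1
        elif ma:
            buckets["modelaudit_only"] += 1
        else:
            buckets["hf_only"] += 1
    return buckets


# Toy data, not the study's results.
hf = {"a/x": True, "b/y": False, "c/z": False}
ma = {"a/x": True, "b/y": False, "c/z": True}
print(compare_verdicts(hf, ma))
```

The two disagreement buckets are the ones worth reading by hand: they are where one tool either misses something or raises a false alarm.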

The most important nuance here is that some of ModelAudit's useful signals exist not only at the warning and critical levels. The article provides an example of `jossefharush/gpt2-rs`, where an INFO-level alert contained signs of network activity and a link to Pastebin. Further verification showed that this link contained a backdoor that sends the results of command execution on the victim's machine to an attacker. That is, the "informational" message in that particular case turned out to be more substantively important than many loud critical flags from the first experiment.
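
One practical consequence is that triage should not filter by severity alone. The sketch below pulls URL-like strings out of findings at every level, so that an INFO-level network indicator is not lost. The finding structure (dicts with `severity` and `message` keys) is hypothetical and does not reflect ModelAudit's real output format; the URL in the example is a placeholder.

```python
import re

# Deliberately simple pattern for http(s) URLs inside a message string.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def network_indicators(findings: list[dict]) -> list[tuple[str, list[str]]]:
    """Return (severity, urls) for every finding that mentions a URL,
    regardless of its severity level."""
    hits = []
    for finding in findings:
        urls = URL_PATTERN.findall(finding.get("message", ""))
        if urls:
            hits.append((finding.get("severity", "UNKNOWN"), urls))
    return hits


findings = [
    {"severity": "CRITICAL", "message": "Suspicious reference __builtin__.getattr"},
    {"severity": "INFO", "message": "URL string https://example.com/raw/abc in payload"},
]
print(network_indicators(findings))
```

In this toy input, the only finding that points to outside infrastructure is the INFO one, which mirrors the situation described above.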

The author also separately analyzed the reverse discrepancies, where Hugging Face signaled danger but ModelAudit let the model pass. Such misses did occur initially, in version 0.2.24, but after updates to 0.2.28 and then to 0.2.31 they disappeared. The final picture looked like this: every repository that Hugging Face ultimately considered dangerous was also caught by ModelAudit, and the external scanner additionally flagged 17 repositories with dangerous signals that were absent from HF's built-in warnings.

What this means

No single scanner solves the security problem of ML artifacts, even one that appears to be the most mature in its class. The comparison of Hugging Face and ModelAudit points to a more useful insight: good results come not from betting on one "best" tool, but from combining multiple checks, updating rules regularly, and always analyzing the loudest detections by hand.
