Who is stricter with JSON: OpenAI, Gemini, and xAI tested for schema compliance

Source: Habr AI. Collage: Hamidun News.

Developers at Habr conducted a study that tested how Structured Outputs actually work across three leading LLM providers: OpenAI, Gemini, and xAI. Instead of reading documentation, they used an adversarial testing method—intentionally breaking schemas to see where each provider holds the line and where it gives way.

How They Tested Schema Compliance

The authors didn't simply ask models to return valid JSON. They deliberately constructed prompts designed to force the model to violate its own JSON Schema: placing values of the wrong type in fields, adding extra keys, using invalid enum values, violating minimum or maximum string length, and exceeding numeric ranges. They watched where the model obeyed strict: true and where it broke through and returned invalid JSON. They separately tested complex constructions: anyOf, oneOf, and allOf, the combinatorial rules that are hardest for LLMs to implement. In practice, many systems simplify, ignore, or only half-implement these constructions, so the authors checked how each provider interprets them.
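The violation classes above can be sketched as checks against an illustrative schema. The schema, field names, and hand-rolled validator below are a minimal stdlib sketch for clarity, not the authors' actual test harness:

```python
# Illustrative schema with the constraint types the adversarial tests probed:
# types, extra keys, enum values, string length, numeric ranges.
SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["status", "score", "tag"],
    "properties": {
        "status": {"type": "string", "enum": ["ok", "error"]},
        "score": {"type": "number", "minimum": 0, "maximum": 100},
        "tag": {"type": "string", "minLength": 3, "maxLength": 10},
    },
}

def violations(payload: dict, schema: dict) -> list[str]:
    """Return a list of violations for a flat object schema (sketch only)."""
    problems = []
    props = schema["properties"]
    for key in payload:
        if key not in props:                                # extra keys
            problems.append(f"extra key: {key}")
    for key, rule in props.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
            continue
        value = payload[key]
        expected = {"string": str, "number": (int, float)}.get(rule["type"], object)
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"wrong type for {key}")        # wrong value type
            continue
        if "enum" in rule and value not in rule["enum"]:    # invalid enum value
            problems.append(f"bad enum for {key}")
        if "minimum" in rule and value < rule["minimum"]:   # numeric range
            problems.append(f"{key} below minimum")
        if "maximum" in rule and value > rule["maximum"]:
            problems.append(f"{key} above maximum")
        if "minLength" in rule and len(value) < rule["minLength"]:  # string length
            problems.append(f"{key} too short")
        if "maxLength" in rule and len(value) > rule["maxLength"]:
            problems.append(f"{key} too long")
    return problems
```

A response like `{"status": "pending", "score": 150, "tag": "x", "extra": 1}` trips four distinct constraint classes at once, which is exactly the kind of payload the adversarial prompts tried to coax out of each model.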

What They Found About Different Providers

The results are surprising: each provider has its own set of hard and soft constraints. OpenAI holds strictness best, especially when strict: true is enabled. Gemini allows more leeway with enums and nested objects. xAI is stable at medium complexity but has blind spots around allOf constructions and conditional fields.

The constraint matrix from the article shows:

  • OpenAI: reliable on primitive types (string, number, boolean), sometimes slips on complex nested structures
  • Gemini: interprets anyOf and oneOf more freely, can add extra keys to objects, less strict with enum values
  • xAI: good at medium complexity and basic rules, weaker on multi-level validation and patternProperties

Even strict: true doesn't guarantee 100% compliance. This shifts responsibility back to the developer.
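For reference, enabling strict mode looks roughly like this in an OpenAI-style Chat Completions request body. The field names follow OpenAI's API; the model name and schema are placeholders, and Gemini and xAI expose different request shapes:

```python
# Illustrative schema: a single enum field, no extra keys allowed.
SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "required": ["status"],
    "properties": {"status": {"type": "string", "enum": ["ok", "error"]}},
}

# OpenAI-style request enabling strict structured outputs. Even with
# strict: True set here, the article's tests argue you should still
# re-validate the response on the client side.
request_body = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "Classify this log line."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "log_status", "strict": True, "schema": SCHEMA},
    },
}
```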

Practical Conclusions for Production

If you're building a system that calls multiple LLM providers in parallel, these differences are critical. You can't simply switch from OpenAI to Gemini and expect an identically formatted response: the JSON Schema stays the same, but model behavior differs radically.

For production scenarios, the authors recommend always validating responses after the model, even if the provider promises strict: true. Use a jsonschema library to independently re-verify the result, and add fallback logic in case the format unexpectedly breaks: re-ask the model, fall back to a default value, or return an error to the user. A single random enum deviation from the model can kill your entire pipeline.
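The recommended validate-retry-fallback loop can be sketched like this. The `validate` and `retry` hooks are hypothetical stand-ins: in practice `validate` would wrap a jsonschema check and `retry` would re-ask your provider:

```python
import json

def parse_with_fallback(raw: str, validate, retry, default=None, max_retries=2):
    """Parse a model response; re-ask on invalid JSON or schema, else fall back.

    `validate` should raise ValueError on a schema violation; `retry` is a
    callable that re-asks the model and returns a fresh raw string. Both are
    hypothetical hooks to be wired to a real validator and provider client.
    """
    for attempt in range(max_retries + 1):
        try:
            data = json.loads(raw)   # may raise JSONDecodeError
            validate(data)           # may raise ValueError
            return data
        except (json.JSONDecodeError, ValueError):
            if attempt < max_retries:
                raw = retry()        # re-ask the model
    return default                   # last resort: default value (or surface an error)
```

The same wrapper covers both failure modes the article warns about: syntactically broken JSON and well-formed JSON that silently violates the schema (for example, a stray enum value).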

What It Means

Structured Outputs is a useful feature, but not a panacea. Habr's testing showed that trusting the provider completely is a mistake. If you care about response format, invest in client-side validation.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.