IBM released Mellea 0.4.0 and Granite Libraries for verifiable AI pipelines
AI-processed from Hugging Face Blog; edited by Hamidun News
IBM Research has released Mellea 0.4.0 and, alongside it, three Granite Libraries: sets of specialized adapters for Granite models. The release targets teams building not just chatbots but controlled AI pipelines with structure verification, fact-checking, and rule compliance.
What was updated
Mellea is an open-source Python library for writing "generative programs": instead of relying on fragile prompting, it lets developers build LLM workflows from predictable steps. IBM positions it as an alternative to general-purpose orchestrators, where model behavior often remains probabilistic and hard to reproduce. The framework is aimed at cases where model output becomes part of a business process, report, form, or chain of agent actions. Version 0.4.0 builds on ideas from release 0.3.0 and expands the set of building blocks for such scenarios.
The new version introduces a native integration layer with Granite Libraries through a standardized API. The key focus is constrained decoding, so that answers conform to a given schema rather than merely "looking like" correct JSON. Another important pattern is instruct-validate-repair: the system first generates an answer, then validates it, and, if validation fails, triggers a repair step. According to the release notes, Mellea also gained hooks and plugin support, log export via OTLP, metrics for Prometheus and OpenTelemetry, as well as token consumption tracking and pipeline-level events.
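The instruct-validate-repair pattern described above can be sketched in plain Python. This is a minimal illustration and not Mellea's API: the schema, `validate`, `instruct_validate_repair`, and `fake_model` are all hypothetical names, and the "model" is a stub that only answers correctly once it is told what to fix. Note that the sketch validates after generation; constrained decoding, by contrast, restricts the tokens during sampling so that invalid output cannot be produced in the first place.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"title": str, "year": int}


def validate(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the output is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field '{name}'")
        elif not isinstance(data[name], expected):
            problems.append(f"field '{name}' must be {expected.__name__}")
    return problems


def instruct_validate_repair(
    generate: Callable[[str], str], prompt: str, max_repairs: int = 2
) -> tuple[str, int]:
    """Generate, validate, and re-prompt with the failures until valid or out of budget.

    Returns the valid output and the number of repair rounds that were needed.
    """
    output = generate(prompt)
    problems: list[str] = []
    for attempt in range(max_repairs + 1):
        problems = validate(output)
        if not problems:
            return output, attempt
        if attempt == max_repairs:
            break
        # Repair step: feed the concrete validation failures back to the model.
        output = generate(f"{prompt}\nFix these problems: {'; '.join(problems)}")
    raise ValueError(f"still invalid after {max_repairs} repairs: {problems}")


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call: omits a field until it is asked to repair."""
    if "Fix these problems" in prompt:
        return '{"title": "example", "year": 2000}'
    return '{"title": "example"}'


result, repairs = instruct_validate_repair(fake_model, "Describe the item as JSON.")
print(result, repairs)
```

The repair prompt carries the exact validation errors, which is what distinguishes this loop from simply retrying the same request.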
What went into Granite Libraries
Granite Libraries are not another general-purpose large model but a set of LoRA adapters for granite-4.0-micro. Each adapter is trained for one narrow operation within the chain: rewriting a query, verifying that requirements are met, assessing factuality, adding citations, or detecting policy violations. This approach makes it possible to strengthen individual pipeline stages without fully retraining the base model. IBM emphasizes that it yields accuracy gains at a moderate parameter cost and without degrading Granite's core capabilities.
- granitelib-core-r1.0 — adapters for requirement verification, answer confidence assessment, and explainability through context attribution.
- granitelib-rag-r1.0 — tools for agentic RAG: query rewrite, query clarification, context relevance verification, answerability assessment, hallucination detection, and citation generation.
- granitelib-guardian-r1.0 — modules for safety, factuality, and policy compliance, including factuality correction and separate guardrails.
All three libraries run on top of granite-4.0-micro, and the RAG set is published as a compact package of approximately 14.4 million parameters.
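To see why a LoRA adapter stays small relative to the model it modifies, a rough parameter count helps. For one weight matrix of shape d_out x d_in, a rank-r adapter adds two low-rank factors with r * (d_in + d_out) trainable parameters in total. The dimensions and rank below are made-up values for illustration only; they are not the actual configuration of granite-4.0-micro or of any Granite Library.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters a LoRA adapter of the given rank adds to one weight matrix."""
    # LoRA learns B (d_out x rank) and A (rank x d_in); the base weight stays frozen.
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the full d_out x d_in weight matrix."""
    return d_in * d_out


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
d_in, d_out, rank = 1024, 1024, 8
adapter = lora_params(d_in, d_out, rank)
full = full_params(d_in, d_out)
print(f"adapter: {adapter}, full matrix: {full}, ratio: {adapter / full:.4f}")
```

Because the adapter cost grows linearly with the matrix dimensions while the full matrix grows with their product, the relative overhead shrinks as the base model gets wider.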
In practice, this means a developer does not have to force one model to do everything equally well. Instead, Mellea plugs specialized adapters in at the right places: before retrieval, before generation, after generation, and at final verification. This is especially useful in enterprise scenarios where the system must provably follow a schema, decline to answer an unanswerable question, return citations for every claim, or show which context fragments it actually relied on.
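The four insertion points named above (before retrieval, before generation, after generation, final verification) can be sketched as a small staged pipeline. Everything here is a hypothetical stand-in written in plain Python: the stage functions imitate what a query-rewrite, answerability, or citation adapter might do, and none of the names come from Mellea or the Granite Libraries.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class State:
    query: str
    passages: list[str] = field(default_factory=list)
    answer: Optional[str] = None
    trace: list[str] = field(default_factory=list)  # stages that ran, in order


def rewrite_query(state: State) -> State:
    # Before retrieval: normalize the query (stand-in for a query-rewrite adapter).
    state.query = " ".join(state.query.lower().split())
    state.trace.append("rewrite_query")
    return state


def retrieve(state: State, corpus: list[str]) -> State:
    # Toy retrieval: keep passages that share at least one word with the query.
    words = set(state.query.split())
    state.passages = [p for p in corpus if words & set(p.lower().split())]
    state.trace.append("retrieve")
    return state


def check_answerable(state: State) -> bool:
    # Before generation: decline when nothing relevant was retrieved.
    state.trace.append("check_answerable")
    return bool(state.passages)


def generate(state: State) -> State:
    # Generation stand-in: quote the first passage and cite it by index.
    state.answer = f"{state.passages[0]} [1]"
    state.trace.append("generate")
    return state


def verify_citations(state: State) -> bool:
    # Final verification: the answer must carry a citation marker.
    state.trace.append("verify_citations")
    return state.answer is not None and "[1]" in state.answer


def run(query: str, corpus: list[str]) -> State:
    state = retrieve(rewrite_query(State(query)), corpus)
    if not check_answerable(state):
        return state  # answer stays None: the pipeline declines to answer
    state = generate(state)
    if not verify_citations(state):
        state.answer = None
    return state


corpus = ["Adapters are trained for narrow operations.", "Unrelated text"]
print(run("  What are ADAPTERS ", corpus).answer)
print(run("quantum", corpus).answer)
```

Keeping each check as its own function is what lets a specialized component replace a stub without touching the rest of the chain.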
Why the release matters
The main idea of the release is a shift from a "smart model with a big prompt" to a modular architecture in which quality is controlled at each step. This fits real product tasks well: internal copilots, RAG search across documents, assistants with tool calling, compliance checks, and any scenario where an error should be caught by the system automatically rather than merely noticed by the user. For audited industries such as finance, medicine, or corporate document management, this approach is especially pragmatic.
Observability deserves a separate mention. If an LLM stack has callbacks, telemetry, OpenTelemetry metrics, and export to Prometheus, a team can run it like an ordinary production service: see where tokens are spent, at which stage validation fails, and which adapters most often trigger a repair cycle. This simplifies not only debugging but also day-to-day operation: AI functions start to look like a measurable service rather than a black box with good demos. For teams moving pilots into production, this is often more important than the next benchmark improvement.
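The three questions in the paragraph above (where tokens go, where validation fails, which adapters trigger repairs) map onto simple per-stage counters. The sketch below keeps them in memory with the standard library; it is an assumption-laden stand-in, not Mellea's telemetry interface, and `PipelineMetrics`, the stage names, and the adapter names are all hypothetical. In a real deployment such counters would be exported through a metrics client rather than held in a Python object.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PipelineMetrics:
    tokens: Counter = field(default_factory=Counter)               # tokens per stage
    validation_failures: Counter = field(default_factory=Counter)  # failures per stage
    repairs: Counter = field(default_factory=Counter)              # repair cycles per adapter

    def record(
        self,
        stage: str,
        tokens: int,
        valid: bool = True,
        repair_adapter: Optional[str] = None,
    ) -> None:
        """Record one stage execution and, if given, the adapter that triggered a repair."""
        self.tokens[stage] += tokens
        if not valid:
            self.validation_failures[stage] += 1
        if repair_adapter is not None:
            self.repairs[repair_adapter] += 1

    def top_token_stage(self) -> Optional[str]:
        """Stage that consumed the most tokens, or None if nothing was recorded."""
        ranked = self.tokens.most_common(1)
        return ranked[0][0] if ranked else None


metrics = PipelineMetrics()
metrics.record("rewrite", 40)
metrics.record("generate", 300, valid=False, repair_adapter="requirement_check")
metrics.record("generate", 280)
metrics.record("cite", 90, valid=False, repair_adapter="citation")
metrics.record("cite", 85, valid=False, repair_adapter="citation")

print(metrics.top_token_stage())
print(metrics.validation_failures.most_common())
print(metrics.repairs.most_common(1))
```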
Another strong move by IBM is betting on specialized adapters instead of inflating the base model for every task. Core covers uncertainty assessment and requirement checking; RAG covers query handling, relevance, and citation; Guardian covers safety, factuality, and policy compliance checks. Together, they turn Granite from just a model into a set of applied primitives for building controlled AI systems. In effect, part of manual QA and prompt tuning moves into separate, verifiable components.
What it means
IBM is betting not on "yet another chat" but on infrastructure for verifiable AI processes. If the Mellea and Granite Libraries approach takes hold, the market will move faster from manual prompt engineering to a more modular, auditable, engineering-driven way of building LLM products. What matters there is not just answer quality but also the ability to explain, verify, and, if needed, automatically correct an answer before it reaches the user.