AWS showed how to cut text-to-SQL costs for business with Amazon Nova Micro and Bedrock

Q: What is the source?

Originally published on AWS Machine Learning Blog. Hamidun News processes and adapts the material with AI.

Q: When was it published?

May 2, 2026. Reading time: 3 min.

AWS offered a practical recipe for text-to-SQL on enterprise databases: fine-tune Amazon Nova Micro for a company's SQL dialect and run the model through…

Hamidun News Editorial

AI monitoring · AWS Machine Learning Blog

May 2, 2026· 3 min

AI-processed from AWS Machine Learning Blog; edited by Hamidun News

AWS showed how to cut text-to-SQL costs for business with Amazon Nova Micro and Bedrock — Source: AWS Machine Learning Blog. Collage: Hamidun News.

◐ Listen to article

AWS showed how to get text-to-SQL for internal databases without expensive constant model hosting. The company fine-tuned Amazon Nova Micro for non-standard SQL dialects and deployed it via Amazon Bedrock with pay-per-request pricing.

Why this matters

For corporate scenarios, a standard model often isn't enough: it writes standard SQL reasonably well, but starts making errors when a company has its own conventions, rare functions, special table schemas, and domain-specific rules. That's why user text queries need to be adapted to the specific dialect and database structure, which means fine-tuning the model on your own examples. This is especially noticeable in BI systems and internal analytics chats, where a syntax error immediately breaks the entire workflow.

The problem is that fine-tuning usually comes with additional infrastructure costs. If you keep a custom model on dedicated servers, the company pays even when there are no queries. AWS suggests a different approach in its breakdown: fine-tune Amazon Nova Micro via LoRA and run it in Amazon Bedrock in on-demand inference mode, where billing is per token rather than pre-reserved capacity.

Two AWS approaches

AWS describes two scenarios for the same task. The first is managed fine-tuning within Amazon Bedrock. It suits teams that value simplicity, quick start, and minimal ML infrastructure hassle.

Data is loaded into S3, a fine-tuning job is launched via console or API, and AWS handles training and subsequent deployment of the custom Nova Micro version. This approach targets application teams rather than a separate ML platform. The second path is training via Amazon SageMaker AI.

It's more complex but gives more control over the training recipe: you can adjust batch size, dropout, optimizer parameters, context window, LoRA settings, and learning rate warmup strategy. In AWS's example, they used the sql-create-context dataset based on WikiSQL and Spider, converting question-SQL pairs into the bedrock-conversation-2024 format for training and validation. This comes with greater configuration complexity and more explicit infrastructure work.

Bedrock — less operational burden and faster path to a working prototype
SageMaker AI — more control over hyperparameters and MLOps integration
Both schemes use the same data preparation pipeline and then deploy to Bedrock
Final inference runs on a serverless model with per-token billing, no constant hosting

Cost and latency

AWS provides specific numbers. For managed Bedrock fine-tuning, training cost is calculated as $0.001 per 1,000 tokens per epoch: in the example with 2,000 samples, five epochs, and approximately 800 tokens per record, it came to about $8.

For the SageMaker option, an ml.g5.48xlarge instance at $16.

288 per hour was used; training on a 20,000-row dataset took about four hours and cost approximately $65.15. The key thesis of the article is not one-time training cost, but operational cost.

AWS estimated a typical production load of 22,000 requests per month, or 100 users making 10 requests per day over 22 working days. With an average request size of 800 input tokens and 60 output tokens, monthly inference for such a custom text-to-SQL model came to $0.80.

This is possible because a custom Nova Micro in Bedrock is billed the same as the base model, with no additional premium for serverless deployment. In terms of speed, there's a trade-off, but it's moderate. On cold start, average time to first token increased to 639 ms, which is 34% higher than the base model.

In normal operation, average TTFT was 380 ms across 50 calls — only 7% worse than baseline. Full generation latency was around 477 ms, with output speed maintained at 183 tokens per second. AWS validated quality not just by latency, but through LLM-as-a-Judge, comparing generated SQL against reference answers.

What this means

For teams wanting to embed text-to-SQL in analytics products, internal BI tools, or chat interfaces to databases, AWS's case study looks practical: you can get a custom SQL generator without constant dedicated infrastructure costs. If launch speed matters more, Bedrock is the logical choice; if you need full control over training, the SageMaker AI combination looks stronger.

Hamidun News

AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Telegram channel RSS hamidun.com

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

🎓 Academy — 7 days free Free consultation