Google releases Gemini-SQL2: Gemini 3.1 Pro scores 80% on BIRD benchmark

AI-processed from MarkTechPost; edited by Hamidun News

Google Research has introduced Gemini-SQL2, a system built on Gemini 3.1 Pro that converts natural-language questions into SQL. It achieved 80.04% execution accuracy on the BIRD benchmark in the single-model category.

What is the BIRD Benchmark

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) is a standard academic benchmark for systems that translate natural-language questions into SQL queries. Unlike earlier datasets such as Spider, BIRD works with real, "dirty" data: tables contain typos, non-standard date formats, NULL values, and unexplained abbreviations. This is why the benchmark is considered more representative of production workloads.

Execution accuracy is the percentage of cases in which the generated SQL, when run against the test database, returns the correct result. The 80.04% figure is among the highest public scores in the single-model category, which excludes ensembles of multiple models, special post-processing pipelines, and additional verification agents. Previous leaders in this category scored in the 73–77% range.
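To make the metric concrete, here is a minimal sketch of execution accuracy in Python with the standard sqlite3 module. It runs the predicted and reference queries against the same database and counts a prediction as correct only when the result rows match. The toy `orders` table, the query pairs, and the multiset comparison are assumptions for illustration; the official BIRD evaluation script may differ in details such as row ordering or timeouts.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the SQL fails to run."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn, pairs):
    """pairs: list of (predicted_sql, gold_sql). Returns the fraction that match."""
    correct = 0
    for predicted_sql, gold_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and predicted == gold:
            correct += 1
    return correct / len(pairs)


# Hypothetical toy database, not taken from BIRD.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1, 'acme', 120.0), (2, 'acme', 80.0), (3, 'globex', 50.0);
""")

pairs = [
    # Different SQL text, same result rows: counts as correct.
    ("SELECT SUM(total) FROM orders WHERE customer = 'acme'",
     "SELECT SUM(o.total) FROM orders AS o WHERE o.customer = 'acme'"),
    # Wrong aggregate: runs, but returns a different value.
    ("SELECT COUNT(total) FROM orders WHERE customer = 'acme'",
     "SELECT SUM(total) FROM orders WHERE customer = 'acme'"),
    # Hallucinated column name: fails to execute, counts as incorrect.
    ("SELECT SUM(amount) FROM orders",
     "SELECT SUM(total) FROM orders"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"{accuracy:.2%}")
```

Because the comparison is on results rather than on SQL text, two differently written queries score the same as long as they return the same rows.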

How Gemini-SQL2 Works

According to Google Research's description, Gemini-SQL2 uses a schema-grounded approach. The model receives the complete database structure (table names, column types, foreign keys, and example values) and constructs SQL against the actual schema of the specific database. This reduces typical errors such as hallucinated field names, incorrect joins, and faulty aggregation. It matters most for corporate databases, where column names are often non-obvious abbreviations or technical codes.
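As an illustration of what "complete database structure" can mean in practice, the sketch below collects table names, column types, foreign keys, and a few example values from a SQLite database. This is not Google's implementation, which has not been published; the `describe_schema` helper and the toy `customers` and `orders` tables are hypothetical.

```python
import sqlite3


def describe_schema(conn, examples_per_column=3):
    """Collect table names, column types, foreign keys and example values."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    schema = {}
    for table in tables:
        columns = {}
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for cid, name, col_type, *rest in conn.execute(f'PRAGMA table_info("{table}")'):
            examples = [row[0] for row in conn.execute(
                f'SELECT DISTINCT "{name}" FROM "{table}" '
                f'WHERE "{name}" IS NOT NULL LIMIT ?', (examples_per_column,))]
            columns[name] = {"type": col_type, "examples": examples}
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(fk[3], fk[2], fk[4])
                        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "foreign_keys": foreign_keys}
    return schema


# Hypothetical database with an abbreviated column name (seg_cd).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, seg_cd TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'ENT'), (2, 'SMB');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 2, 50.0);
""")

schema = describe_schema(conn)
for table, info in schema.items():
    print(table, info)
```

The example values are what make a column like `seg_cd` interpretable: seeing 'ENT' and 'SMB' gives a model a chance to map "enterprise customers" to the right filter.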

Typical use cases include:

  • analytics without SQL specialists: a business user asks a question in natural language and receives a ready-to-use query
  • BI interfaces over corporate data warehouses with voice or text input
  • autocompletion and generation of complex queries for developers based on a text description of the task
  • rapid prototyping of samples for exploratory data analysis
  • automatic creation of SQL for regular business reports

For practical implementation, Google suggests a pattern: first pass the model the DDL schema and a few sample rows from each table, then the user's question. This way the model sees the real database structure and doesn't generate a query blindly.
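The pattern above can be sketched as a small prompt builder. This is a minimal illustration using SQLite, not an official Google template: the `build_prompt` function, the comment-style layout of the sample rows, and the toy tables are all assumptions, and the call to the model itself is deliberately left out.

```python
import sqlite3


def build_prompt(conn, question, sample_rows=3):
    """Assemble the DDL, a few sample rows per table, then the user's question."""
    parts = []
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for name, ddl in tables:
        # The CREATE TABLE statement exactly as stored by SQLite.
        parts.append(ddl.strip() + ";")
        cursor = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (sample_rows,))
        header = [col[0] for col in cursor.description]
        parts.append(f"-- Sample rows from {name}: {', '.join(header)}")
        for row in cursor:
            parts.append("-- " + repr(row))
        parts.append("")
    # The question comes last, after the model has seen the structure.
    parts.append(f"Question: {question}")
    parts.append("SQL:")
    return "\n".join(parts)


# Hypothetical database used only to demonstrate the prompt layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, seg_cd TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'ENT'), (2, 'SMB');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 2, 50.0);
""")

prompt = build_prompt(conn, "What is the total order value per customer segment?")
print(prompt)
```

The resulting string would be sent to the model as the request text. Capping the number of sample rows keeps the prompt small on wide or large tables.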

What Google Didn't Disclose

The publication leaves several important gaps. Google did not publish details of the architecture, the fine-tuning methodology, or the composition of the training data. It remains unclear whether Gemini-SQL2 is a separately fine-tuned model or a special prompting strategy on top of the base Gemini 3.1 Pro. It is also unclear whether the system is available through the API right now or is still a research result without an immediate product release. There is no information about support for languages other than English, or about compatibility with SQL dialects that include window functions and recursive CTEs.

"80 percent on BIRD is a serious result, but without a technical report it's hard to understand whether it's reproducible for arbitrary corporate databases" is a typical reaction from the ML community to such announcements.

What This Means

Crossing the 80% threshold on BIRD signals that text-to-SQL is moving from an academic task to a practically applicable tool for most standard business queries. Companies that want to give non-technical employees direct access to data have solid grounds for pilots with LLM-powered analytics. The coming months will show whether Google turns this result into a concrete product (for example, a built-in BigQuery feature) and whether competitors follow with comparable public benchmarks.

Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
