AWS Machine Learning Blog→ original

Google DeepMind Gemma 4 появились на Amazon Bedrock: три модели с MoE и мультимодальностью

На Amazon Bedrock появились три модели Gemma 4 от Google DeepMind: Gemma 4 31B, Gemma 4 26B-A4B (MoE) и Gemma 4 E2B. Все распространяются под Apache 2.0 и…

AI-processed from AWS Machine Learning Blog; edited by Hamidun News
Google DeepMind Gemma 4 появились на Amazon Bedrock: три модели с MoE и мультимодальностью
Source: AWS Machine Learning Blog. Collage: Hamidun News.
◐ Listen to article

Amazon Bedrock has added three models from the Gemma 4 family, developed by Google DeepMind, to its catalog — featuring open weights, multimodal input support, and MoE architecture. The models are available via AWS API immediately upon announcement.

Three options for different tasks

Gemma 4 was built with an emphasis on intelligence per parameter — maximum efficiency with minimal computational requirements. The family covers two architectural approaches: dense models and MoE, where only a portion of the neural network is activated per request. Three instruction-tuned variants are available on Amazon Bedrock:

  • Gemma 4 31B — a classic dense model with 31 billion parameters, predictable in behavior and convenient for fine-tuning
  • Gemma 4 26B-A4B — MoE architecture: 26B parameters in the model, but only 4B are activated per request
  • Gemma 4 E2B — a lightweight variant for edge and resource-constrained environments

All three are distributed under the Apache 2.0 license — commercial use without restrictions on volume or request count.

What the models can do out of the box

All Gemma 4 variants support multimodal input: text and images can be passed in a single request. This enables applications in document analysis, visual QA, screenshot processing, and mixed pipelines where different data types need to be processed in a single pass.

Built-in reasoning allows the model to take intermediate steps before providing the final answer. This is especially noticeable on complex mathematical, logical, and multi-step tasks — accuracy improves without additional prompt engineering.

Native function calling provides direct integration with agent systems and external tools. Developers don't need to invent workarounds through output formatting — the model calls functions natively.

Why MoE matters in practice

Mixture-of-Experts is a real way to reduce inference costs. Per request, only a set of specialized "expert" blocks are activated, not the entire neural network. Computational load is like a small model, quality is like a large one. For Gemma 4 26B-A4B this means: despite 26 billion parameters, inference actually works with 4 billion. In high-throughput scenarios where the cost of each token matters, this is a substantial advantage over equivalent dense models.

"The family was designed with a focus on a wide range of deployment scenarios," —

Google DeepMind in describing Gemma 4 architecture.

What this means

Placing Gemma 4 on Amazon Bedrock lowers the barrier to entry for companies in the AWS ecosystem: instead of self-deploying open weights — a ready API with managed infrastructure. Apache 2.0 also doesn't restrict scaling, making the family attractive to product teams who value predictability in licensing conditions.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Need AI working inside your business — not just in your newsfeed?

I build production AI for companies — custom CRM, internal tools, autonomous agents, workflow automation. Owned by you, shaped to your process, no per-seat tax. Built by Zhemal Khamidun, CPO of AlpinaGPT (AI platform, 6,000+ users).

What do you think?
Loading comments…