Google DeepMind Gemma 4 появились на Amazon Bedrock: три модели с MoE и мультимодальностью
На Amazon Bedrock появились три модели Gemma 4 от Google DeepMind: Gemma 4 31B, Gemma 4 26B-A4B (MoE) и Gemma 4 E2B. Все распространяются под Apache 2.0 и…
AI-processed from AWS Machine Learning Blog; edited by Hamidun News
Amazon Bedrock has added three models from the Gemma 4 family, developed by Google DeepMind, to its catalog — featuring open weights, multimodal input support, and MoE architecture. The models are available via AWS API immediately upon announcement.
Three options for different tasks
Gemma 4 was built with an emphasis on intelligence per parameter — maximum efficiency with minimal computational requirements. The family covers two architectural approaches: dense models and MoE, where only a portion of the neural network is activated per request. Three instruction-tuned variants are available on Amazon Bedrock:
- Gemma 4 31B — a classic dense model with 31 billion parameters, predictable in behavior and convenient for fine-tuning
- Gemma 4 26B-A4B — MoE architecture: 26B parameters in the model, but only 4B are activated per request
- Gemma 4 E2B — a lightweight variant for edge and resource-constrained environments
All three are distributed under the Apache 2.0 license — commercial use without restrictions on volume or request count.
What the models can do out of the box
All Gemma 4 variants support multimodal input: text and images can be passed in a single request. This enables applications in document analysis, visual QA, screenshot processing, and mixed pipelines where different data types need to be processed in a single pass.
Built-in reasoning allows the model to take intermediate steps before providing the final answer. This is especially noticeable on complex mathematical, logical, and multi-step tasks — accuracy improves without additional prompt engineering.
Native function calling provides direct integration with agent systems and external tools. Developers don't need to invent workarounds through output formatting — the model calls functions natively.
Why MoE matters in practice
Mixture-of-Experts is a real way to reduce inference costs. Per request, only a set of specialized "expert" blocks are activated, not the entire neural network. Computational load is like a small model, quality is like a large one. For Gemma 4 26B-A4B this means: despite 26 billion parameters, inference actually works with 4 billion. In high-throughput scenarios where the cost of each token matters, this is a substantial advantage over equivalent dense models.
"The family was designed with a focus on a wide range of deployment scenarios," —
Google DeepMind in describing Gemma 4 architecture.
What this means
Placing Gemma 4 on Amazon Bedrock lowers the barrier to entry for companies in the AWS ecosystem: instead of self-deploying open weights — a ready API with managed infrastructure. Apache 2.0 also doesn't restrict scaling, making the family attractive to product teams who value predictability in licensing conditions.
Need AI working inside your business — not just in your newsfeed?
I build production AI for companies — custom CRM, internal tools, autonomous agents, workflow automation. Owned by you, shaped to your process, no per-seat tax. Built by Zhemal Khamidun, CPO of AlpinaGPT (AI platform, 6,000+ users).
The AI world, distilled — once a week
Seven stories that actually mattered, hand-picked. No noise, no reposts, no press releases.
Done! Check your inbox for a confirmation.