Google Unveiled Gemini Omni — Multimodal Video Editor

Google unveiled Gemini Omni, a model for working with photos, video, and audio. It creates new scenes from uploaded content and lets users iterate on ideas. For now its output is limited to video, but the company plans to expand to all formats.

Khamidun Zhemal

AI monitoring · @demishassabis

May 26, 2026 · 3 min · updated Jul 12, 2026

AI-processed from @demishassabis; edited by Hamidun News


Google unveiled Gemini Omni, a next-generation model that the company describes as a significant leap in understanding and editing multimodal content. Unlike its predecessors, Omni works natively with photographs, video, and audio at the same time, creating new scenes from uploaded material.

What Gemini Omni Can Do

Omni's main distinguishing feature is that it handles several types of content at once. A user can upload a video and add a photo or an audio recording, and the model interprets the combined material and transforms it into a new scene. Demis Hassabis, CEO of Google DeepMind, called this a "significant leap in understanding the world and multimodal editing."

At the current stage, the primary output is video. Google plans to expand this over time so that the system can generate and edit content in any format: text, audio, images, and 3D models. That would set it apart from most current tools, which specialize in a single type of content.

How Editing Works

Omni does not create content from scratch. Instead, a user uploads their own material (video, photo, or audio) and Omni transforms it into a new version. That could mean changing the lighting, adding new objects to a scene, rearranging people, or altering the atmosphere of a frame. The system takes context into account and preserves the meaning of the original content, while letting the user iterate on ideas.

Upload video in any format and quality
Change scene elements through text descriptions
Add new objects and characters to a frame
Refine the result iteratively over multiple editing cycles
Combine text, photo, and audio in a single multimodal prompt
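
The upload-then-iterate workflow described above can be modeled as a small session object. The sketch below is purely illustrative: the material cited here does not describe an API, so `Prompt`, `EditSession`, and the file names are hypothetical, and `apply` only records the request instead of calling any model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Prompt:
    """One multimodal editing instruction: text plus optional reference files."""
    text: str
    photo_path: Optional[str] = None  # hypothetical reference image
    audio_path: Optional[str] = None  # hypothetical reference audio


@dataclass
class EditSession:
    """Tracks an uploaded source video and the edits requested so far."""
    source_video: str
    history: List[Prompt] = field(default_factory=list)

    def current_version(self) -> str:
        """Label the current state as '<source>#v<number of edits>'."""
        return f"{self.source_video}#v{len(self.history)}"

    def apply(self, prompt: Prompt) -> str:
        """Record an edit and return the new version label.

        A real system would send the current version and the prompt to a
        model at this point; this stand-in only stores the request.
        """
        self.history.append(prompt)
        return self.current_version()

    def undo(self) -> str:
        """Drop the most recent edit, if any, and return the version label."""
        if self.history:
            self.history.pop()
        return self.current_version()


# Two editing cycles on one uploaded clip (file names are made up).
session = EditSession("clip.mp4")
session.apply(Prompt("make the lighting warmer"))
session.apply(Prompt("add this character to the frame", photo_path="hero.png"))
print(session.current_version())  # clip.mp4#v2
```

Keeping every prompt in `history` is one way to make iteration cheap: stepping back to an earlier idea is a matter of dropping the latest entry, not re-uploading the source.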

Industry Applications

For content creators, a single model of this kind could greatly simplify the workflow. Instead of using separate tools for video, audio, and images, users could work within one unified ecosystem. This matters most for independent creators with limited software budgets.

In the professional film industry, Gemini Omni could accelerate post-production. Editors would be able to generate scene variations quickly, and directors could experiment with different versions of a shot without reshooting.

For marketing and advertising teams, the same capability could mean faster adaptation of content for different platforms and audiences.

What This Means

The emergence of truly multimodal systems represents a shift from narrowly specialized AI tools to universal assistants. Google is moving toward a model that sees, hears, and understands the world the way humans do, and can recreate or edit that world on the fly. This is an intermediate stage on the path to more general AI capable of working with any type of information simultaneously.

Hamidun News

AI news without noise. Daily editorial selection from 50+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.
