Google Unveils Gemini Omni: A Multimodal Video Editor
AI-processed from @demishassabis; edited by Hamidun News
Google has unveiled Gemini Omni, a next-generation model that Google DeepMind describes as a significant leap in understanding and editing multimodal content. Unlike its predecessors, Omni works natively with photos, video, and audio at the same time and creates new scenes from uploaded material.
What Gemini Omni Can Do
Omni's main distinguishing feature is that it handles several types of content at once. A user can upload a video and add a photo or an audio recording, and the model interprets the material and transforms it into a new scene. Demis Hassabis, CEO of Google DeepMind, called this a "significant leap in understanding the world and multimodal editing."
For now, the primary output is video. Google plans to expand this over time so that the system can generate and edit content in any format, including text, audio, images, and 3D models. Most current tools, by contrast, specialize in a single type of content.
How Editing Works
Editing does not start from scratch. Instead, the user uploads their own material (video, photos, or audio) and Omni turns it into a new version. Typical edits include changing the lighting, adding objects to a scene, repositioning people, or altering the overall atmosphere of a shot. The system takes the context into account and preserves the meaning of the original content while letting users iterate on their ideas.
- Upload video in any format and quality
- Change scene elements through text descriptions
- Add new objects and characters to a frame
- Refine results over multiple editing cycles
- Combine text, photos, and audio in a single multimodal prompt
Industrial Applications
For content creators, this could dramatically simplify the workflow. Instead of switching between separate tools for video, audio, and images, they can work in a single environment. This matters most for independent creators with limited software budgets.
In professional filmmaking, Gemini Omni could speed up post-production. Editors could quickly generate variations of a scene, and directors could try out different versions of a shot without reshooting it.
For marketing and advertising, this means faster adaptation of content for different platforms and audiences.
What This Means
The arrival of truly multimodal systems marks a shift from narrowly specialized AI tools toward general-purpose assistants. Google is working toward models that see, hear, and understand the world more the way humans do, and that can recreate or edit it on the fly. Omni is an intermediate step on the path to more general AI that can handle any type of information at the same time.