Google Veo 3.1 can turn portrait photos into vertical videos
Google has released an update for its Veo 3.1 video generation model. The key changes: the "Ingredients to Video" tool now reproduces reference images more accu
AI-processed from The Verge; edited by Hamidun News
Vertical video has definitively stopped being a second-class format. Google has updated its Veo 3.1 video generation model, adding native support for vertical videos — the kind that dominate TikTok, Instagram Reels, and YouTube Shorts. But it's not just about rotating the frame 90 degrees: the company has seriously reworked the mechanism responsible for how accurately the generated video corresponds to the source images.
The "Ingredients to Video" tool, first introduced last year, allows users to upload up to three reference images and create video clips based on them. These can be character portraits, background textures, environmental elements — essentially visual "ingredients" from which the neural network assembles the final clip. The problem with the previous version was that the model often "filled in" details, deviating from the uploaded references. The update is meant to fix this: Google promises "more expressive and creative" results with "rich" reproduction of the source materials.
Why vertical video became the focus of the update is not hard to see. Short vertical clips generate billions of views daily. Content creators, marketers, and social media specialists have long needed tools capable of quickly producing visually appealing content in this format. Until now, most AI video generators were oriented toward the horizontal, "cinematic" 16:9 format, and vertical clips had to be cropped manually, losing quality and composition. Native support means the model composes the frame for vertical orientation from the start, with proper object placement that accounts for facial proportions and background.
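To put a rough number on what manual cropping costs, here is a minimal sketch that computes the largest centered 9:16 crop from a horizontal frame. The 1920x1080 source size and the function name are illustrative assumptions; nothing here reflects how Veo itself works.

```python
from fractions import Fraction


def center_crop_to_aspect(width, height, target_w=9, target_h=16):
    """Return (crop_w, crop_h, retained) for the largest centered crop
    of a width x height frame that has aspect ratio target_w:target_h.
    `retained` is the fraction of the original pixels that survive."""
    target = Fraction(target_w, target_h)
    source = Fraction(width, height)
    if source > target:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = height
        crop_w = int(height * target)
    else:
        # Source is as narrow as, or narrower than, the target: keep full width.
        crop_w = width
        crop_h = int(width / target)
    retained = (crop_w * crop_h) / (width * height)
    return crop_w, crop_h, retained


if __name__ == "__main__":
    w, h, kept = center_crop_to_aspect(1920, 1080)
    print(f"crop: {w}x{h}, pixels retained: {kept:.1%}")
```

For a hypothetical 1920x1080 source, the vertical crop is only 607 pixels wide and keeps a little under a third of the original pixels, which is why cropped clips lose both sharpness and composition.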
The upscaling function deserves special attention. Generative video models are still limited in resolution: the computational costs of creating 4K video are astronomically high. Upscaling allows you to generate a clip at a lower resolution and then intelligently scale it, preserving details and sharpness. This is a pragmatic compromise that makes AI video suitable for publishing on platforms that require at least Full HD.
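The resolution gap can be illustrated with simple pixel arithmetic. The sketch below compares per-frame pixel counts for standard resolutions. Treating pixel count as a proxy for generation cost is a simplifying assumption, and the article does not state which resolution Veo generates at natively.

```python
# Standard frame sizes (width, height) in pixels.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K UHD": (3840, 2160),
}


def pixel_ratio(src, dst):
    """How many times more pixels per frame `dst` has compared with `src`."""
    src_w, src_h = RESOLUTIONS[src]
    dst_w, dst_h = RESOLUTIONS[dst]
    return (dst_w * dst_h) / (src_w * src_h)


if __name__ == "__main__":
    for src, dst in [("720p", "1080p"), ("1080p", "4K UHD"), ("720p", "4K UHD")]:
        print(f"{src} -> {dst}: {pixel_ratio(src, dst):.2f}x pixels per frame")
```

Under this rough proxy, a 4K frame has four times the pixels of a Full HD frame and nine times those of a 720p frame, so generating small and upscaling afterwards saves a large share of the work.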
This update cannot be understood without looking at the competitive race. OpenAI continues to develop Sora, which is already available to ChatGPT Plus subscribers. Runway releases new iterations of Gen-3 Alpha. Chinese players, including Kuaishou with Kling, MiniMax, and ByteDance with its own model, are accelerating at an alarming pace. In this environment, Google cannot afford to lag behind, especially given that Veo is integrated into the Gemini ecosystem and potentially accessible to hundreds of millions of users through Google services. Each functional update is not just a technical improvement but a strategic move in the fight for the generative video market, which, according to analysts' forecasts, could exceed $10 billion by 2028.
Improving consistency with reference images also addresses one of the main pain points for users of generative video models. When you upload a photo of a specific person and want a video with exactly that face, even small deviations (a different nose shape, changed eye color, "drifting" facial features) destroy the illusion. For commercial use, whether advertising or brand content, such errors are unacceptable. If Google has truly managed to increase reproduction accuracy, this brings Veo closer to the threshold of commercial viability.
Practical consequences for Russian users are still limited: access to Veo through Google services in Russia is restricted, and the company grants full API access to third-party developers only selectively. Nevertheless, the trend is clear: AI video generation is rapidly moving from an experimental toy to a working tool. Vertical format, precise adherence to references, and resolution enhancement are all bricks in the foundation of a future where a significant portion of video content on social networks will be created not by a camera but by a neural network.
Google is methodically closing the gap between what generative models can do in theory and what the real market demands of them. Turning a portrait photo into a vertical video is not a revolution. It is engineering maturity, and that is what will determine who ultimately takes the dominant position in the AI video industry.