Gemini can now create music from text and photos

Q: What is the source?

Originally published on DeepMind Blog. Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-02-23. Reading time: 3 min.

Hamidun News Editorial


Gemini can now create music from text and photos — Source: DeepMind Blog. Collage: Hamidun News.

The boundary between text and sound has become thinner: Google has embedded a music generation tool based on the Lyria 3 model into the Gemini app. Any user can now describe the desired sound in words or upload a photograph and get a ready-made 30-second track. No musical notation, no studio knowledge, no special equipment. This is not just another feature in a long list of updates; it is Google's attempt to redefine who has the right to be called a music author.

To understand the scale of this step, recall the context. Text-to-music generation has existed for several years: Suno, Udio, and Meta's MusicGen all offered similar capabilities of varying quality. But most of these services lived apart from mainstream products, required registration in specialized applications, and remained a niche pursuit for technically savvy audiences. Google is betting on something different: Lyria 3 is embedded directly in Gemini, an application used by hundreds of millions of people worldwide. The barrier to entry almost completely disappears.

Lyria 3 is Google's most advanced music model to date. The company developed it as part of DeepMind's research, and the result of that work is now moving from the laboratory into the pocket of an ordinary user. The mechanics are simple: you describe the mood, genre, instruments, or atmosphere in text (for example, "relaxing lo-fi with piano and rain outside the window") and the model generates a track. The alternative path is even more interesting: you can upload an image, and Lyria 3 translates its visual content into music. A sunset over the sea becomes one melody; city bustle becomes something completely different. It is this multimodal approach that distinguishes Google's offering from most competitors.

For the industry, this is a signal with several layers of meaning. The first is obvious: the largest technology companies are seriously competing for the creative audience. OpenAI has already integrated image generation into ChatGPT, Meta is developing its own multimodal tools, and Apple is quietly building AI functions into its ecosystem. With Lyria 3 in Gemini, Google makes music the next battlefield.

The second layer is more complex: the appearance of such tools in a mainstream product inevitably raises questions about copyright and monetization. What data was Lyria 3 trained on? What happens to the track you generated: can you publish it, sell it, or use it in commercial projects? So far, Google has not provided exhaustive official answers, and this remains a zone of uncertainty that the industry will watch carefully.

For the ordinary user, the consequences are far more straightforward. A podcaster can create a unique intro in a minute. A video creator gets background music without having to search for tracks under a Creative Commons license. A person who has long had a melody in their head can finally bring it to life without knowing a single note. It is this audience that is Google's real target: not professional musicians, but millions of people with creative needs and no technical means. Thirty seconds is certainly brief, but it is a perfectly sufficient format for jingles, intros, atmospheric inserts, and experiments.

The future here calls for neither excessive euphoria nor undue skepticism. Music generation is still far from threatening professional composers, just as text-generating AI has not displaced journalists. But it is changing the economics of creativity: lowering the cost of content production, widening the circle of people able to create content, and creating new professional roles for those who can skillfully formulate requests and edit the result. Lyria 3 in Gemini is not the end of the music profession, but the beginning of a conversation about what it means to be an author in an era when the tool itself knows how to play.
