Prompt decomposition in Gemini and Kling: how to recreate a Pinterest visual in a brand style
Prompt decomposition is the main secret to precise image generation. Instead of one long prompt, you need to break the reference into components — mood…
AI-processed from Habr AI; edited by Hamidun News
When a generated image looks "almost good, but not quite", the problem is usually not the tool but how the request is formulated. An author from the Russian-speaking AI community shared a concrete working method: decomposing a prompt into separate semantic layers makes it possible to get a precise result on the first or second attempt instead of after dozens of blind iterations. The tools in the example are Gemini for image generation and Kling for animation.
The starting point was an image from Pinterest. The task was non-trivial: not to copy it, but to adapt it to the company's brand style — preserving the mood and overall composition while completely replacing colors, details, and aesthetics according to the brand guidelines. This is where decomposition begins.
Instead of one long prompt, the author broke down the source image into separate components: overall scene and atmosphere, color palette, lighting, textures, rendering style, foreground and background details. Each element was described separately — sequentially, layer by layer, with gradual refinement of details. Gemini served as the generation tool.
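As a rough illustration of the layers listed above, the sketch below keeps each component in its own field and renders them as a labeled list. The class name, the function name, and the scene, style, foreground, and background values are invented for this example; the lighting, palette, and texture values reuse descriptors quoted in the article. The result is plain text that could be pasted into any image model, and no Gemini API call is assumed.

```python
from dataclasses import dataclass, fields


@dataclass
class ReferenceLayers:
    """One field per layer named in the article (hypothetical structure)."""
    scene: str
    palette: str
    lighting: str
    textures: str
    rendering_style: str
    foreground: str
    background: str


def build_prompt(layers: ReferenceLayers) -> str:
    """Render the layers as a labeled parameter list, one line per layer."""
    lines = []
    for f in fields(layers):
        label = f.name.replace("_", " ").capitalize()
        lines.append(f"{label}: {getattr(layers, f.name)}")
    return "\n".join(lines)


# Illustrative values only; a real project would take them from its brand guidelines.
example = ReferenceLayers(
    scene="cafe table by a window, calm morning mood",
    palette="RGB 0, 82, 204 as the dominant color, no gradients",
    lighting="golden-orange lighting at a 45-degree angle, long soft shadows",
    textures="glossy surface",
    rendering_style="clean 3D render",
    foreground="a single ceramic cup",
    background="blurred shelves",
)

prompt = build_prompt(example)
print(prompt)
```

Keeping one field per layer makes it easy to change a single layer, for example the palette, and regenerate without touching the rest of the description.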
The key principle is not "put everything into one prompt and hope" but a structured dialogue in which each element is refined in turn. First, the overall scene is set. Then the style is refined. Then brand specifics are added: colors from the guidelines, characteristic identity elements, and permitted and prohibited visual solutions. This approach sharply reduces the number of iterations, because the model receives clear instructions rather than having to guess intent from a vague description.
Multimodal models respond better to concrete descriptors than to abstract definitions. "Warm sunset" produces unpredictable results; "golden-orange lighting at a 45-degree angle, long soft shadows" works predictably. "Corporate blue in the brand spirit" is an unclear instruction; "RGB 0, 82, 204, glossy surface, no gradients" is specific. Prompt decomposition means translating a visual image into language the model can interpret unambiguously.
After the image had been assembled to match the required identity, Kling came into play: a tool for animating static images, built on generative video models. Decomposition works here as well: the prompt separately specifies what should move, at what speed, in what direction, and with what intensity. An animation prompt is not a description of a video but a set of instructions for the physics of the scene: which elements remain static, which ones move, how noticeable the movement should be, and whether the goal is a subtle "breathing" effect or full cinematography with camera dynamics.
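The same idea can be sketched for the animation step. The snippet below is only an assumption about how such instructions might be organized: the `Motion` class, the `build_animation_prompt` function, and all example values are hypothetical, and the output is plain text rather than a call to any Kling API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Motion:
    element: str    # what moves
    direction: str  # in what direction
    speed: str      # at what speed
    intensity: str  # how noticeable the movement is


def build_animation_prompt(static: List[str], motions: List[Motion], camera: str) -> str:
    """List static elements first, then one instruction per moving element, then the camera."""
    lines = []
    if static:
        lines.append("Keep static: " + ", ".join(static))
    for m in motions:
        lines.append(
            f"Move {m.element}: direction {m.direction}, "
            f"speed {m.speed}, intensity {m.intensity}"
        )
    lines.append(f"Camera: {camera}")
    return "\n".join(lines)


# Illustrative values for a subtle "breathing" effect with a fixed camera.
shot = build_animation_prompt(
    static=["brand logo", "table surface"],
    motions=[Motion("steam above the cup", "upward", "slow", "subtle")],
    camera="locked, no movement",
)
print(shot)
```

Stating the static elements explicitly is a design choice in this sketch: it mirrors the article's point that the prompt should say what does not move as well as what does.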
The final result is an animated branded image created without a designer or videographer, in a few hours of work with two tools. The approach scales: the same principles work for social media content, advertising banners, presentation materials, and any other visual that has to comply with brand guidelines. The author presents the methodology as reproducible across projects, identities, and generative tools. The principle stays the same; only the specific details change.
For those who want to apply the method: start with a description of the source reference that is as detailed as possible. Break it down into 5 to 7 separate characteristics. Write the prompt not as one long sentence but as a structured list of parameters. Check each layer separately before assembling the final request. It is this sequence, not the magic of any specific tool, that gives predictable results with generative AI.
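The checklist above could be partly automated before anything is sent to a model. The sketch below is a deliberately crude heuristic, not part of the author's method: the function name, the list of vague phrases, and the minimum word count are all assumptions chosen so that the article's own "bad" examples get flagged.

```python
from typing import Dict, List

# Hypothetical markers of vague wording; a real list would be tuned per project.
VAGUE_MARKERS = ("brand spirit", "vibe", "beautiful", "nice")
MIN_WORDS = 3  # assumed threshold: shorter descriptions are treated as too vague


def review_layers(layers: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the draft passes these checks."""
    problems = []
    if not 5 <= len(layers) <= 7:
        problems.append(f"expected 5 to 7 layers, got {len(layers)}")
    for name, text in layers.items():
        lowered = text.lower()
        if len(lowered.split()) < MIN_WORDS:
            problems.append(f"{name}: too short to be specific")
        for marker in VAGUE_MARKERS:
            if marker in lowered:
                problems.append(f"{name}: vague phrase '{marker}'")
    return problems


draft = {
    "scene": "cafe table by a window, calm morning mood",
    "palette": "Corporate blue in the brand spirit",
    "lighting": "Warm sunset",
    "textures": "glossy surface, no gradients",
    "style": "flat vector illustration",
}
issues = review_layers(draft)
print(issues)

# Replace the flagged layers with the concrete descriptors quoted in the article.
revised = dict(
    draft,
    palette="RGB 0, 82, 204, glossy surface, no gradients",
    lighting="Golden-orange lighting at a 45-degree angle, long soft shadows",
)
revised_issues = review_layers(revised)
print(revised_issues)
```

A check like this cannot judge visual quality; it only catches descriptions that are obviously underspecified before a generation attempt is spent on them.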