"dream_old.mp4 — I dropped the file in for you, take the first frame from it. Then make a shirt on the bandit like the one in there."
Extracted the first frame of the old dream — red kosovorotka with embroidery on the collar, shoulders, cuffs. Through Gemini 3 Pro Image, repainted our bandit into the same shirt:

"Now we need to upscale the first frame of the old dream and make it our bandit in shirt v5."
Gemini regenerated the frame at 2K — Russian log cabin with blue window frames, wife in sarafan and kokoshnik, bear with a balalaika, child, samovar, bandit drinking tea:

"Now animate. Camera glides smoothly. Bandit sips tea. Wife stands up, walks to him, kisses him. Bear plays balalaika, kid dances. 8s, no need to break it up."
v1: with shot11_dream_first_v1 as first frame + char_rubaha_v5:
"Tolerable, not wow. Let's try giving the model only the text and the bandit ref."
v2: char-ref only, no scene-ref; the village is described in text. The 1111×1536 char-ref triggered auto-fit (59% aspect deviation), so I had to pass --skip-aspect-fit:
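The 59% figure is consistent with comparing the ref's aspect ratio against a 16:9 output frame. The log does not say how the tool computes the deviation, so both the 16:9 target and the formula below are assumptions, shown only as a rough sketch:

```python
def aspect_deviation(width: int, height: int, target_ratio: float = 16 / 9) -> float:
    """Relative difference between an image's aspect ratio and a target ratio.

    The 16:9 default is an assumption, not something stated in the log.
    """
    return abs(width / height - target_ratio) / target_ratio


# Portrait char-ref from the log: 1111x1536
deviation = aspect_deviation(1111, 1536)
print(f"{deviation:.0%}")  # prints 59%
```

Under that assumption, any portrait ref fed into a landscape shot lands far from the target, which would explain why the auto-fit fires on character refs in particular.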
"The bear plays slowly — balalaika is played fast with the hand (well, paw)."
v3: "paw strums the strings in rapid rhythmic motion, just like a real balalaika player":
"Want another version. Bandit smiles at the wife. Samovar without the chimney."
v4: "smiling warmly toward the approaching woman", "samovar without the chimney pipe on top":
"Not it, composition's not great, let's reroll random."
v5: same prompt, new seed:
"Next time add that the wife is slim and pretty. Or we could do that in parallel?"
v6: added "slim beautiful young woman":
"OK, two more variations at once."
v7 and v8 — in parallel:
"Why did you remove the chimney bit? And the bandit on the bench thing too?"
Both are still in the prompt. With random seeds, Seedance sometimes ignores individual details.
"Two more variants."
v9 and v10:
"Two more variations."
v11 and v12 (v11 hit a 402, topped up, restarted):
"OK, v12 is our winner, pull the first frame from it."
Extracted shot11_dream_v12_firstframe.png.
"Need a closer angle just on the bear."
Idea from the guide: use a video-ref instead of frame→animate, since the model then preserves both the motion and the background. Trimmed the first 3 seconds of v12 and ran it with a close-up prompt:
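The two clip-preparation steps used so far (pulling a first frame, trimming the first 3 seconds for a video-ref) can be sketched with ffmpeg. The log does not say which tool was actually used, so ffmpeg is an assumption here, as are the `.mp4` extension and the `shot11_dream_v12_first3s.mp4` output name:

```python
import subprocess


def first_frame_cmd(video: str, out_png: str) -> list[str]:
    """ffmpeg command that writes only the first video frame to an image."""
    return ["ffmpeg", "-y", "-i", video, "-frames:v", "1", out_png]


def trim_head_cmd(video: str, out_video: str, seconds: float = 3.0) -> list[str]:
    """ffmpeg command that keeps the first `seconds` of a clip without re-encoding."""
    return ["ffmpeg", "-y", "-i", video, "-t", str(seconds), "-c", "copy", out_video]


def run(cmd: list[str]) -> None:
    """Execute a prepared command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(cmd, check=True)


# Hypothetical file names based on the shot names in the log.
frame_cmd = first_frame_cmd("shot11_dream_v12.mp4", "shot11_dream_v12_firstframe.png")
ref_cmd = trim_head_cmd("shot11_dream_v12.mp4", "shot11_dream_v12_first3s.mp4")
```

Calling `run(ref_cmd)` would produce the short clip to pass as the video-ref. Stream copy (`-c copy`) avoids a second lossy encode before the model sees the footage.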
"The bear got a collar that wasn't in the ref."
v2 with negatives "avoid adding a collar or leash to the bear":
"Now a shot of the samovar. Easier without a video-ref."
shot11_samovar_closeup_v1 — wrong angle:
"Let's build the frame through Banana first. Need a detail shot of the spout. Important: there's a tablecloth with dishes."
Iterations on angle (spout to the left, camera from the bandit's side):

v4: tablecloth under the samovar + no pattern:

v5, v6: tablecloth under the samovar (Gemini stuck, won't edit):
"0 changes, no clue what you wrote in the prompt. I did it myself, wrote 'make the tablecloth cover the entire bench' and got shot11_samovar_detail_v7_user. And it works."

Then the agent went off and tried to save the pattern "simple imperative in Russian works better" into feedback memory. User:
"What are you doing? Why the hell do you need agent memory? Forgot what folder we're in?"
Animated the user version:
"Once more on shot11_samovar_detail_v8_user.png, I fixed the geometry."

After several prompt iterations (steam from the top → from the spout → just a little):
"And let's animate this one too — that's the top of the samovar (not the chimney). 4s, with steam."

"I took the last frame of v12, polished in Gemini → shot11_dream_kisses.png. 6s video, the wife kisses the bandit in rhythm: kiss, kiss-kiss (2x), kiss, kiss-kiss. Static frame."

v1 — wrote out timecodes, 0.8s per kiss:
"Model didn't follow the sequence or the count of kisses."
"Don't say 'in the following sequence', spell out the sequence explicitly. You could've thought of that."
v2 — explicit "kisses once, pulls back briefly, kisses twice quickly, pulls back, kisses once, pulls back, kisses twice":
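The fix in v2 amounts to expanding the rhythm into a flat list of actions rather than describing it. A minimal sketch of that expansion follows; the helper name and the wording it emits are illustrative, not the exact prompt used (the real v2 prompt also added "briefly" and "quickly"):

```python
COUNT_WORDS = {1: "once", 2: "twice"}


def explicit_sequence(action: str, counts: list[int], pause: str = "pulls back") -> str:
    """Expand a rhythm such as [1, 2, 1, 2] into an explicit list of actions."""
    steps = []
    for i, n in enumerate(counts):
        steps.append(f"{action} {COUNT_WORDS.get(n, f'{n} times')}")
        # Insert a pause between groups, but not after the last one.
        if i < len(counts) - 1:
            steps.append(pause)
    return ", ".join(steps)


prompt = explicit_sequence("kisses", [1, 2, 1, 2])
print(prompt)
# kisses once, pulls back, kisses twice, pulls back, kisses once, pulls back, kisses twice
```

Every beat appears as its own verb phrase, so the model never has to count or interpret a phrase like "in the following sequence".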
"Commit."
0659e79 — 40 files.
"We need a shot of the bandit on the floor with the partner's shoulder. Room ref: shot1_room_no_bandit_v3. But discuss with me how we'll do it first."
The user proposed:
- Shoulder on the left, bandit unconscious lying with his head by the doorway
- No candy visible, no corpse either
- Have a partner ref (need B&W)
Converted char_bandit_2_front.jpg → char_bandit_2_front_bw.jpg.
User's pose_ref + room + faces of both — but the angle is wrong, Seedream does eye level:
v2 with "EXTREME HIGH-ANGLE BIRD'S-EYE VIEW":
"No, bad. I think we should take the pose_ref and replace the bandit's face there with ours."

facefusion swapped the face — but the bandit kept hair and beard (face swapper changes only the face region, not the whole head):
"Can't you just ask Banana to do this?"
Gemini refused twice with IMAGE_OTHER on "replace head with bald". A simple "shave bald, remove beard" worked:
facefusion again → face + bald:

"Leather jacket isn't right, we already established our version."
v6 — leather jacket repainted:
"Let's remove the zipper and replace the parquet with the one from the room."
v7: Gemini flipped the angle, v8 with explicit "keep top-down camera":

"Let's try another method. Just place shot5_bandit_closeup_v5 on the parquet (we'll add the shoulder later)."
"But he's not lying toward the door, and the door opens outward, not inward."
"No, crap. Just make a lying-down photo of the bandit."
Gemini returned IMAGE_OTHER twice on "lying on his back" (probably triggered by "on his back"). Switched to Seedream:

v2 with "mugshot-from-above angle":

"Not bad — shot12_bandit_lying_v2_tilted.jpg, I rotated the image, now we need parquet under it."

"Parquet's OK, but the white area needs filling too."

"Light is coming from the left, but it should be from the bottom-left."

"Light is wrong — need it from the bottom of the frame, this is also from the left. No patterns on it either."

"Let's try animating directly. A guy in a tracksuit on his haunches leans into the frame from the bottom-left and slaps our bandit, who's unconscious. 5s."
"No, we're not ready. shot12_bandit_lying_v7 — we need the back of another guy added here."

"I made it the way I wanted in Photoshop. Let's animate. Lots of slaps — they're not slaps, they're light pats, like trying to bring him to."

v2 (after 402 → top-up):
In 6 hours:
- shot11_dream_v12 — 12 iterations of the village dream (bear with balalaika, wife, child, samovar, bandit drinking tea), text-based prompt + char-ref without scene-ref.
- shot11_bear_closeup_v2 — bear close-up via the video-ref approach (first 3s of v12 as @Video1, same motion + same background).
- shot11_samovar_detail_v2 + shot11_samovar_top_v1 — two separate samovar detail shots, only the steam moves.
- shot11_dream_kisses_v2 — 6s with rhythmic kisses, an explicit list of actions works better than "in the following sequence".
- shot12_bandit_lying_v7 — bandit on the floor head→doorway, light from below, via the chain Seedream → user rotate → Gemini (parquet, light).
- shot12_slap_v2 — slap (light pats) from the partner on his haunches, top-down, user assembled the setup in PS.
Lessons:
- Action rhythm in the prompt: not "in this sequence", but an explicit list of verbs with pauses.
- Geometry changes through Gemini: the user usually rotates or tilts the image by hand, because Gemini does not rotate the picture but regenerates the shot from a different angle.
- Video-ref for close-ups is more convenient than text-to-image + animate: preserves both motion and background.
- Use --skip-aspect-fit for portrait char-refs, since Seedance crops the center itself anyway.
- A simple imperative in Russian from the user broke Gemini's preservation bias where "Keep everything else exactly the same" failed.