Day 5. April 19, 10:32–15:24. Gaze, Soviet wrappers, and the first "eat the candy"

After the piñata marathon, on to the details: change the wallpaper in shot10, swap the candy for Soviet brands, shoot the eating-the-candy scene.

20 attempts to redirect the gaze

"Shot10_bandit_low_angle_v2 — let's fix the wallpaper on the right to match the ref shot1_door_closed."

Gemini changed the wallpaper on the first try:

shot10_bandit_low_angle_v2_wallpaper — floral wallpaper like in shot1_door_closed
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit this photograph: replace the wallpaper on the walls with the exact floral wallpaper pattern from @image2. Keep everything else unchanged — the man, his pose, the doorway, the door, the floor, the lighting, the framing. Only the wallpaper pattern changes to match @image2.

The user upscaled the frame in Photoshop, got shot10_bandit_low_angle_v2_wallpaper_zoom.jpg:

shot10_bandit_low_angle_v2_wallpaper_zoom — wallpaper replaced

After that — two hours of Gemini attempts to rotate the bandit's gaze slightly down and to the left. Key iterations:

shot10_gaze_v3 — down and slightly left (per the model)
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Retouch this photograph: adjust the direction the man is looking. His gaze now points slightly to the lower-left. Body and pose identical. Keep room, doorway, clothing, lighting, black and white palette.

shot10_gaze_v4 — slightly to the camera-left
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Retouch this photograph: adjust the direction the man is looking. His gaze is just slightly off-camera to the left and slightly downward — barely a few degrees, not a strong side-glance. Body and pose identical. Keep room, doorway, clothing, lighting, black and white palette.

shot10_gaze_v5 — "fine, down" — down yes, left no
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Retouch this photograph: the man looks sharply downward at the floor (as if looking at something on the ground in front of him), with only a very slight tilt to the left. Eyes aimed down, not forward. Body and pose identical. Keep room, doorway, clothing, lighting, black and white palette.

shot10_gaze_v6 — "his right side"
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Retouch this photograph: the man looks sharply downward at the floor, with a slight tilt to his own right (which appears on the viewer's left side of the frame). Eyes aimed down, not forward. Body and pose identical. Keep room, doorway, clothing, lighting, black and white palette.

shot10_gaze_v7 — "gaze toward bottom-left corner" — too far left
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Retouch this photograph: the man looks down at a point on the floor near the lower-left corner of the image frame. His eyes and head tilt to track toward that lower-left area. Body and pose identical. Keep room, doorway, clothing, lighting, black and white palette.

"Too far left" → "nothing, almost no change" → "now he's looking right" → "like he's confusing his legs" → "no, he's looking right, not left"

The user's key insight:

"Damn, maybe the issue is I mean left in the frame, but the model interprets left as the subject's left?"

Exactly: the model reads "left" from the subject's point of view, not the camera's. With that fixed, v7 came out pointing in the correct direction, but too strongly.
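The frame-versus-subject ambiguity can be avoided by always naming both reference frames in the prompt, as the v6 prompt does. Below is a minimal sketch of a helper that builds such a phrase. The function name `gaze_phrase` and the `facing_camera` flag are illustrative assumptions, not part of the project's tooling.

```python
# Hypothetical helper: turn a frame-relative side into wording that names
# both the subject's side and the viewer's side.
MIRROR = {"left": "right", "right": "left"}


def gaze_phrase(frame_side: str, facing_camera: bool = True) -> str:
    """Return a direction phrase stated in both subject and viewer terms."""
    if frame_side not in MIRROR:
        raise ValueError(f"expected 'left' or 'right', got {frame_side!r}")
    # When the subject faces the camera, the viewer's left is the subject's right.
    subject_side = MIRROR[frame_side] if facing_camera else frame_side
    return (
        f"toward his own {subject_side} "
        f"(which appears on the viewer's {frame_side} side of the frame)"
    )


print(gaze_phrase("left"))
# toward his own right (which appears on the viewer's left side of the frame)
```

Spelling out both frames costs a few extra words, but it removes the need to guess which convention the model applies.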

After 12 iterations and 5 Gemini refusals in a row (IMAGE_OTHER) — dead end:

"This little gaze-direction thing Gemini doesn't pull off. My vote is to nudge the head in Photoshop manually."
"I don't know how you'd warp gaze direction in Photoshop. You're a fantasizer."

Right, warp doesn't work — you'd have to repaint the eyeballs.

"I found a program that did it — shot10_gaze_v7_sight.png. There you can change just the gaze."

shot10_gaze_v7_sight — user did it in a specialized program

Session conclusion: with a descriptive "look over there" prompt, Gemini's output varies widely; 20 attempts did not yield a stable result. Micro-edits of gaze or expression need a specialized tool (the user found a separate one).

Soviet wrappers

"Shot9_pov_candies_v10 — now we need to swap the candy here for Soviet ones. Crucial: don't change the size or position of the candies, the lighting — only the wrappers."

Whatever the wording, Gemini returned a file identical to the input: the model weighted "preserve most of it" above "change the wrappers". Three attempts, zero changes.

"You know you're just creating an illusion of choice. You have 2 obviously stupid options here, why?"

The agent admitted it had no good ideas. It ran Flux, Seedream 4K, and Gemini with reordered refs in parallel; all results were bad. The Seedream output was only 454 KB (degradation).

"You know what, you keep proposing 'good enough'. With that approach they should put graphics cards into your data centers."

The user solved it in two steps:
1. Asked the model to make wrappers solid-color (composition reset)
2. Then on that result — "make them Soviet candy with these brand names"

This broke Gemini's "preservation bias" — the model stopped protecting the source.
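The two-step reset can be written down as a small pipeline. This is only a sketch: `EditFn` stands in for whatever image-edit call is actually used (its signature is an assumption), and the prompt wording is illustrative, not the user's exact text.

```python
from typing import Callable

# Hypothetical signature for the image-edit call: (image path, prompt) -> result path.
EditFn = Callable[[str, str], str]


def two_step_swap(edit: EditFn, source: str, element: str, target_style: str) -> str:
    """Swap one element by neutralizing it first, then restyling the neutral result."""
    reset_prompt = (
        f"Retouch this photograph: make every {element} a plain solid color. "
        "Keep position, size, and lighting unchanged."
    )
    neutral = edit(source, reset_prompt)      # step 1: composition reset
    restyle_prompt = (
        f"Retouch this photograph: make each {element} {target_style}. "
        "Keep position, size, and lighting unchanged."
    )
    return edit(neutral, restyle_prompt)      # step 2: restyle the reset image


# Dry run with a fake edit function that only records its calls.
calls = []


def fake_edit(image: str, prompt: str) -> str:
    calls.append((image, prompt))
    return f"{image}+edit{len(calls)}"


result = two_step_swap(
    fake_edit,
    "shot9_pov_candies_v10.png",
    "candy wrapper",
    "a Soviet candy wrapper with these brand names",
)
print(result)  # shot9_pov_candies_v10.png+edit1+edit2
```

The point of the structure is that the second call never sees the original wrappers, so there is nothing left for the model to preserve.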

shot9_pov_candies_v10_soviet — user did it himself in two steps
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Retouch this photograph: replace each candy in the pile with an equivalently-sized Soviet wrapped candy matching the style from @image2 — 'Красная Шапочка', 'Алёнка', 'Ласточка', 'Мишка косолапый'. Keep the exact position, size, scale, shape, and density of the pile unchanged. Keep the exact lighting unchanged. Keep the room, floor, walls, mannequin lower half, windows all unchanged. Only the candy wrappers change style — everything else identical. Black and white scene except the candies in full color.

"Why is it that I do all the hard cases myself, without your help?"

Agent: "noted the pattern — for tasks like 'change only X', go via an intermediate reset, not a direct edit."

POV on candies: video

"Let's do the same POV video, just with this ref instead of the old one. Remember the prompt?"
"By the way, what about having the video gen save the parameters it was launched with into a log?"

Logged every generation into generations.log.jsonl. Found the previous prompt for POV candy pickup.

v1: hand not B&W (stayed in color), motion "reaches into the void and materializes a candy out of thin air":

shot9_candy_pickup_pov_soviet_v1 — hand in color, materializes candy
params 720p · 5s · 16:9
prompt
@image1 as the first frame. @image2 — the bandit (his arm and hand).

0-1s: still POV shot, no motion.
1-2s: a man's hand in a worn black leather jacket sleeve enters from the bottom-right of the frame, reaching down toward the colorful candy pile.
2-3s: the hand picks up one wrapped candy from the pile and grips it between fingers.
3-5s: the hand slowly raises the candy up toward the camera, bringing it close to the viewer's eyes for inspection. The candy fills more of the frame as it approaches. It stays in full color.

Fixed camera, no camera movement. Scene strictly black and white except the candy and the pile.
Avoid jitter, avoid temporal flicker, avoid identity drift.

v2: explicitly added "hand and arm strictly black and white", "candy physically lifted out of the pile, does not appear from thin air":

shot9_candy_pickup_pov_soviet_v2 — APPROVED
params 720p · 5s · 16:9
prompt
@image1 as the first frame. @image2 — the bandit.

0-1s: still POV shot, no motion. Scene in strict black and white except the candies in the pile which remain in full color.
1-2s: a hand in a worn black leather jacket sleeve enters from the bottom-right, reaching down TOWARD the candy pile.
2-3s: the hand's fingers make contact with the surface of the pile — the fingers close around one specific candy that is visible in the pile, grasping it by its wrapper.
3-4s: the hand lifts that candy OUT of the pile, pulling it upward.
4-5s: the hand raises the held candy up toward the camera for inspection.

The hand and arm are strictly black and white — only the candies remain colored. The candy being picked up exists in the pile first and is physically lifted out — it does not appear from thin air.

Fixed camera. Avoid jitter, avoid temporal flicker.

generations.log.jsonl: backfill + live append

"What's shell history? Can we fill the log immediately from there?"

Shell history in Claude Code isn't preserved between calls. But there's the project transcript — 46 MB JSONL with every command. Wrote backfill_generations_log.py — walked through the transcript, extracted 77 seedance_video.py runs with parameters (prompt, refs, duration, quality, aspect, model, task_id).

Then added live logging to seedance_video.py: every successful run calls _append_log(), which appends one record to generations.log.jsonl.
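A live append of this kind could look roughly like the sketch below. The real `_append_log()` signature is not shown in the session, so the parameters here are assumptions. The field names follow the list of extracted parameters above, and the values are made up.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def _append_log(log_path: Path, record: dict) -> None:
    """Append one generation record to the log as a single JSON line."""
    entry = {"logged_at": datetime.now(timezone.utc).isoformat(), **record}
    # Mode "a" never truncates, so earlier entries survive every run.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


# Usage on a throwaway file.
log = Path(tempfile.mkdtemp()) / "generations.log.jsonl"
_append_log(log, {
    "prompt": "...", "refs": ["a.png"], "duration": 5, "quality": "720p",
    "aspect": "16:9", "model": "seedance", "task_id": "demo-1",
})
_append_log(log, {"prompt": "...", "task_id": "demo-2"})
lines = log.read_text(encoding="utf-8").splitlines()
print(len(lines))  # 2
```

One JSON object per line keeps the file greppable and makes each append independent of the rest of the file.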

"Also the backfill overrides the log instead of appending. What does that mean?"
"Why do we need backfill at all? We called it once and that's it."

Right — backfill is for one-off migration, live logging handles the rest.
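The "override instead of append" question comes down to the file mode and to whether a re-run duplicates entries. Here is a sketch of a backfill that is safe to run more than once, assuming every record carries a `task_id`. The function name and record shape are illustrative, not the contents of backfill_generations_log.py.

```python
import json
import tempfile
from pathlib import Path


def backfill(log_path: Path, records: list[dict]) -> int:
    """Append records whose task_id is not in the log yet; return how many were added."""
    seen = set()
    if log_path.exists():
        for line in log_path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                seen.add(json.loads(line).get("task_id"))
    added = 0
    # "a" appends to the existing log; "w" here would wipe it on every run.
    with log_path.open("a", encoding="utf-8") as f:
        for rec in records:
            if rec.get("task_id") in seen:
                continue
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
            seen.add(rec.get("task_id"))
            added += 1
    return added


log = Path(tempfile.mkdtemp()) / "generations.log.jsonl"
history = [{"task_id": "t1", "prompt": "..."}, {"task_id": "t2", "prompt": "..."}]
first = backfill(log, history)
second = backfill(log, history)
print(first, second)  # 2 0
```

For a true one-off migration the deduplication is optional, but it makes an accidental second run harmless.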

Squat + eating the candy

"Now from shot10_gaze_v7_sight — make a shot where the bandit squats down, the camera follows, 4s, 720."

shot10_bandit_squat_v1 — APPROVED, 4s
params 720p · 4s · 16:9
prompt
@image1 as the first frame. @image2 — the bandit.

0-1s: the man stands still, looking down.
1-4s: he squats down onto his haunches, slowly lowering his body by bending his knees. The camera tilts and pans downward to follow him, keeping him framed throughout the motion.

Black and white. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"By the way, let's always explicitly say no music."

Added "No background music, no soundtrack" to the prompt and to rule 18a.
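A rule like 18a is easy to forget when prompts are rewritten by hand, so a small check before launch can help. The sketch below is an assumption about how such a check might look. `missing_clauses` and `DEFAULT_REQUIRED` are hypothetical names, and per-shot clauses such as a camera instruction are passed in separately because not every shot needs them.

```python
# Clauses every Seedance prompt should carry. Only this one comes from the
# session (rule 18a); per-shot clauses are passed in through `extra`.
DEFAULT_REQUIRED = ("No background music, no soundtrack",)


def missing_clauses(prompt: str, extra: tuple[str, ...] = ()) -> list[str]:
    """Return the required clauses that do not appear in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [c for c in DEFAULT_REQUIRED + extra if c.lower() not in lowered]


squat_tail = (
    "Black and white. No background music, no soundtrack. "
    "Avoid jitter, avoid identity drift."
)
print(missing_clauses(squat_tail))                           # []
print(missing_clauses(squat_tail, extra=("Fixed camera",)))  # ['Fixed camera']
```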

"Now we need a video of him eating the candy. Need to extract the last frame of the squat as a separate frame via Banana."

Extract last frame → shot10_squat_lastframe.png (720p, blurry). Gemini upscale to 2K:
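The session does not say how the last frame was pulled out of the video. One common way is ffmpeg, sketched here under that assumption. The `.mp4` extension and the function names are illustrative, and ffmpeg must be installed for `extract_last_frame` to run.

```python
import subprocess


def last_frame_cmd(video: str, out_png: str) -> list[str]:
    """Build an ffmpeg command that leaves the final frame of `video` in `out_png`."""
    return [
        "ffmpeg", "-y",
        "-sseof", "-1",   # start reading one second before the end of the file
        "-i", video,
        "-update", "1",   # keep overwriting a single image, so the last frame wins
        out_png,
    ]


def extract_last_frame(video: str, out_png: str) -> None:
    # Requires ffmpeg on PATH; raises CalledProcessError if ffmpeg fails.
    subprocess.run(last_frame_cmd(video, out_png), check=True)


cmd = last_frame_cmd("shot10_bandit_squat_v1.mp4", "shot10_squat_lastframe.png")
print(" ".join(cmd))
```

The extracted frame has the video's resolution (720p here), which is why an upscale pass is still needed before using it as a first frame.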

shot10_squat_first_v1 — 2K first frame
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Reshoot this exact scene at high quality — same composition, same subject (bald stocky man squatting on the floor, dark wooden door behind, floral wallpaper, herringbone parquet), same lighting and framing, preserved pixel-for-pixel in composition. Only the resolution and detail quality increase. Strictly black and white, 35mm film grain.

Glowing candy: v1–v8

v1: perfect unwrap, and the yellow candy glows beautifully.

shot10_candy_eat_v1 — unwrap nice, but created a second candy
params 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-3s: down in his lap, his hands work to unwrap a brick-shaped candy with a yellow wrapper — fingers pull at the yellow wrapper, crumpling it. Inside is a brick-shaped yellow candy that glows softly with a warm radiant light.
3-5s: he raises his right hand up toward his face, now holding the glowing yellow brick-shaped candy between his fingers.
5-7s: he brings the glowing yellow candy to his lips and places it in his mouth.

Black and white scene. The yellow wrapper and the glowing yellow brick candy are the only colored/bright elements in the frame. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"But for some reason he pulled another non-glowing candy out of the glowing one and ate that."

The model "doubled" — created a second candy for the eating phase.

v2: explicitly "Only one candy exists", "no duplicate, nothing new produced" — the doubling went away, but the candy became 3x larger than the wrapper:

shot10_candy_eat_v2 — doubling gone, but candy gigantic
params 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-3s: down in his lap, his hands work to unwrap a brick-shaped candy with a yellow wrapper — fingers pull at the yellow wrapper, crumpling it. Inside is a brick-shaped yellow candy that glows softly with a warm radiant light.
3-5s: he raises his right hand up toward his face, now holding the glowing yellow brick-shaped candy between his fingers.
5-7s: he brings the SAME glowing yellow candy (the one already in his hand) to his lips and places it in his mouth. Only one candy exists in the entire shot — there is no second candy, no duplicate, nothing new produced.

Black and white scene. The yellow wrapper and the glowing yellow brick candy are the only colored/bright elements in the frame. No background music, no soundtrack. Avoid jitter, avoid identity drift, avoid creating duplicate candies.

"He's unwrapping too long. And in v2 the candy got 3x bigger after unwrapping, meaning the contents are bigger than the wrapper."

v3: "quick two-motion unwrap", "same size as the wrapper":

shot10_candy_eat_v3 — wrapper fell off on its own, second hand not involved
params 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-2s: down in his lap, a quick two-motion unwrap — fingers tear open the yellow wrapper swiftly. Inside is a brick-shaped yellow candy, the same size as the original wrapped candy (not larger, roughly matchbook-sized). The bare candy glows softly with a warm radiant light.
2-4s: he raises his right hand up toward his face, holding the glowing yellow brick candy between his fingers.
4-7s: he brings the SAME glowing yellow candy (the one already in his hand) to his lips and places it in his mouth. Only one candy exists in the entire shot.

Black and white scene. The yellow wrapper and the glowing yellow brick candy are the only colored/bright elements. The unwrapped candy size matches the wrapper — the content does not exceed the container. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"Crap, now the wrapper just fell off, he didn't even bring the second hand."

v4: explicitly "both hands", "left holds while right tears open":

shot10_candy_eat_v4 — both hands, but candy is a giant cube
params 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-2s: both his hands come together in his lap. His left hand holds the wrapped candy while his right hand grips one end of the wrapper and pulls, tearing it open in two quick motions. The yellow wrapper falls to the floor. Inside is a brick-shaped yellow candy, the same size as the wrapped original. The bare candy glows softly with a warm light.
2-4s: he raises his right hand up toward his face, holding the glowing yellow brick candy between his fingers.
4-7s: he brings the SAME glowing yellow candy (the one already in his hand) to his lips and places it in his mouth. Only one candy exists in the entire shot.

Black and white scene. The yellow wrapper and the glowing yellow brick candy are the only colored/bright elements. The unwrapped candy size matches the wrapper. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"Unwrap is tolerable, but worse than v1. The candy got way bigger again — a giant cube instead of a bar."

The user caught the key problem: "brick-shaped" → the model interprets it as a construction brick (cubic).

"I think it imagines a different candy than what we want — you write 'small'. It thinks lollipop or pellet."
"You write 'matchbox', but the candy is matchbox-sized only by length, half by width. Maybe that's throwing it off?"

The agent explained: diffusion models don't measure centimeters physically, but a comparison to "stick of chewing gum" works as a visual anchor.

"Why does it correctly understand the original size of the candy in the wrapper?"

Insight: the wrapped candy is anchored to the hand in the ref frame (hand + wrapped candy in hand — context). The unwrapped one is generated fresh from the word "candy", and "brick-shaped" becomes a construction brick.

"So we shouldn't add, we should remove what's throwing it off."

v5: removed "brick-shaped" entirely — small yellow ball. Not candy:

shot10_candy_eat_v5 — ball instead of candy
params 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-3s: both his hands come together in his lap. His left hand holds the wrapped candy while his right hand grips one end of the wrapper and pulls, tearing it open. The yellow wrapper falls to the floor. Inside is a small yellow candy that glows softly with a warm light.
3-5s: he raises his right hand up toward his face, holding the glowing yellow candy between his fingers.
5-7s: he brings the SAME glowing yellow candy (the one already in his hand) to his lips and places it in his mouth. Only one candy exists in the entire shot.

Black and white scene. The yellow wrapper and the glowing yellow candy are the only colored/bright elements. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"Was there no shape in v1?"

Found via grep over generations.log.jsonl:

v1: "hands work to unwrap a brick-shaped candy with a yellow wrapper"

v1 had "brick-shaped" too, but the size was correct before the unwrap because the wrapper gave it context. After the unwrap, the model simply carried the same look over.
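The same lookup can be done in a few lines of Python when a plain grep is awkward, for example to print only the matching run names. This sketch assumes each log line has a `prompt` field. The `name` field is a hypothetical label for the output file.

```python
import json
import tempfile
from pathlib import Path


def find_runs(log_path: Path, needle: str) -> list[dict]:
    """Return log records whose prompt contains `needle` (case-insensitive)."""
    hits = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rec = json.loads(line)
            if needle.lower() in rec.get("prompt", "").lower():
                hits.append(rec)
    return hits


# Demo on a throwaway log with two shortened prompts from this session.
demo = Path(tempfile.mkdtemp()) / "generations.log.jsonl"
rows = [
    {"name": "shot10_candy_eat_v1",
     "prompt": "hands work to unwrap a brick-shaped candy with a yellow wrapper"},
    {"name": "shot10_candy_eat_v5",
     "prompt": "Inside is a small yellow candy that glows softly"},
]
demo.write_text("".join(json.dumps(r) + "\n" for r in rows), encoding="utf-8")
matches = [r["name"] for r in find_runs(demo, "brick-shaped")]
print(matches)  # ['shot10_candy_eat_v1']
```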

v6: v1-style unwrap + anti-dup — the ball stayed:

shot10_candy_eat_v6 — v1-style unwrap, but ball
params seedance · 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-3s: down in his lap, his hands work to unwrap a candy with a yellow wrapper — fingers pull at the yellow wrapper, crumpling it. Inside is a yellow candy that glows softly with a warm radiant light.
3-5s: he raises his right hand up toward his face, holding the glowing yellow candy between his fingers.
5-7s: he brings the SAME glowing yellow candy (the one already in his hand) to his lips and places it in his mouth. Only one candy exists in the entire shot.

Black and white scene. The yellow wrapper and the glowing yellow candy are the only colored/bright elements. No background music, no soundtrack. Avoid jitter, avoid identity drift.

v7: "rectangular bar candy in a yellow wrapper" — shape now normal:

shot10_candy_eat_v7 — rectangular bar, but "removed the static-camera bit"
params 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-3s: down in his lap, his hands work to unwrap a rectangular bar candy in a yellow wrapper — fingers pull at the yellow wrapper, crumpling it. Inside is a yellow candy that glows softly with a warm radiant light.
3-5s: he raises his right hand up toward his face, holding the glowing yellow candy between his fingers.
5-7s: he brings the SAME glowing yellow candy (the one already in his hand) to his lips and places it in his mouth. Only one candy exists in the entire shot.

Black and white scene. The yellow wrapper and the glowing yellow candy are the only colored/bright elements. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"Cube-ish but actually fine. Why did you remove the static-camera bit?"

v8: brought back "Fixed camera, no camera movement" + "no text or labels on the wrapper":

shot10_candy_eat_v8 — APPROVED
params 720p · 7s · 16:9
prompt
@image1 as the first frame. @image2 as the character appearance reference — the man in the video matches @image2 exactly in face, build, and clothing (bald, stocky, black leather jacket).

0-1s: the man is still, squatting on the floor.
1-3s: down in his lap, his hands work to unwrap a rectangular bar candy in a plain yellow wrapper (no text, no labels, no inscriptions) — fingers pull at the yellow wrapper, crumpling it. Inside is a yellow candy that glows softly with a warm radiant light.
3-5s: he raises his right hand up toward his face, holding the glowing yellow candy between his fingers.
5-7s: he brings the SAME glowing yellow candy (the one already in his hand) to his lips and places it in his mouth. Only one candy exists in the entire shot.

Fixed camera, no camera movement. Black and white scene. The yellow wrapper and the glowing yellow candy are the only colored/bright elements. No background music, no soundtrack. Avoid jitter, avoid identity drift, avoid text or labels on the wrapper.

"Commit."

bb535e2 — shot9 piñata + shot10 squat/candy + generations log. b52ce1c — movie-workflow rules + hook in main repo.

Scene 3: shirt and village

"What's next in the script?"

Per the script after eating — trip, village, wife, bear, kids.

"No need for a separate close-up, of course it goes straight to the village."

Quote from the script:

"THE PICTURE BECOMES INCREDIBLY BRIGHT AND SATURATED. The bandit, now dressed in a simple peasant shirt, sits on the bench by a log cabin. Around — an idyllic village landscape. The sun shines, birds sing. A beautiful WIFE in sarafan and kokoshnik approaches him..."

The user adjusted:

"Got it, but the script's not quite right. Need: a bear plays balalaika. Rosy-cheeked kids dance nearby. The WIFE in sarafan and kokoshnik approaches the bandit. The wife kisses the bandit gently several times. All one shot, 8s. Dynamic camera."

The agent started with a first frame via Gemini.

"Think we need Gemini here? Maybe give Seedance freedom?"

The user proposed letting Seedance generate with char-ref without a locked first frame — the model would invent the village more freely.

"We could make a color bandit in the shirt first, by the way."

char_bandit_rubaha_v1: Gemini with ref → IMAGE_OTHER.

char_bandit_rubaha_v2: without ref → looks similar, but not our bandit.

"I don't get it, why is the face different?"

Right, without a ref Gemini generates a generic guy.

"char_bandit_face.jpg — did you give that ref?"

No, the agent had given the B&W one. Tried the color char_bandit_face.jpg: v3 came out with our face, in the shirt.

char_bandit_rubaha_v3 (our bandit in the shirt; image missing, expected at ../archive/char_bandit_rubaha_v3.png)

"shot5_bandit_closeup_v5.png — use that face ref."

v4: same bandit from shot5 closeup, in a shirt with red embroidery.

char_bandit_rubaha_v4 — with face from shot5
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Vivid color portrait photograph of the man from @image1 — same face (bald stocky middle-aged). He wears a traditional Russian peasant shirt (рубаха) — loose-fitting, off-white linen with red embroidered trim at neckline and cuffs. Simple dark trousers. Plain neutral background. Natural lighting. Frontal character reference pose, arms at sides. Photorealistic. 35mm.

The session ended here.

Day 5 wrap-up

In 5 hours:
- 20 attempts at gaze edits — Gemini doesn't pull off micro-gaze direction. For such edits a specialized tool is needed (the user found a third-party program).
- Two-step reset for Gemini: to swap an element in an image, first "reset to solid color", then "make it the new style". Direct edits trigger preservation bias.
- shot9_candy_pickup_pov_soviet_v2 — approved (POV hand picks up a candy, with explicit "physically lifted out, does not appear from thin air").
- shot10_bandit_squat_v1 — approved (4s squat).
- shot10_candy_eat_v8 — approved (7s unwrap and eat the glowing yellow candy).
- Lesson on shape: "brick-shaped" reads as a construction brick. The model does not grasp physical centimeters; visual anchors from familiar objects work better. Sometimes removing the misleading word matters more than adding a description.
- generations.log.jsonl + backfill_generations_log.py: 84 historical runs logged, and every new seedance_video.py run live-appends.
- Rule 18a: in Seedance prompts always write "No background music, no soundtrack" — by default the model adds background music.
- char_bandit_rubaha_v4 — color bandit in a shirt for scene 3 (village). Needs a color face-ref, not B&W.