Day 8. April 21, 12:04 — April 22, 04:53. Lip-sync, candy collection, profile with the wrong hand

A long night: 17 hours with breaks, 61 generations. Reshot the partner's dialogue; added the handshake and stand-up variants, candy collection, POV bag detail, partner closeup, bandit profile by the window, and the finale with the bag zipper. A separate pain point: Gemini drew the wrong hand 10 attempts in a row.

Reshooting the partner's line: naHAnyali

"okay come on google, Seedance just went and ruined half the speech"

In scene 10, shot14_partner_dialogue_v1 (front angle + tebya_suka.mp3, 7s) had bad lip-sync. The Seedance guide says: the audio sweet spot is 3-5s, emotional speech adds 100-200ms of lip lag, and fast speech with elision smears the lip closures.

"let's take a side angle (not front), and I made a new voice, just the first part. voice_tebya_suka.mp3, 6s, starts speaking after the first second, not right away"

shot14_partner_dialogue_v2 — side angle, new audio, pause at start
params 720p · 6s · 16:9
refs
shot12_bg_user_fitted.jpg, voice_tebya_suka.mp3
prompt
@Image1 as first frame. @Audio1 is the line the man speaks.
[00:00-00:01] He is silent, mouth closed.
[00:01-00:06] He speaks the line from @Audio1, mouth moves in sync with the audio.
Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"garbage, the speech is jumbled. shouldn't we feed the model the line text too? what does the guide say?"

Found it in the guide: put the line text in double quotes and name the language. Format: Character speaks in Russian: "...".

shot14_partner_dialogue_v3 — with line text in quotes
params 720p · 6s · 16:9
refs
shot12_bg_user_fitted.jpg, voice_tebya_suka.mp3
prompt
@Image1 as first frame. @Audio1 is the line the man speaks. He speaks in Russian, mouth moves in sync with @Audio1. The line he says: "тебя, сука, наняли товар принести, а не пробовать его, дебил".
Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"he said tebya naHAnyali tovar prinesti, why ((( and I can't even substitute the original audio because the rhythm is totally different"

Opened the seedance_video.py code: generate_audio: True is hardcoded. With it, Seedance treats your uploaded audio only as voice style, and synthesizes the speech itself from the text — that's where "naHAnyali" and the different rhythm come from.

"do it"

Switched to generate_audio=False when audio_urls is present:

shot14_partner_dialogue_v4 — generate_audio=False, file with no sound
params seedance · 720p · 6s · 16:9
refs
shot12_bg_user.jpg, voice_tebya_suka.mp3
prompt
@Image1 as first frame. @Audio1 is the line the man speaks. He speaks in Russian, mouth moves in sync with @Audio1. The line he says: "тебя, сука, наняли товар принести, а не пробовать его, дебил".
Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"the file has no sound now"

generate_audio=False = no audio track in output. Plus lip-sync still off.
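The switch tried in v4 can be sketched roughly as below. This is an illustration, not the actual seedance_video.py: only generate_audio and audio_urls come from the notes, while the function name, the other payload keys, and the override argument are assumptions.

```python
def build_payload(prompt, image_urls, audio_urls=None, force_generate_audio=None):
    """Hypothetical request builder for a Seedance-style video call.

    Only `generate_audio` and `audio_urls` are named in the log; the
    `prompt` and `image_urls` keys are assumed for illustration.
    """
    audio_urls = list(audio_urls or [])
    if force_generate_audio is None:
        # The v4 behaviour: stop synthesizing speech once a voice file is attached.
        generate_audio = not audio_urls
    else:
        # Explicit override, e.g. to go back to the hardcoded True.
        generate_audio = force_generate_audio
    return {
        "prompt": prompt,
        "image_urls": list(image_urls),
        "audio_urls": audio_urls,
        "generate_audio": generate_audio,
    }
```

Keeping the override explicit makes a rollback a one-argument change, which matters here because the False setting produced a silent file.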

"no, it doesn't match, the lip movement isn't natural. roughly like before, only now I don't hear gibberish because there's no audio"

"no, first remove the changes that disable audio gen, that's getting in the way of me figuring out which phonemes it didn't catch"

Rolled back to generate_audio=True:

shot14_partner_dialogue_v5 — with audio, weird accent
params seedance · 720p · 6s · 16:9
refs
shot12_bg_user.jpg, voice_tebya_suka.mp3
prompt
@Image1 as first frame. @Audio1 is the line the man speaks. He speaks in Russian, mouth moves in sync with @Audio1. The line he says: "тебя, сука, наняли товар принести, а не пробовать его, дебил".
Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"okay better, it gives a strange accent. voice_tebya_suka_v2.mp3 — I made a version of the audio with cleaner separation, to make it easier for it"

New audio + corrected punctuation ("Тебя, сука. Наняли товар принести. А не пробовать его. Дебил."):

After a credit 402 and top-up:

shot14_partner_dialogue_v6 — v2 audio, new punctuation, but 6s too tight
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Audio1 is the line the man speaks. He speaks in Russian, mouth moves in sync with @Audio1. The line he says: "Тебя, сука. Наняли товар принести. А не пробовать его. Дебил."
Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"damn, it doesn't have time to say it, need 7s"

shot14_partner_dialogue_v7 — 7s, but big pauses and the final line right at the end
params 720p · 7s · 16:9
prompt
@Image1 as first frame. @Audio1 is the line the man speaks. He speaks in Russian, mouth moves in sync with @Audio1. The line he says: "Тебя, сука. Наняли товар принести. А не пробовать его. Дебил.". His eyes stay cast downward throughout, as in @Image1.
Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"shit, it puts really strong pauses between lines and again the last line is right at the timing edge, that's bad for editing. need to ask it to keep tempo and no gestures"

v8: steady even pace without long pauses + Finishes speaking by 00:05 + No hand gestures:

shot14_partner_dialogue_v8 — APPROVED, steady tempo, no gestures
params seedance · 720p · 7s · 16:9
prompt
@Image1 as first frame. @Audio1 is the line the man speaks. He speaks in Russian at a steady even pace without long pauses between phrases, mouth moves in sync with @Audio1. The line he says: "Тебя, сука. Наняли товар принести. А не пробовать его. Дебил.". Finishes speaking by 00:05, then holds still for the remaining time. His eyes stay cast downward throughout, as in @Image1. No hand gestures. Arms stay relaxed and still.
Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"victory, you did good, and I a bit too"

The key element — the user's v2 audio with clear separations. Everything else was prompt-tuning around it.
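Since the same prompt skeleton was retyped for every version, the tuned parts can be collected in a small builder. A minimal sketch, assuming the wording of the approved v8 prompt; the function and parameter names are made up for illustration.

```python
def timecode(seconds):
    """Format whole seconds as MM:SS, the style used in the prompts."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def dialogue_prompt(line, language="Russian", finish_by=None, no_gestures=True):
    """Assemble the speech part of a lip-sync prompt from its tuned pieces."""
    parts = [
        "@Image1 as first frame. @Audio1 is the line the man speaks.",
        f"He speaks in {language} at a steady even pace without long pauses "
        "between phrases, mouth moves in sync with @Audio1.",
        # Line text goes in double quotes, as the guide asks.
        f'The line he says: "{line}".',
    ]
    if finish_by is not None:
        parts.append(
            f"Finishes speaking by {timecode(finish_by)}, "
            "then holds still for the remaining time."
        )
    if no_gestures:
        parts.append("No hand gestures. Arms stay relaxed and still.")
    return " ".join(parts)


print(dialogue_prompt("Давай. Собирайся.", finish_by=5))
```

The camera and look sentences are left out on purpose: they stayed constant across versions and can be appended as a fixed tail.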

Line "Davay, sobiraysya" + handshake or stand-up

"the last line needs to be said on this frame, shot12_bg_user_front_fitted"

Long discussion of line variants (Davay. Sobiraysya. / + Vsyo, poshli) and action (offer hand / stand up and walk). An idea came in: the hand is offered vertically, like in arm-wrestling / wrist-grip — not palm-to-palm.

shot15_davay_v1 — handshake, but 1s pause before speech + horizontal handshake
params 720p · 5s · 16:9
prompt
@Image1 as first frame. @Image2 as character reference for the other person whose hand enters the frame from below. @Audio1 is the line the man speaks.
He speaks in Russian at a steady even pace, mouth moves in sync with @Audio1. The line he says: "Давай. Собирайся, блять." — finishes speaking around 00:02. Then he reaches his right hand down toward the floor off the bottom of the frame. A hand (the other man's from @Image2) rises up to meet his — they clasp each other's forearms in a wrist-grip handshake: thumbs pointing up, each hand gripping the other's wrist, the classic 'lifting' grip used to pull someone up off the ground.
His eyes stay cast downward throughout, as in @Image1. Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

v2: starts speaking immediately at 00:00, bandit's hand from the right edge:

shot15_davay_v2 — speech right away, but handshake again 180°
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Image2 as character reference for the other person whose hand enters the frame from the right side. @Audio1 is the line the man speaks.
He starts speaking in Russian immediately at 00:00, mouth moves in sync with @Audio1. The line he says: "Давай. Собирайся, блять.". Then he reaches his right hand down. A hand (the other man's from @Image2) rises into the frame from the right side — they clasp each other's forearms in a wrist-grip handshake: thumbs pointing up, each hand gripping the other's wrist, the classic 'lifting' grip used to pull someone up off the ground. He starts pulling the other man up, his arm and shoulder straining with the effort.
His eyes stay cast downward throughout, as in @Image1. Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"no, in both versions he gives his hand like a shake, like 180 degrees, but it has to be 90, that's why I asked how to describe the grip correctly"

"and did you explicitly say the second guy is lying?"

Good catch — didn't write that. Added context of the lying body + WRIST-grab not palm-to-palm, forearms vertical:

shot15_davay_v3 — APPROVED, wrist-grab with lying-body context
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Image2 as character reference for the other person whose hand enters the frame. @Audio1 is the line the man speaks.
He starts speaking in Russian immediately at 00:00, mouth moves in sync with @Audio1. The line he says: "Давай. Собирайся, блять.". Then he reaches his right hand down. The other man is lying on the floor just below the bottom of the frame and reaches his arm up vertically from floor level, his hand entering the frame from the bottom-right. They do NOT shake hands palm-to-palm. Instead: each hand closes around the other's WRIST (not the palm), forearms vertical and parallel, thumbs pointing upward. This is the classic "wrist grab" grip used to pull someone up off the ground — NOT a standard handshake. He starts pulling the other man up, his arm and shoulder straining with the effort.
His eyes stay cast downward throughout, as in @Image1. Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

In parallel a second variant — partner just gets up and leaves:

shot15_davay_standup_v1 — but said "davay" twice
params seedance · 720p · 6s · 16:9
prompt
@Image1 as first frame. @Audio1 is the line the man speaks.
He starts speaking in Russian immediately at 00:00, mouth moves in sync with @Audio1. The line he says: "Давай. Собирайся, блять.". After finishing the line, he stands up from his crouch and walks out of the frame.
His eyes stay cast downward during the speech, as in @Image1. Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

"speech problem, he said 'davay' twice"

Most likely cause: a long pause after the first "Davay." (the period acts as a pause command, and the model fills the pause by repeating the word). Removed the timing crutches:

shot15_davay_standup_v2 — APPROVED, no timings
params seedance · 720p · 6s · 16:9
prompt
@Image1 as first frame. @Audio1 is the line the man speaks.
He speaks the line from @Audio1 in Russian: "Давай. Собирайся, блять." — mouth in sync with audio. After finishing the line, he stands up from his crouch and walks out of the frame.
His eyes stay cast downward during the speech, as in @Image1. Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

Bandit comes around (with eyes already open)

"@/$E/shot12_slap_setup_eyes_open.jpg, here's another frame of the lying bald guy, only I opened his eyes. make him just look at the partner and come around"

4s instead of 5s. After 402 and top-up:

shot13_bandit_recovers_v1 — APPROVED, 4s, looking at the partner
params 720p · 4s · 16:9
prompt
@Image1 as first frame. The bald man lying on the floor slowly regains full consciousness. His gaze stays fixed on the other man in the lower-left of the frame. Slight natural breathing, subtle head wobble, eyes blink groggily. The other man stays completely still. The air is completely clean and still — no dust, no airborne particles. Camera fixed, locked-off, no movement. Preserve the black-and-white look and composition of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift, avoid temporal flicker, avoid dust particles.

Commit 82559c4 — 72 approved + archive files.

Candy collection: axis, pose, facefusion, Seedream

"the candy collecting frame, and I'm thinking which one to use"

Discussed the script for the last 6 shots. Decided: bandit collects alone, partner watches from the couch. The scene is static, no more action.

The user gave shot16_room_crop_partner.jpg — partner on the couch, body on top, pile of candy. Asked to add the bandit via Gemini:

"what kind of bald bearded guy is that?"

My screwup — confused who's who. Fixed: bandit is bald in a leather jacket, no beard.

Three Gemini attempts with char_bandit_front_bw.jpg — all IMAGE_OTHER (safety):
- attempt 1 — apparently the body on top triggers it
- attempt 2 — removed the body, but a dark fragment of flesh stayed in the candy pile
- attempt 3 — removed the fragment too — still IMAGE_OTHER

"no, try a different ref: shot5_bandit_closeup_v5.png"

Also IMAGE_OTHER. Fourth attempt, with char_bandit_back_bw.jpg (back view): passed, but Gemini returned the second ref instead of the edited scene.

Hard-wrote "output MUST be first image edited — NOT the second": Gemini squeezed it into vertical aspect and recomposed everything (window/cabinet/door gone).

"you're nuts, literally yesterday everything worked fine. and in my chat it still works fine. find the error in your prompt or code"

Switched source → shot16_room_crop_wrong_bandit.jpg (user assembled in Photoshop himself):

shot16_room_crop_wrong_bandit — user's composite with the wrong face

Task — face swap. facefusion headless-run with shot5_bandit_closeup_v5.png:

"stuck the face on badly, missed"

Two passes of facefusion (different sources + reference-mode) — keeps missing. Switched to Gemini for face-swap — IMAGE_OTHER again, then squeezed into portrait. Tried:
- Removed --size 2K and --aspect — came out portrait
- Put images BEFORE text — portrait
- Tried gemini-2.5-flash-image: IMAGE_OTHER
- gemini-3.1-flash-image-preview — timeout

"what does 'doesn't go into edit mode' mean? what does that mean? are you a moron? can you talk normally. I don't know what you've been doing"

Apologized like a human. Root cause — ref aspects: 16:9 scene + face_bw portrait (0.71) → Gemini reconciled to portrait. Seven attempts wasted.
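A pre-flight check along these lines would have flagged the mismatch before the first request. It is only a sketch: sizes are passed in as plain (width, height) tuples, the 5% tolerance is an arbitrary assumption, and the example dimensions are illustrative, chosen to give the 0.71 aspect mentioned above.

```python
def aspect(width, height):
    """Width-to-height ratio; 16:9 is about 1.78, a portrait face ref about 0.71."""
    return width / height


def mismatched_refs(scene_size, ref_sizes, tolerance=0.05):
    """Return the names of refs whose aspect differs from the scene's.

    scene_size: (width, height) of the image being edited.
    ref_sizes:  mapping of ref name -> (width, height).
    tolerance:  allowed relative difference (assumed value).
    """
    target = aspect(*scene_size)
    return [
        name
        for name, size in ref_sizes.items()
        if abs(aspect(*size) - target) / target > tolerance
    ]


# Illustrative sizes only: a 16:9 scene and a portrait face ref.
print(mismatched_refs((1920, 1080), {"face_bw": (710, 1000)}))
```

What to do with a flagged ref is a separate decision; padding it to the scene's aspect is one option, though that was not tested in this session.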

"got it, let's swap with Seedream. here's the face ref shot5_bandit_closeup_v5.png"

Seedream with 2 refs worked first try — 16:9 preserved, face swapped, partner untouched, composition intact:

shot16_room_crop_full_seedream_v1 — APPROVED, Seedream face swap

Candy collection — animation

With 3 refs (scene + bandit face + partner face):

shot16_collecting_v1 — 6s, 2 collecting cycles
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Image2 as character reference for the bald man — keep his face and appearance consistent with @Image2. @Image3 as character reference for the other man on the couch — keep his face and appearance consistent with @Image3.
[00:00-00:03] The bald man in the leather jacket scoops up a handful of candies from the pile with his hand and tosses them into the open sports bag beside him.
[00:03-00:06] He scoops up another handful and tosses them into the bag.
The other man on the couch watches the collecting — his head and eyes follow the hand movements, relaxed natural breathing.
Camera fixed, locked-off, no movement. No background music, no soundtrack. Avoid jitter, avoid bent limbs, avoid identity drift, avoid dust particles.

First run sat pending on Evolink for 40 minutes (high load). Cancelled it via the DELETE endpoint and restarted; the second run came through.
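The cancel-and-restart step can be automated so a stuck job does not burn 40 minutes. A minimal sketch: submit, get_status, and cancel are hypothetical callables wrapping the provider's endpoints, and the status strings and the 10-minute limit are assumptions, not the provider's documented values.

```python
import time


def run_with_resubmit(submit, get_status, cancel, max_pending_s=600, poll_s=5,
                      max_submits=2, clock=time.monotonic, sleep=time.sleep):
    """Poll a task; if it stays pending too long, cancel it and submit again.

    submit()        -> task id
    get_status(id)  -> "pending", "completed", "failed", or anything else
                       (treated as still in progress)
    cancel(id)      -> cancels the task (e.g. via a DELETE request)
    """
    for _ in range(max_submits):
        task_id = submit()
        started = clock()
        while True:
            status = get_status(task_id)
            if status == "completed":
                return task_id
            if status == "failed":
                raise RuntimeError(f"task {task_id} failed")
            if status == "pending" and clock() - started > max_pending_s:
                cancel(task_id)
                break  # fall through to the next submit
            sleep(poll_s)
    raise TimeoutError("task stayed pending after every resubmit")
```

The clock and sleep arguments exist only so the loop can be exercised without real waiting.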

"needs more length, let's make it 12s and have him collect everything in 4 passes"

shot16_collecting_v2 — APPROVED, 12s, 4 cycles, pile empty at the end
params 720p · 12s · 16:9
prompt
@Image1 as first frame. @Image2 as character reference for the bald man — keep his face and appearance consistent with @Image2. @Image3 as character reference for the other man on the couch — keep his face and appearance consistent with @Image3.
[00:00-00:03] The bald man scoops up a handful of candies from the pile with his hand and tosses them into the open sports bag beside him.
[00:03-00:06] He scoops up a second handful and tosses it into the bag.
[00:06-00:09] He scoops up a third handful and tosses it into the bag.
[00:09-00:12] He scoops up the last remaining candies and tosses them into the bag. By the end of the shot, the candy pile on the floor is completely empty.
The other man on the couch watches the collecting throughout — his head and eyes follow the hand movements, relaxed natural breathing.
Camera fixed, locked-off, no movement. No background music, no soundtrack. Avoid jitter, avoid bent limbs, avoid identity drift, avoid dust particles.
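The timecoded beats in the two collecting prompts follow a regular pattern, so they can be generated instead of typed. A small sketch under the assumption that all beats are the same length; the helper names are made up.

```python
def timecode(seconds):
    """Format whole seconds as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"


def timed_beats(duration_s, beats):
    """Split a clip into equal segments and prefix each beat with its time range."""
    if not beats or duration_s % len(beats) != 0:
        raise ValueError("duration must divide evenly into the number of beats")
    step = duration_s // len(beats)
    return [
        f"[{timecode(i * step)}-{timecode((i + 1) * step)}] {text}"
        for i, text in enumerate(beats)
    ]


for row in timed_beats(12, [
    "The bald man scoops up a handful of candies and tosses them into the bag.",
    "He scoops up a second handful and tosses it into the bag.",
    "He scoops up a third handful and tosses it into the bag.",
    "He scoops up the last remaining candies and tosses them into the bag.",
]):
    print(row)
```

Going from 6s with 2 cycles to 12s with 4 cycles then only means changing the duration and the list of beats.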

Bag detail: top-down, Soviet wrappers, in stages

Gemini on the wide shot with top-down POV:

shot17_bag_detail_v1 — composition fine, but 1:1 instead of 16:9
params banana · gemini-3-pro-image-preview · google
prompt
Black and white film still, close-up top-down POV view from directly above into an open black sports bag resting on a parquet floor. The interior of the bag is filled with brightly wrapped colorful candies (in color — color pop against the B&W scene). The zipper of the bag is fully open along the top. Above the bag, a man's hand (from the POV of someone squatting above) holds one last colorful candy between thumb and finger, about to drop it into the bag. Grainy black and white cinematic look. Match the bag and candy style from @Image1.

v2 with --aspect 16:9:

shot17_bag_detail_v2 — 16:9 worked, but candies have no brands
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Black and white film still, close-up top-down POV view from directly above into an open black sports bag resting on a parquet floor. The interior of the bag is filled with brightly wrapped colorful candies (in color — color pop against the B&W scene). The zipper of the bag is fully open along the top. Above the bag, a man's hand (from the POV of someone squatting above) holds one last colorful candy between thumb and finger, about to drop it into the bag. Grainy black and white cinematic look. Match the bag and candy style from @Image1.

"needs Soviet ones — describe which, no hand needed (did you add a hand?)"

Yes, the hand was in the prompt. Removed. Described by brands: Mishka kosolapy, Mishka na severe, Krasnaya Shapochka, Belochka, Kara-Kum, Moryachok, Lastochka.

shot17_bag_detail_v3 — Soviet wrappers readable, but flap on top
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Black and white film still, close-up top-down view from directly above of an open black sports bag resting on a parquet floor. The bag has a visible top flap with a long zipper running along its length — the flap is folded back so the zipper is fully open and the contents are visible. Inside the bag is a pile of classic Soviet-era wrapped candies: paper wrappers with twisted ends (fantik style), each printed with distinctive illustrated labels. Various well-known brands: "Мишка косолапый" (bear in a pine forest), "Мишка на севере" (polar bear), "Красная Шапочка" (Little Red Riding Hood with a basket), "Белочка" (squirrel with a nut), "Кара-Кум" (camel in a desert), "Морячок" (little sailor boy). Paper wrappers, not shiny plastic foil, with rich hand-drawn colorful illustrations in traditional mid-20th-century Soviet style. The candies are in full color, contrasting with the black-and-white scene around them. No hands, no people, no candy in motion — just the bag with candies inside, ready to be closed. Grainy black and white cinematic look. Match the bag style from @Image1.

"only the bag flap is on top, but it should be on the bottom. and the candies are giants, like 2x larger than they should be"

v4 — two tasks at once:

shot17_bag_detail_v4 — flap didn't move, candies even bigger, but brands readable
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Black and white film still, close-up top-down view from directly above of an open black sports bag resting on a parquet floor. The flap/closing side of the bag with the zipper is at the BOTTOM edge of the frame, folded down. The opening reveals the candies upward through the top. Inside the bag is a pile of classic Soviet-era wrapped candies: small fantik candies, each about the size of a finger (not palm-sized) — many small candies piled densely inside. Paper wrappers with twisted ends, each printed with distinctive illustrated labels. Various well-known brands: "Мишка косолапый" (bear in a pine forest), "Мишка на севере" (polar bear), "Красная Шапочка" (Little Red Riding Hood with a basket), "Белочка" (squirrel with a nut), "Кара-Кум" (camel in a desert), "Морячок" (little sailor boy). Paper wrappers, not shiny plastic foil, with rich hand-drawn colorful illustrations in traditional mid-20th-century Soviet style. The candies are in full color, contrasting with the black-and-white scene around them. No hands, no people, no candy in motion — just the bag with candies inside, ready to be closed. Grainy black and white cinematic look. Match the bag style from @Image1.

"we're doing a really complex thing here, I usually do this kind of thing in stages"

Lesson for Gemini: one edit per generation. Split the task.

v5: shrink candies on v3:

shot17_bag_detail_v5 — candies didn't shrink
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: the candies inside the bag are now small bite-size candies, each about the size of a matchbox. The bag is densely filled with hundreds of these small candies, packed together. Keep the bag, parquet floor, zipper, flap, lighting, and B&W look unchanged.

"smaller still"

v6, thumbnail-sized (much smaller than a matchbox):

shot17_bag_detail_v6 — size worked, but color and brands gone
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: replace the candies with tiny wrapped candies, each one about 2 centimeters long (thumbnail-sized, much smaller than a matchbox). The bag is filled with many hundreds of these tiny candies packed densely. Keep the bag, parquet floor, zipper, flap, lighting, and B&W look unchanged.

"I'd start with an empty bag from v3 — like remove the candies"

Decoupling: empty bag first, then fill:

shot17_bag_empty_v1 — empty bag, composition preserved
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: remove all candies from the bag. The bag is now empty — show the empty black interior of the bag. Keep the bag shape, parquet floor, zipper, flap, lighting, and B&W look exactly unchanged.

shot17_bag_detail_filled_v1 — filled with small Soviet candies
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: fill the empty bag with many small bite-size Soviet-era wrapped candies in full color — paper wrappers with twisted ends (fantik style), each about 2 centimeters long, printed with Soviet brand illustrations: "Мишка косолапый" (bear in pine forest), "Мишка на севере" (polar bear), "Красная Шапочка" (Little Red Riding Hood), "Белочка" (squirrel), "Кара-Кум" (camel in desert), "Морячок" (sailor boy). Hundreds of these tiny candies densely packed inside the bag, the bag is filled to the top. The candies are in full saturated color, contrasting with the black-and-white scene around them. Keep the bag, parquet floor, zipper, flap, lighting, and B&W look outside the bag unchanged.

"see, and you were scared"
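The empty-then-fill split generalizes to a chain where each generation gets exactly one instruction and its output becomes the next input. A minimal sketch: edit_image stands in for whatever wraps the image model and is a hypothetical callable, not a real API.

```python
def run_staged_edits(edit_image, source, steps):
    """Apply one instruction per generation, feeding each result into the next.

    edit_image(image, instruction) -> new image (hypothetical wrapper).
    Returns every intermediate result, so a bad stage can be retried
    from the last good one instead of starting over.
    """
    history = [source]
    for instruction in steps:
        history.append(edit_image(history[-1], instruction))
    return history


# Stand-in editor that just records what was asked, for a dry run.
def fake_edit(image, instruction):
    return f"{image} -> {instruction}"


stages = run_staged_edits(fake_edit, "bag_v3", [
    "remove all candies from the bag",
    "fill the empty bag with small Soviet-era candies",
])
print(stages[-1])
```

Keeping the full history mirrors how the session actually went: the working chain restarted from v3 rather than from the latest failed attempt.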

While shot16 was cooking in the background — did a second iteration of the detail based on the wide (so the bandit's legs at the edges are visible):

shot17_bag_empty_v2 — empty bag with bandit's legs at the edges
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: zoom into a close-up top-down POV view from directly above the open black sports bag that the squatting bald man has beside him. The viewpoint is from his own POV looking down at the bag. His knees, thighs and one arm are partially visible at the edges of the frame. The bag is empty — show the empty black interior of the bag. The bag has a visible top flap with a zipper running along its top edge — the flap is folded back so the zipper is open. Parquet floor visible around the bag. Grainy black and white cinematic look.

shot17_bag_detail_filled_v2 — APPROVED, 2/3 full
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: partially fill the bag with small bite-size Soviet-era wrapped candies in full color — each candy about 2 centimeters long, thumbnail-sized, hundreds of them densely packed. The bag is about two-thirds full, NOT filled to the top — empty space visible above the candy layer. Paper wrappers with twisted ends (fantik style), printed with Soviet brand illustrations: "Мишка косолапый" (bear in pine forest), "Мишка на севере" (polar bear), "Красная Шапочка" (Little Red Riding Hood), "Белочка" (squirrel), "Кара-Кум" (camel in desert), "Морячок" (sailor boy). The candies are in full saturated color, contrasting with the black-and-white scene around them. Keep the bag, parquet floor, zipper, flap, legs at the edges, lighting, and B&W look outside the bag unchanged.

Last candy — contemplative pause

"okay, let's try to animate, in 2 passes he collects candies and tosses them in the bag"

Hold on, different task — on this frame it's not collecting but a single moment with the last candy on the palm:

"he should reach beyond the right edge of the frame for one orange candy, but not toss it in the bag — open the fist over the bag and look at this one candy on the palm"

shot17_bag_detail_lastcandy_v1 — right hand, but the bandit collects with his left
params 720p · 6s · 16:9
prompt
@Image1 as first frame. The man reaches his right hand off the right edge of the frame, then brings his closed fist back and positions it above the open bag. He slowly opens his fist — a single orange-wrapped candy is resting on his open palm. He looks down at the candy on his palm and holds still. He does NOT drop the candy into the bag.
Camera fixed, locked-off. No background music, no soundtrack. Avoid jitter, avoid bent limbs, avoid identity drift.

"fuck, wrong hand, he collects with his left"

"need to describe the candy better, or even better — give a ref. candy_ref.jpg"

"left edge of the frame — that's wrong, we're facing the bandit — his left hand is on the right side of the frame"

v2: left hand (but exits at the right edge of the frame, since he's in left profile), + candy ref:

shot17_bag_detail_lastcandy_v2 — APPROVED, left hand, candy on the palm
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Image2 shows the specific candy — the man is holding this exact candy in his palm at the end of the shot.
The man reaches his LEFT hand off the right edge of the frame, then brings his closed fist back and positions it above the open bag. He slowly opens his LEFT fist — a single candy matching @Image2 is resting on his open palm. He looks down at the candy on his palm and holds still. He does NOT drop the candy into the bag.
Camera fixed, locked-off. No background music, no soundtrack. Avoid jitter, avoid bent limbs, avoid identity drift.

Partner closeup on the couch

"now from the wide shot make a closeup on the partner, don't forget to give the partner closeup ref"

Gemini edit of the wide → medium close-up. With face-ref:

shot18_partner_closeup_v1 — pose doesn't match the wide
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image to produce a medium close-up framing of the bearded man sitting on the couch — same scene, same lighting, same environment, but the camera is now tighter, framed from his chest up. His face and expression match @Image2 (use as character reference). Keep the B&W look and the atmosphere of the first image. Visible around him: part of the couch behind him, part of the wallpaper.

"wrong, pose is off. let's do it without face ref for now, in 2 stages — closeup first without ref, then we'll fix the face"

shot18_partner_closeup_v2 — pose from the wide preserved
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image to produce a medium close-up framing of the bearded man sitting on the couch — same scene, same lighting, same pose and expression as in the first image, but the camera is now tighter, framed from his chest up. Keep the B&W look and the atmosphere of the first image.

v3: bring the face to canon:

shot18_partner_closeup_v3 — face closer to canon, but not exact
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: refine the facial features of the bearded man to exactly match the face from @Image2 — same nose, same eyes, same jawline, same beard shape. Keep everything else unchanged: same lighting, same pose, same direction of gaze, same clothing, same couch, same background, same B&W look.

shot18_partner_closeup_v4 — regen
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: refine the facial features of the bearded man to exactly match the face from @Image2 — same nose, same eyes, same jawline, same beard shape. Keep everything else unchanged: same lighting, same pose, same direction of gaze, same clothing, same couch, same background, same B&W look.

"let's also ask for him to look straight, but not directly at camera. only this isn't a new edit, it's where we change the face from v2"

shot18_partner_closeup_v5 — gaze past the camera
params banana · gemini-3-pro-image-preview · google · 16:9
prompt
Edit the first image: refine the facial features of the bearded man to exactly match the face from @Image2 — same nose, same eyes, same jawline, same beard shape. His gaze is now directed straight ahead, but NOT directly into the camera — slightly to one side, looking past the camera. Keep everything else unchanged: same lighting, same pose, same clothing, same couch, same background, same B&W look.

"and why did banana start giving such low resolution?"

My screwup: removed --size 2K while fixing edit-mode, didn't put it back. Came out 1376x768 instead of 2K. Restored.

"put it back, no need to retry, we'll animate what we have. here's the audio voice_v_sumku.mp3. shot18_partner_closeup_v5_eye_corrected — fixed the gaze. here's the text: V sumku polozhil, blyat. Bystro. 6s"

"what do you think, do a camera push-in? — I think the lip-sync will be fine, let's go with the push-in"

RGBA-PNG died on JPEG save in fit_to_aspect. Fixed: convert to RGB if suffix is .jpg/.jpeg.
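A minimal sketch of that fix, assuming `img` is a Pillow image (anything exposing `mode`, `convert()` and `save()`). The function names are hypothetical and this is not the actual body of fit_to_aspect. Handling "LA" and "P" alongside "RGBA" is an addition here; the log only mentions RGBA.

```python
from pathlib import Path

# Modes that carry alpha or a palette and cannot be written as JPEG directly.
# Only RGBA comes from the log; LA and P are added as an assumption.
NON_JPEG_MODES = {"RGBA", "LA", "P"}


def needs_rgb_conversion(mode: str, path: str) -> bool:
    """True when an image in `mode` must be converted before saving to `path`."""
    is_jpeg = Path(path).suffix.lower() in {".jpg", ".jpeg"}
    return is_jpeg and mode in NON_JPEG_MODES


def save_image(img, path: str) -> None:
    """Save a Pillow image, dropping alpha first when the target is a JPEG."""
    if needs_rgb_conversion(img.mode, path):
        img = img.convert("RGB")  # discards the alpha channel
    img.save(path)
```

Keying the decision on the file suffix leaves PNG outputs untouched, so transparency survives wherever the target format supports it.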

shot18_partner_dialogue_v1 — APPROVED, slow push-in, lip-sync
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Audio1 is the line the man speaks. He speaks in Russian at a steady even pace without long pauses between phrases, mouth moves in sync with @Audio1. The line he says: "В сумку положил, блять. Быстро.". No hand gestures. Arms stay relaxed and still.
Camera slow push-in. Preserve the black-and-white look of @Image1. No background music, no soundtrack. Avoid jitter, avoid identity drift.

Bandit profile by the window: ten attempts to straighten the wrong hand

"let's do a side angle on the bandit toward the window from the wide. don't forget 2k now"

Gemini edit of the wide to side profile:

— Gemini turned only the bandit, the camera didn't move.

"the side angle didn't come out at all. he turned only the bandit, but we need the camera"

Camera-orbit in Gemini almost never works.

"no, can't pull the profile out, lighting is wrong. here's where you can crop the bg from: m2_window_tight_v2.png"

Adding the bandit to a clean window frame, with the wide as a helper for the bag. In stages:

shot19_bandit_profile_v1 — bandit on the right, in profile, crouching, bag in front of him
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Add a bald man in a dark leather jacket to the first image. He is positioned in the right half of the frame, to the right of the window, crouched on his haunches on the parquet floor in the foreground. He is shown in side profile, facing forward toward an open black sports bag which sits on the floor in front of him. The window is to his left, backlighting his profile with strong rim light from the side. Use @Image2 as reference for the man and the bag's appearance — same face, same leather jacket, same sports bag. Grainy B&W cinematic look. Keep the window, curtains, wallpaper, and lighting of the first image unchanged.

"really good. I cropped for accuracy shot19_bandit_profile_v1_crop. I realized the original bg I gave you has herringbone parquet running the wrong way — need to rotate the pattern 90 degrees"

shot19_bandit_profile_v1_crop — user's crop

v2 — rotate parquet 90°:

Result: "no, no changes at all". Rephrased with explicit directions (horizontal → vertical):

shot19_bandit_profile_v3 — APPROVED, parquet rotated
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit the first image: change the direction of the herringbone parquet floor pattern. Currently the parquet chevrons zigzag HORIZONTALLY across the frame (pointing left-right). Change them so they zigzag VERTICALLY (from the bottom of the frame toward the top, pointing up-down, toward and away from the camera). Keep the wood color, texture, and the rest of the scene (man, bag, window, curtains, wallpaper, lighting, B&W look) unchanged.

"now he should extend his left palm over the bag, and we'll place a candy on it from the ref"

This is where the wrong-hand saga started. It ran from v4 through v10, including a mirror hack:

shot19_bandit_profile_v4 — extended right instead of left
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit the first image: the bald man extends his LEFT hand forward, palm facing upward and open, hovering directly above the open sports bag in front of him. His open palm is empty. Keep everything else unchanged — his body, face, the bag, the window, curtains, wallpaper, parquet floor, lighting, and B&W look.

"wrong hand, extended right"

v5 from v3, explicit RIGHT hand:

shot19_bandit_profile_v5 — same hand again
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit the first image: the bald man extends his RIGHT hand forward, palm facing upward and open, hovering directly above the open sports bag in front of him. His open palm is empty. Keep everything else unchanged — his body, face, the bag, the window, curtains, wallpaper, parquet floor, lighting, and B&W look.

"right one came out again"

Both attempts ("left", "right") produced the same hand. Next, a spatial description instead: "the hand hidden behind the body":

shot19_bandit_profile_v6 — no output, image exceeds dimension limit
params banana · gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit the first image: the bald man extends the arm that is currently hidden behind his body (the arm on the far side from the camera). That arm should now reach forward across his chest, palm facing upward and open, hovering directly above the open sports bag in front of him. The arm closer to the camera remains at his side. Keep everything else unchanged — his body position, face, bag, window, curtains, wallpaper, parquet, lighting, B&W look.

After /compact another iteration — v7 via anatomical description "near-side / foreground shoulder":

shot19_bandit_profile_v7 — wrong hand
params banana · google/gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit the first image: extend the man's arm that is on the near side to the camera (the arm we can see in the foreground, closer to the viewer — anatomically his left arm since he is in left profile facing the window) straight forward, palm facing upward and open, hand hovering directly above the open sports bag in front of him. His other arm (on the far side of his body, away from camera) stays tucked behind, out of view. Keep everything else identical: body position (squatting), face, head angle, bag, window, curtains, wallpaper, parquet, lighting, black-and-white look, 16:9 framing.

"no, that's the wrong hand, this is the far one and the right one"

v8 — with phrase "anatomical LEFT arm" + hard "RIGHT arm must stay EXACTLY where it was":

shot19_bandit_profile_v8 — still wrong
params banana · google/gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit the first image. The man has two arms, both currently resting near the bag. Extend ONLY his anatomical LEFT arm — the arm attached to his LEFT shoulder, which is the shoulder on the camera side of his body (the near-side, foreground shoulder, closer to the viewer). Bring that left hand forward and up, palm rotated to face upward, fingers open, the hand hovering directly above the open sports bag. His RIGHT arm (far-side, on the side of his body away from the camera) must stay EXACTLY where it was in the input — do not move it, do not re-pose it, do not extend it. Keep head, face, torso, bag, window, curtains, wallpaper, parquet, lighting, B&W look, and 16:9 framing identical.

Hack: flip v3 → edit → flip back. If Gemini always grabs the right hand in image-space, then after the mirror it'll be the left:

shot19_bandit_profile_v3_flip — mirrored version of v3

shot19_bandit_profile_v9_flip — edit on the mirror
params banana · google/gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Edit the first image. The man has two arms, both currently resting near the bag. Extend ONLY his anatomical LEFT arm — the arm attached to his LEFT shoulder, which is the shoulder on the camera side of his body (the near-side, foreground shoulder, closer to the viewer in depth). Bring that left hand forward and up, palm rotated to face upward, fingers open, the hand hovering directly above the open sports bag. His RIGHT arm (far-side, on the side of his body away from the camera) must stay EXACTLY where it was in the input — do not move it, do not re-pose it, do not extend it. Keep head, face, torso, bag, window, curtains, wallpaper, parquet, lighting, B&W look, and 16:9 framing identical.

shot19_bandit_profile_v9 — flipped back, same hand
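
The reasoning behind the flip hack can be shown on a toy image. This is a sketch only: the "image" is a list of rows, and `mark_right_edge` is a hypothetical stand-in for an editor that always touches the same side of the frame. With real files the flip would be something like Pillow's `ImageOps.mirror`.

```python
from typing import Callable, List

Image = List[List[str]]  # toy image: rows of single-character "pixels"


def mirror(image: Image) -> Image:
    """Flip the image horizontally (left-right)."""
    return [row[::-1] for row in image]


def edit_via_mirror(image: Image, edit: Callable[[Image], Image]) -> Image:
    """Flip, apply the edit to the mirrored frame, then flip the result back."""
    return mirror(edit(mirror(image)))


def mark_right_edge(image: Image) -> Image:
    """Stand-in for an editor that always touches the right side of the frame."""
    return [row[:-1] + ["X"] for row in image]


if __name__ == "__main__":
    frame = [[".", ".", "."], [".", ".", "."]]
    print(edit_via_mirror(frame, mark_right_edge))
    # [['X', '.', '.'], ['X', '.', '.']]
```

If the bias were purely about position in the frame, the round trip would move the edit to the other side, as it does here. Since v9 came back with the same hand, the model is apparently not keying on image-space position alone.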

"didn't help. maybe a top-down view where the correct hand is already extended would help? hand_ref.jpg"

v10 with hand_ref as second ref:

shot19_bandit_profile_v10 — still wrong
params banana · google/gemini-3-pro-image-preview · google · 2K · 16:9
prompt
Take the first image as the base to edit. The second image is a top-down view of the SAME scene showing the target arm pose. In the second image, one arm is extended forward with palm facing upward, hand hovering above the open bag — this is the arm you need to extend in the first image. Apply that arm extension to the first image (which is the side-profile view of the same man in the same pose): extend the SAME anatomical arm from his torso forward, palm rotated upward, fingers open, hand hovering above the open sports bag. Do not remove the candy shown in the second image's palm — actually ignore the candy entirely, just use it to identify which arm is extended. The other arm stays in its original position from the first image. Output must be the first image edited — same B&W side-profile framing, same window, curtains, wallpaper, parquet, bag, body pose, head, face, lighting, and 16:9 aspect ratio. NOT the top-down view.

"nope, interesting, what's it stuck on. okay let's try to animate v3 directly. let him extend the left hand with the candy (give a hand ref), shut his eyes, drop the candy in the bag and zip it, 6s"

Key discovery: with hand_ref as the second ref, Seedance picked the correct (left) hand on the first try. The animation model follows a pose reference; the edit model doesn't:

shot19_anim_v1 — APPROVED, left hand worked
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Image2 shows the target arm pose — a hand with palm facing upward holding a single wrapped candy, hovering above the open bag. Use this as reference for which arm to extend and how it should look. 00:00-00:02: The bald man in leather jacket extends his near-side arm (left arm, closer to camera) forward, palm rotating upward, fingers opening, revealing a single Soviet-style wrapped candy resting on his palm, hand hovering directly above the open sports bag. 00:02-00:04: He briefly closes his eyes (squints them shut for a moment), then tilts his palm downward so the candy drops into the bag. 00:04-00:06: He reaches forward with both hands and zips the bag closed across the top. Camera fixed, locked-off, no movement. Preserve the black-and-white look and side-profile composition of @Image1 exactly. No background music, no soundtrack.

"yes, worked. but I want one more take and queue generations as tasks instead of hanging"

shot19_anim_v2 — second take
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Image2 shows the target arm pose — a hand with palm facing upward holding a single wrapped candy, hovering above the open bag. Use this as reference for which arm to extend and how it should look. 00:00-00:02: The bald man in leather jacket extends his near-side arm (left arm, closer to camera) forward, palm rotating upward, fingers opening, revealing a single Soviet-style wrapped candy resting on his palm, hand hovering directly above the open sports bag. 00:02-00:04: He briefly closes his eyes (squints them shut for a moment), then tilts his palm downward so the candy drops into the bag. 00:04-00:06: He reaches forward with both hands and zips the bag closed across the top. Camera fixed, locked-off, no movement. Preserve the black-and-white look and side-profile composition of @Image1 exactly. No background music, no soundtrack.

"what the fuck, in shot 19 you didn't specify it should be a single shot? did you forget the guide?"

"read it WHOLE, idiot, you have to keep the guide whole always"

From section 9 of the guide: "No scene cuts throughout, one continuous shot." Lesson: read the whole guide. The same directive was also missing from shot20, which was already in progress.
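
One way to stop forgetting directives is to append them in code rather than by hand. The sketch below is hypothetical: `build_prompt`, `missing_directives` and the `REQUIRED_DIRECTIVES` grouping are illustrative names. The directive strings are the ones quoted in this log, not a full list from the guide.

```python
from typing import List, Tuple

# Directives quoted in this log; collecting them into one constant is an assumption.
REQUIRED_DIRECTIVES = [
    "No background music, no soundtrack.",
    "No scene cuts throughout, one continuous shot.",
    "Avoid jitter and bent limbs.",
]

Segment = Tuple[int, int, str]  # (start second, end second, action)


def build_prompt(opening: str, segments: List[Segment], camera: str) -> str:
    """Assemble a timed prompt and append every required directive exactly once."""
    parts = [opening]
    for start, end, action in segments:
        parts.append(f"00:{start:02d}-00:{end:02d}: {action}")
    parts.append(camera)
    parts.extend(REQUIRED_DIRECTIVES)
    return " ".join(parts)


def missing_directives(prompt: str) -> List[str]:
    """List required directives that a hand-written prompt forgot."""
    lowered = prompt.lower()
    return [d for d in REQUIRED_DIRECTIVES if d.lower() not in lowered]
```

Running `missing_directives` on the shot19_anim_v1 prompt would have reported the missing single-shot line before the generation was queued.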

shot19_anim_v3 — with the single-shot directive
params 720p · 6s · 16:9
prompt
@Image1 as first frame. @Image2 shows the target arm pose — a hand with palm facing upward holding a single wrapped candy, hovering above the open bag. Use this as reference for which arm to extend and how it should look. 00:00-00:02: The bald man in leather jacket extends his near-side arm (left arm, closer to camera) forward, palm rotating upward, fingers opening, revealing a single Soviet-style wrapped candy resting on his palm, hand hovering directly above the open sports bag. 00:02-00:04: He briefly closes his eyes (squints them shut for a moment), then tilts his palm downward so the candy drops into the bag. 00:04-00:06: He reaches forward with both hands and zips the bag closed across the top. Camera fixed, locked-off, no movement. Preserve the black-and-white look and side-profile composition of @Image1 exactly. Maintain consistent character appearance and lighting from @Image1 throughout. No background music, no soundtrack. No scene cuts throughout, one continuous shot. Avoid jitter and bent limbs.

Bag zipper

Wide of shot17_bag_detail_filled_v2 + actions on top: zip + leave.

shot20_bag_close_detail_v1 — zipped crooked, zipper on the left
params 720p · 5s · 16:9
prompt
@Image1 as first frame. Top-down overhead view of an open sports bag full of colorful Soviet-era wrapped candies, with the bandit squatting behind the bag. 00:00-00:02: Two hands come down onto the bag and zip the zipper across the top, fully closing the bag. The colorful candies disappear under the leather flap as the bag shuts. 00:02-00:05: The bandit grips the bag handles with both hands, lifts the closed bag straight up out of frame, and stands, his legs moving out of the frame at the top. The parquet floor is left empty where the bag was. Camera fixed, locked-off overhead, no movement. Preserve the black-and-white look. No background music, no soundtrack.

"zipped crooked, zipper on the left of the frame"

"left to right across the frame, but it's not enough to just pull the zipper, you have to throw the flap fabric over first"

v2: flap first, then zipper left-to-right, then stand-up:

shot20_bag_close_detail_v2 — flap + straight zipper + stand-up
params 720p · 5s · 16:9
prompt
@Image1 as first frame. Top-down overhead view of an open sports bag full of colorful Soviet-era wrapped candies, with the bandit squatting behind the bag. 00:00-00:02: The bandit's hands grab the loose leather flap/cover of the bag opening and pull it across the top, folding it over the candies so the candies are hidden under the leather. 00:02-00:04: One hand grabs the zipper pull on the LEFT side of the bag's top edge and drags it smoothly from left to right across the top, fully closing the zipper. The zipper pull finishes at the RIGHT side. 00:04-00:05: Both hands grip the bag handles, lift the closed bag straight up and out of the frame at the top. Parquet floor left empty. Camera fixed, locked-off overhead, no movement. Preserve the black-and-white look. Maintain consistent lighting and bag details from @Image1 throughout. No background music, no soundtrack. No scene cuts throughout, one continuous shot. Avoid jitter and bent limbs.

"zipper goes down first, then right, then up — a U shape, you get it? that's how these bags work"

v3: U-path along the zipper perimeter (down → right → up). Removed the stand-up, 5s is enough:

shot20_bag_close_detail_v3 — APPROVED, U-shaped zipper path
params seedance · 720p · 5s · 16:9
prompt
@Image1 as first frame. Top-down overhead view of an open sports bag full of colorful Soviet-era wrapped candies, with the bandit squatting behind the bag. 00:00-00:02: The bandit's hands grab the loose leather flap/cover of the bag opening and pull it across the top, folding it over the candies so they disappear under the leather. 00:02-00:05: One hand grabs the zipper pull at the TOP-LEFT of the bag's oval opening and drags it smoothly along the edge of the opening in a U-shaped path — first DOWN along the left side of the opening (toward the bottom of the frame), then RIGHT along the bottom edge of the opening, then UP along the right side, finishing at the TOP-RIGHT of the opening. The bag is now fully closed around the entire perimeter. Camera fixed, locked-off overhead, no movement. Preserve the black-and-white look. Maintain consistent lighting and bag details from @Image1 throughout. No background music, no soundtrack. No scene cuts throughout, one continuous shot. Avoid jitter and bent limbs.

After an SSL timeout while polling, fetched the result with a direct query. The first attempt read the config from .config.json in the repo root and got 401 Unauthorized; the correct path is scripts/test_mode/.config.json.
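
A polling loop that survives a transient network error could look like the sketch below. Everything here is an assumption for illustration: `poll_task` and `fetch_status` are hypothetical names, not the project's client. The default deadline mirrors the 40-minute pending cap noted in the lessons.

```python
import time
from typing import Callable, Optional


def poll_task(
    fetch_status: Callable[[], Optional[str]],
    deadline_s: float = 40 * 60,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Poll until fetch_status returns a result, or give up at the deadline.

    fetch_status returns None while the task is pending. Transient network
    errors (OSError covers ssl.SSLError and socket timeouts) are swallowed so
    that one failed poll does not lose the task.
    """
    start = clock()
    while clock() - start < deadline_s:
        try:
            result = fetch_status()
        except OSError:
            result = None  # transient failure: retry on the next tick
        if result is not None:
            return result
        sleep(interval_s)
    return None
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.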

Day 8 wrap-up

Approved shots:
- shot14_partner_dialogue_v8 — line "tebya, suka..." (7s, steady tempo, no gestures)
- shot15_davay_v3 — "Davay, sobiraysya, blyat" + wrist-grab (6s)
- shot15_davay_standup_v2 — alternate stand-up variant (6s)
- shot13_bandit_recovers_v1 — bandit with eyes open looking at the partner (4s)
- shot16_room_crop_full_seedream_v1 — wide with face-swap via Seedream
- shot16_collecting_v2 — candy collection, 4 cycles, 12s
- shot17_bag_detail_filled_v2 — POV bag detail with Soviet wrappers
- shot17_bag_detail_lastcandy_v2 — one candy on the palm (contemplative moment)
- shot18_partner_dialogue_v1 — "V sumku polozhil, blyat. Bystro" with push-in
- shot19_bandit_profile_v3 — bandit by the window with parquet rotated
- shot19_anim_v1 — bandit places the last candy and zips up
- shot20_bag_close_detail_v3 — POV detail of the U-route zipper

Lessons:
- generate_audio=True in Seedance means the model synthesizes its own speech from the text plus the voice style instead of reproducing the uploaded audio literally. That is a problem for clean lip-sync against uploaded audio, but workable: line text in quotes + clear punctuation + pauses treated as friends + a steady, even pace → works.
- One edit per Gemini generation. Ask for several changes at once (size + position + color) and the model ignores them.
- Gemini with 2+ refs often won't enter edit-mode, it composes a new image instead. Especially if ref aspects differ (16:9 scene + 0.71 portrait → output portrait). Seedream is more stable for face-swap.
- Gemini's IMAGE_OTHER filter is flaky on scenes with bodies/flesh/people in odd poses: it fires unpredictably, a retry doesn't help, and you need to rephrase or swap models.
- Direction anchors in profile scenes don't work: "left/right arm", "near/far side", flip-hack — Gemini still extends the same side. But in Seedance the animation model listens to pose-ref (hand_ref) — animation did what edit couldn't.
- Read the Seedance guide whole, not by grep. Key directives ("No scene cuts throughout, one continuous shot." and "Avoid jitter and bent limbs.") are scattered across sections: miss one and you get a cut or jittery hands.
- Evolink pending cap: 40 minutes. After that the task ends with "Task timed out" and credits are not charged. A pending task can be cancelled via the DELETE endpoint.
- RGBA PNG → JPEG save crashes in Pillow. fit_to_aspect now converts to RGB on JPEG suffix.
- --size 2K — don't forget. Easy to lose while debugging edit-mode. Without it images are 1376x768 (~1K), barely tolerable.
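
The aspect-mismatch lesson above (16:9 scene + 0.71 portrait → portrait output) can be turned into a pre-flight check on the refs. This is a sketch under assumptions: `mismatched_refs` is a hypothetical helper and the 5% relative tolerance is an arbitrary illustrative value.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height)


def aspect(size: Size) -> float:
    """Width divided by height."""
    width, height = size
    return width / height


def mismatched_refs(base: Size, refs: List[Size], tolerance: float = 0.05) -> List[int]:
    """Return indexes of refs whose aspect ratio is off from the base by more than `tolerance` (relative)."""
    target = aspect(base)
    return [i for i, ref in enumerate(refs) if abs(aspect(ref) - target) / target > tolerance]


if __name__ == "__main__":
    # A 16:9 base with one 16:9 ref and one 0.71 portrait ref.
    print(mismatched_refs((1920, 1080), [(1280, 720), (710, 1000)]))  # [1]
```

Any ref flagged here could be cropped or padded to the base aspect before the request is sent.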