A long night: 17 hours with breaks, 61 generations. Reshot the partner's dialogue; added the handshake and stand-up, candy collection, POV bag detail, partner closeup, bandit profile by the window, and the finale with the bag zipper. A separate pain point: Gemini extended the wrong hand 10 attempts in a row.
naHAnyali
"okay come on google, Seedance just went and ruined half the speech"
In scene 10, shot14_partner_dialogue_v1 (front angle + tebya_suka.mp3, 7s) had bad lip-sync. Per the Seedance guide: the audio sweet spot is 3-5s, emotional speech gives 100-200ms of lip lag, and fast speech with elision produces smeared lip closures.
"let's take a side angle (not front), and I made a new voice, just the first part. voice_tebya_suka.mp3, 6s, starts speaking after the first second, not right away"
"garbage, the speech is jumbled. shouldn't we feed the model the line text too? what does the guide say?"
Found it in the guide: put the line text in double quotes and name the language. Format: Character speaks in Russian: "...".
"he said tebya naHAnyali tovar prinesti, why ((( and I can't even substitute the original audio because the rhythm is totally different"
Opened the seedance_video.py code: generate_audio: True is hardcoded. With it, Seedance treats your uploaded audio only as a voice-style reference and synthesizes the speech itself from the text. That's where "naHAnyali" and the different rhythm come from.
"do it"
Switched to generate_audio=False when audio_urls present:
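A minimal sketch of that toggle, not the actual seedance_video.py code. The build_payload helper and the "prompt" key are hypothetical; only generate_audio and audio_urls come from the session.

```python
from typing import Any, Dict, List, Optional


def build_payload(prompt: str, audio_urls: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical request builder. Only the generate_audio and
    audio_urls field names are taken from the log; the rest is assumed."""
    payload: Dict[str, Any] = {"prompt": prompt}
    if audio_urls:
        payload["audio_urls"] = list(audio_urls)
        # Uploaded audio present: turn off the model's own audio generation.
        payload["generate_audio"] = False
    else:
        # No uploaded audio: keep the previously hardcoded value.
        payload["generate_audio"] = True
    return payload
```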
"the file has no sound now"
generate_audio=False = no audio track in output. Plus lip-sync still off.
"no, it doesn't match, the lip movement isn't natural. roughly like before, only now I don't hear gibberish because there's no audio"
"no, first remove the changes that disable audio gen, that's getting in the way of me figuring out which phonemes it didn't catch"
Rolled back to generate_audio=True:
"okay better, it gives a strange accent. voice_tebya_suka_v2.mp3 — I made a version of the audio with cleaner separation, to make it easier for it"
New audio + corrected punctuation ("Тебя, сука. Наняли товар принести. А не пробовать его. Дебил."):
After a credit 402 and top-up:
"damn, it doesn't have time to say it, need 7s"
"shit, it puts really strong pauses between lines and again the last line is right at the timing edge, that's bad for editing. need to ask it to keep tempo and no gestures"
v8: steady even pace without long pauses + Finishes speaking by 00:05 + No hand gestures:
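Roughly how the speech part of such a prompt can be assembled. This is a sketch under assumptions: dialogue_prompt is a hypothetical helper, and the real v8 prompt also contained scene description that is not reproduced here. The quoted-text format and the three directives are the ones named in this log.

```python
def dialogue_prompt(line: str, language: str = "Russian", finish_by_s: int = 5) -> str:
    """Hypothetical helper: quoted line + language, then pacing directives."""
    if '"' in line:
        # The line is wrapped in double quotes, so it must not contain any.
        raise ValueError("line must not contain double quotes")
    minutes, seconds = divmod(finish_by_s, 60)
    parts = [
        f'Character speaks in {language}: "{line}"',
        "Steady even pace without long pauses.",
        f"Finishes speaking by {minutes:02d}:{seconds:02d}.",
        "No hand gestures.",
    ]
    return " ".join(parts)
```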
"victory, you did good, and I a bit too"
The key element — the user's v2 audio with clear separations. Everything else was prompt-tuning around it.
"the last line needs to be said on this frame, shot12_bg_user_front_fitted"
Long discussion of line variants (Davay. Sobiraysya. / + Vsyo, poshli) and action (offer hand / stand up and walk). An idea came in: the hand is offered vertically, like in arm-wrestling / wrist-grip — not palm-to-palm.
v2: starts speaking immediately at 00:00, bandit's hand from the right edge:
"no, in both versions he gives his hand like a shake, like 180 degrees, but it has to be 90, that's why I asked how to describe the grip correctly"
"and did you explicitly say the second guy is lying?"
Good catch — didn't write that. Added context of the lying body + WRIST-grab not palm-to-palm, forearms vertical:
In parallel a second variant — partner just gets up and leaves:
"speech problem, he said 'davay' twice"
Most likely cause: a long pause after the first "Davay." (a period acts as a pause command, and the model fills the pause by repeating the word). Removed the timing crutches:
"@/$E/shot12_slap_setup_eyes_open.jpg, here's another frame of the lying bald guy, only I opened his eyes. make him just look at the partner and come around"
4s instead of 5s. After 402 and top-up:
Commit 82559c4 — 72 approved + archive files.
"the candy collecting frame, and I'm thinking which one to use"
Discussed the script for the last 6 shots. Decided: bandit collects alone, partner watches from the couch. The scene is static, no more action.
The user gave shot16_room_crop_partner.jpg — partner on the couch, body on top, pile of candy. Asked to add the bandit via Gemini:
"what kind of bald bearded guy is that?"
My screwup — confused who's who. Fixed: bandit is bald in a leather jacket, no beard.
Three Gemini attempts with char_bandit_front_bw.jpg — all IMAGE_OTHER (safety):
- attempt 1 — apparently the body on top triggers it
- attempt 2 — removed the body, but a dark fragment of flesh stayed in the candy pile
- attempt 3 — removed the fragment too — still IMAGE_OTHER
"no, try a different ref: shot5_bandit_closeup_v5.png"
Also IMAGE_OTHER. Fourth attempt — with char_bandit_back_bw.jpg (back view):
— passed, but Gemini returned the second ref instead of the edited scene. Hard-wrote "output MUST be first image edited — NOT the second":
— Gemini squeezed it into vertical aspect and recomposed everything (window/cabinet/door gone).
"you're nuts, literally yesterday everything worked fine. and in my chat it still works fine. find the error in your prompt or code"
Switched source → shot16_room_crop_wrong_bandit.jpg (user assembled in Photoshop himself):

Task — face swap. facefusion headless-run with shot5_bandit_closeup_v5.png:
"stuck the face on badly, missed"
Two passes of facefusion (different sources + reference-mode) — keeps missing. Switched to Gemini for face-swap — IMAGE_OTHER again, then squeezed into portrait. Tried:
- Removed --size 2K and --aspect — came out portrait
- Put images BEFORE text — portrait
- Tried gemini-2.5-flash-image — IMAGE_OTHER
- gemini-3.1-flash-image-preview — timeout
"what does 'doesn't go into edit mode' mean? what does that mean? are you a moron? can you talk normally. I don't know what you've been doing"
Apologized like a human. Root cause — ref aspects: 16:9 scene + face_bw portrait (0.71) → Gemini reconciled to portrait. Seven attempts wasted.
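A pre-flight check along these lines would have flagged the mismatch before any generation was spent. It is only a sketch: mismatched_refs and the 5% tolerance are assumptions, and the pixel sizes in the usage note are illustrative, not the real files.

```python
from typing import List, Tuple


def aspect(size: Tuple[int, int]) -> float:
    """Width divided by height."""
    width, height = size
    return width / height


def mismatched_refs(sizes: List[Tuple[int, int]], tolerance: float = 0.05) -> List[int]:
    """Return indexes of refs whose aspect ratio differs from the first
    image (the scene being edited) by more than the relative tolerance."""
    if not sizes:
        raise ValueError("need at least the scene image size")
    base = aspect(sizes[0])
    return [
        i
        for i, size in enumerate(sizes[1:], start=1)
        if abs(aspect(size) - base) / base > tolerance
    ]
```

For example, mismatched_refs([(1920, 1080), (710, 1000)]) flags index 1: a 16:9 scene next to a 0.71 portrait ref. Such a ref can be cropped or padded to the scene's aspect before it is sent.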
"got it, let's swap with Seedream. here's the face ref shot5_bandit_closeup_v5.png"
Seedream with 2 refs worked first try — 16:9 preserved, face swapped, partner untouched, composition intact:

With 3 refs (scene + bandit face + partner face):
First run pending on Evolink for 40 minutes (high load). Cancelled via DELETE endpoint, restarted — came in.
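To avoid sitting on a stuck task, the wait can be capped and the task cancelled from code. The sketch below is generic: get_status and cancel are injected callables, so no Evolink endpoint or response shape is assumed, and the "pending" status string and the 10-minute default are assumptions too.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], str],
    cancel: Callable[[], None],
    pending_cap_s: float = 600.0,
    poll_every_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the task leaves 'pending'; cancel it once the cap is hit."""
    started = clock()
    while True:
        status = get_status()
        if status != "pending":
            return status  # any non-pending status is handed back to the caller
        if clock() - started >= pending_cap_s:
            cancel()  # e.g. a wrapper around the DELETE endpoint
            return "cancelled"
        sleep(poll_every_s)
```

The caller can then resubmit the generation when "cancelled" comes back, which is what was done by hand here.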
"needs more length, let's make it 12s and have him collect everything in 4 passes"
Gemini on the wide shot with top-down POV:

v2 with --aspect 16:9:

"needs Soviet ones — describe which, no hand needed (did you add a hand?)"
Yes, the hand was in the prompt. Removed. Described by brands: Mishka Kosolapy, Mishka na Severe, Krasnaya Shapochka, Belochka, Kara-Kum, Moryachok, Lastochka.

"only the bag flap is on top, but it should be on the bottom. and the candies are giants, like 2x larger than they should be"
v4 — two tasks at once:

"we're doing a really complex thing here, I usually do this kind of thing in stages"
Lesson for Gemini: one edit per generation. Split the task.
v5: shrink candies on v3:

"smaller still"
v6 — matchbox size, thumbnail-sized:

"I'd start with an empty bag from v3 — like remove the candies"
Decoupling: empty bag first, then fill:


"see, and you were scared"
While shot16 was cooking in the background — did a second iteration of the detail based on the wide (so the bandit's legs at the edges are visible):


"okay, let's try to animate, in 2 passes he collects candies and tosses them in the bag"
Hold on, different task — on this frame it's not collecting but a single moment with the last candy on the palm:
"he should reach beyond the right edge of the frame for one orange candy, but not toss it in the bag — open the fist over the bag and look at this one candy on the palm"
"fuck, wrong hand, he collects with his left"
"need to describe the candy better, or even better — give a ref. candy_ref.jpg"
"left edge of the frame — that's wrong, we're facing the bandit — his left hand is on the right side of the frame"
v2: left hand (but exits at the right edge of the frame, since he's in left profile), + candy ref:
"now from the wide shot make a closeup on the partner, don't forget to give the partner closeup ref"
Gemini edit of the wide → medium close-up. With face-ref:

"wrong, pose is off. let's do it without face ref for now, in 2 stages — closeup first without ref, then we'll fix the face"

v3: face to the canon:


"let's also ask for him to look straight, but not directly at camera. only this isn't a new edit, it's where we change the face from v2"

"and why did banana start giving such low resolution?"
My screwup: removed --size 2K while fixing edit-mode, didn't put it back. Came out 1376x768 instead of 2K. Restored.
"put it back, no need to retry, we'll animate what we have. here's the audio voice_v_sumku.mp3. shot18_partner_closeup_v5_eye_corrected — fixed the gaze. here's the text: V sumku polozhil, blyat. Bystro. 6s"
"what do you think, do a camera push-in? — I think the lip-sync will be fine, let's go with the push-in"
RGBA-PNG died on JPEG save in fit_to_aspect. Fixed: convert to RGB if suffix is .jpg/.jpeg.
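The fix amounts to something like the following. This is a sketch, not the real fit_to_aspect: save_image is a hypothetical name, and img is expected to be a PIL.Image.Image (only its mode attribute and its convert and save methods are used).

```python
from pathlib import Path

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def save_image(img, path):
    """Save img to path, converting to RGB first when the target is JPEG.

    JPEG has no alpha channel, so saving an RGBA image as JPEG raises
    an error in Pillow.
    """
    path = Path(path)
    if path.suffix.lower() in JPEG_SUFFIXES and img.mode != "RGB":
        img = img.convert("RGB")  # drops the alpha channel
    img.save(path)
    return img
```

Note that convert("RGB") simply discards transparency. If transparent areas matter, the image should be composited onto a background first.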
"let's do a side angle on the bandit toward the window from the wide. don't forget 2k now"
Gemini edit of the wide to side profile:
— Gemini turned only the bandit, the camera didn't move.
"the side angle didn't come out at all. he turned only the bandit, but we need the camera"
Camera-orbit in Gemini almost never works.
"no, can't pull the profile out, lighting is wrong. here's where you can crop the bg from: m2_window_tight_v2.png"
Adding the bandit to a clean window frame, with the wide as a helper for the bag. In stages:

"really good. I cropped for accuracy shot19_bandit_profile_v1_crop. I realized the original bg I gave you has herringbone parquet running the wrong way — need to rotate the pattern 90 degrees"

v2 — rotate parquet 90°:
— "no, no changes at all". Reformulated through explicit direction (horizontally → vertically):

"now he should extend his left palm over the bag, and we'll place a candy on it from the ref"
And here started the wrong-hand saga that lasted 5 iterations and a mirror hack:

"wrong hand, extended right"
v5 from v3, explicit RIGHT hand:

"right one came out again"
Both attempts ("left", "right") — same hand. Spatially: "the hand hidden behind the body":

After /compact another iteration — v7 via anatomical description "near-side / foreground shoulder":

"no, that's the wrong hand, this is the far one and the right one"
v8 — with phrase "anatomical LEFT arm" + hard "RIGHT arm must stay EXACTLY where it was":

Hack: flip v3 → edit → flip back. If Gemini always grabs the right hand in image-space, then after the mirror it'll be the left:



"didn't help. maybe a top-down view where the right hand is already extended would help? hand_ref.jpg"
v10 with hand_ref as second ref:

"nope, interesting, what's it stuck on. okay let's try to animate v3 directly. let him extend the left hand with the candy (give a hand ref), shut his eyes, drop the candy in the bag and zip it, 6s"
Key discovery: Seedance with hand_ref as second ref immediately used the correct hand. The animation model follows a pose ref; the edit model doesn't:
"yes, worked. but I want one more take and queue generations as tasks instead of hanging"
"what the fuck, in shot 19 you didn't specify it should be a single shot? did you forget the guide?"
"read it WHOLE, idiot, you have to keep the guide whole always"
The directive is in section 9: "No scene cuts throughout, one continuous shot." Read the whole guide. The same directive was also missing on shot20 (already in progress).
Wide of shot17_bag_detail_filled_v2 + actions on top: zip + leave.
"zipped crooked, zipper on the left of the frame"
"left to right across the frame, but it's not enough to just pull the zipper, you have to throw the flap fabric over first"
v2: flap first, then zipper left-to-right, then stand-up:
"zipper goes down first, then right, then up — a U shape, you get it? that's how these bags work"
v3: U-path along the zipper perimeter (down → right → up). Removed the stand-up, 5s is enough:
After SSL-timeout on poll — pulled it via direct query. First time tried config at .config.json in root — 401 unauthorized, correct path is scripts/test_mode/.config.json.
Approved shots:
- shot14_partner_dialogue_v8 — line "tebya, suka..." (7s, steady tempo, no gestures)
- shot15_davay_v3 — "Davay, sobiraysya, blyat" + wrist-grab (6s)
- shot15_davay_standup_v2 — alternate stand-up variant (6s)
- shot13_bandit_recovers_v1 — bandit with eyes open looking at the partner (4s)
- shot16_room_crop_full_seedream_v1 — wide with face-swap via Seedream
- shot16_collecting_v2 — candy collection, 4 cycles, 12s
- shot17_bag_detail_filled_v2 — POV bag detail with Soviet wrappers
- shot17_bag_detail_lastcandy_v2 — one candy on the palm (contemplative moment)
- shot18_partner_dialogue_v1 — "V sumku polozhil, blyat. Bystro" with push-in
- shot19_bandit_profile_v3 — bandit by the window with parquet rotated
- shot19_anim_v1 — bandit places the last candy and zips up
- shot20_bag_close_detail_v3 — POV detail of the U-route zipper
Lessons:
- generate_audio=True in Seedance = the model synthesizes its own speech from the text + voice style instead of reproducing your audio verbatim. For clean lip-sync against uploaded audio that's a problem, but workable: text in quotes + clear punctuation + cleanly separated phrases in the source audio + "steady even pace" in the prompt → works.
- One edit per Gemini generation. Multiple tasks at once (size + position + color) — model ignores them.
- Gemini with 2+ refs often won't enter edit-mode and composes a new image instead. Especially if ref aspects differ (16:9 scene + 0.71 portrait → output portrait). Seedream is more stable for face-swap.
- Gemini's IMAGE_OTHER safety block is flaky on scenes with bodies/flesh/people in odd poses: the filter fires unpredictably, a plain retry doesn't help, you need rephrasing or a model swap.
- Direction anchors in profile scenes don't work: "left/right arm", "near/far side", flip-hack all fail, and Gemini still extends the same arm. Seedance, the animation model, does follow a pose ref (hand_ref): animation did what edit couldn't.
- Read the Seedance guide whole, not by grep. Key directives ("No scene cuts throughout, one continuous shot", "avoid jitter and bent limbs") are scattered across sections: miss one, get a cut-edit or jittery hands.
- Evolink pending cap — 40 minutes. Then Task timed out, credits not charged. Can be cancelled via DELETE endpoint.
- RGBA PNG → JPEG save crashes in Pillow. fit_to_aspect now converts to RGB on JPEG suffix.
- --size 2K — don't forget. Easy to lose while debugging edit-mode. Without it images are 1376x768 (~1K), barely tolerable.