Day 1. April 15, 06:31–11:46. "We need a picture of the hideout"

Early start: the screenplay text and the first step

The conversation kicked off at 06:31 — the agent suggested laying out the shots in a grid, separating the recurring visual anchors of the location from the unique ones. The user didn't quite get the questions and at 06:32 sent over пиньята.txt:

"Hold on, I don't really get your questions. Do you have the full Piñata text? If not — here it is. What's the first step?"

The encoding was botched but the agent figured it out. The screenplay split into three physical spaces: hideout-BW (scenes 1, 2 before the corpse rips open, and 4 after the slap), hideout-color (scene 2 after the rip, before the slap), and village-trip (scene 3). Four scenes in the script.

Proposal: forget shots and action, start with one task — a picture of the HIDEOUT, the visual anchor for every shot in it.

The user clarified:

"So we start with the master shot? Like a wide angle? And what if I need a reverse angle — and how do I edit a master shot?"

The agent explained: a master shot is a visual identity anchor, not a wide establishing shot. It locks the wallpaper color, the lighting character, the key objects, the texture grit. Then the question: is the corpse part of the location or a separate entity? Answer — a Character. The corpse has states: hanging → swinging → torn open → remains; its visual identity (clothes, age, decomposition) must stay consistent across scenes 1, 2, and 4.

The local Flux quest on a 7900 XTX

The user asked about local Flux. He has a 7900 XTX with 24 GB VRAM, on paper enough hardware for Flux.1 Kontext dev. But old Windows, no ROCm.

"ROCm isn't there and won't be, old Windows."

The agent ran through the options: ZLUDA (CUDA emulation, lottery), DirectML (slow). The user came back with a key argument:

"Why does llama.cpp work through Vulkan, what — you can't do images too?"

Turns out — you can. stable-diffusion.cpp (leejet's project built on Gerganov's ggml, the same family as llama.cpp), Vulkan backend without ROCm, support for Flux.1 Kontext dev since late 2024, GGUF Q4/Q8. Four days ago (April 11) they baked an embedded web UI into the binary (PR #1408).

Concrete links: binary sd-master-fd35047-bin-win-vulkan-x64.zip (23 MB) + weights flux1-kontext-dev-Q8_0.gguf (12.7 GB) + ae.safetensors + clip_l + t5-v1_1-xxl-encoder-Q8_0.gguf. Total ~20 GB.

At 07:31 the user reported: everything downloaded into c:\Users\boomyjee\Downloads\sdcpp\. The agent rewrote start.bat, created server.bat. First run — OOM on VAE encode: ref.png was 2848×1600, allocating a 3.5 GB buffer with only 6.7 GB free out of 24 GB. Fixed memory options.
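The session fixed this with sd.cpp memory options; a backend-independent guard is to shrink the ref below ~1 MP before the VAE ever sees it. A minimal PIL sketch, not what the session actually ran:

```python
# Downscale ref.png below ~1 MP so the VAE encode buffer stays small.
# Hypothetical helper; the session fixed this via sd.cpp memory options instead.
from PIL import Image

img = Image.open("ref.png")  # 2848x1600 = ~4.56 MP, the OOM trigger
target_px = 1_000_000
scale = (target_px / (img.width * img.height)) ** 0.5
if scale < 1.0:
    # round down to multiples of 64, which diffusion VAEs generally expect
    w = int(img.width * scale) // 64 * 64
    h = int(img.height * scale) // 64 * 64
    img.resize((w, h), Image.LANCZOS).save("ref_small.png")
```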

It started. The numbers:
- Loading 4 models: 5 sec (~17 GB into VRAM)
- VAE encode ref: 7 sec
- CLIP+T5 encode prompt: 2 sec
- Sampling 20 steps: 180 sec (9 sec/step) — expected 2–3 sec/step
- VAE decode: 12 sec
- Total: 3:20 per generation

Second attempt gave the same speed. Flash Attention flags for the Vulkan backend silently no-op — not implemented. Verdict: locally on AMD via Vulkan Flux technically works, but 200 sec per edit is unacceptable for iteration.

Flux.2 via OpenRouter

The user asked about OpenRouter. They have FLUX.2 there (November 2025, newer than Kontext v1):
- flux.2-pro — $0.015/MP input + $0.03/MP output → ~4.5¢ per edit on 1 MP
- flux.2-flex — $0.06/MP → 12¢, but multi-reference editing
- flux.2-max — top tier
- flux.2-klein — cheap

"Let's just keep moving with one-off scripts, we're testing the concept. Generate a master plan and we'll look at it."

At 08:02 the first flux_gen.py was written, M1 generation kicked off — master shot of the hideout, B&W, view from the door. First run died with HTTP 400 "Request Moderated" — Flux hard-censors "noose/corpse/criminal". Rewrote the prompt without triggers: "abandoned old Soviet apartment, hanging lightbulb, crumpled newspapers, dust motes" — passed.

First image m1_priton.png — 12 seconds, 3 cents:

m1_priton
params flux · flux.2-pro · 16:9
prompt
Derelict run-down Soviet apartment interior, black and white cinematic photograph, high contrast, grainy 35mm film stock. View from the doorway into a small dim bedroom: yellowed peeling wallpaper with faded floral pattern, empty dangling rope noose hanging from a ceiling hook in the center of the room, dusty wooden parquet floor, a single bare naked bulb hanging on a wire from the ceiling, an old Soviet chest of drawers pushed against the left wall with several drawers pulled open, crumpled newspapers and clothing scattered across the floor, heavy grimy curtains on a single small window at the back of the room. Cold atmosphere of abandonment and dread. Petersburg criminal underworld, late 1990s aesthetic. Eye-level wide shot from the doorway, room receding into shadow. Sharp deep focus, cinematic still, dust in the air. Empty room, no people, no corpse.

Nice atmosphere, but in the user's view — too cramped:

"From my angle it's tight in here, we need space and depth."

Master shot iterations: v2 → v7_max

Reworked for cinemascope 2.39:1, doorways in the background:

m1_priton_v2
params flux · flux.2-pro · 16:9
prompt
Wide cinematic anamorphic 2.39:1 ultrawide black and white photograph, abandoned spacious old apartment interior, two interconnected dilapidated rooms visible through open doorways. Large tall window on the left wall flooding the space with bright natural daylight, dust particles dancing in the light beam. Wide diagonal wooden floorboards covered in debris, crumpled newspapers, scattered trash, broken plaster. Peeling stained wallpaper on the walls, water-damaged crumbling ceiling with exposed wooden beams. An old shabby fabric couch against the back wall, low wooden coffee table tipped over. Two open doorways on the right side leading deeper into more empty rooms, creating layers of depth. High ceiling. Late afternoon dim daylight. Deep perspective, vanishing point composition, wide angle lens. Empty space, no people, no figures. Eastern European post-Soviet decay, late 1990s. Grainy 35mm film stock, deep blacks, high contrast. Wes Anderson symmetrical framing meets Tarkovsky melancholy.

The user cut in:

"Why the hell so many doors and why is the picture square? And why is the resolution so low?"

Flux's docs could be clearer: it turns out the aspect ratio goes into the request payload as image_config: {aspect_ratio: "21:9"}, not as words in the prompt. Same for image_size. Fixed flux_gen.py to take --aspect 21:9 --size 2048.
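What the fixed call plausibly looks like — a minimal sketch, not the session's actual flux_gen.py; the model slug and the response shape are assumptions based on OpenRouter's chat-completions image API:

```python
# flux_gen.py sketch (hypothetical reconstruction). Assumes OpenRouter's
# chat-completions image API; the model slug and response shape are assumptions.
import base64
import os

import requests

def flux_gen(prompt: str, aspect: str = "21:9", size: int = 2048, out: str = "out.png"):
    r = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "black-forest-labs/flux.2-pro",  # assumed slug
            "messages": [{"role": "user", "content": prompt}],
            "modalities": ["image", "text"],
            # the fix: aspect and size are payload fields, not words in the prompt
            "image_config": {"aspect_ratio": aspect, "image_size": size},
        },
        timeout=300,
    )
    r.raise_for_status()  # HTTP 400 "Request Moderated" surfaces here
    msg = r.json()["choices"][0]["message"]
    data_url = msg["images"][0]["image_url"]["url"]  # base64 data URL (assumed shape)
    with open(out, "wb") as f:
        f.write(base64.b64decode(data_url.split(",", 1)[1]))
```

The rerun: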

m1_priton_v3
params flux · flux.2-pro · 2048 · 21:9
prompt
Wide cinematic black and white photograph, abandoned spacious old apartment, single large room. Tall tall window on the left wall flooding the space with bright shafts of natural daylight, dust particles dancing in the light beam casting visible diagonal rays across the floor. Wide diagonal wooden floorboards covered in scattered debris, crumpled newspapers and trash. Peeling stained floral wallpaper on the back wall, water-damaged crumbling ceiling with exposed dark wooden beams. An old shabby fabric couch against the back wall under the window, low wooden coffee table in front of it. High ceiling, deep open empty room. Late afternoon dim daylight. Ultra-wide deep perspective. Empty space, no people, no figures. Eastern European post-Soviet decay, late 1990s aesthetic. Grainy 35mm film stock, deep blacks, high contrast cinematic still.

"Too much wreckage, nowhere to look for the goods. And we need just one door."

v4 stripped the wreckage and the unnecessary doorways, kept one door on the right, added a wardrobe, drawers, table, chair — "something to search":

m1_priton_v4
params flux · flux.2-pro · 2048 · 21:9
prompt
Wide cinematic black and white photograph, dingy old Soviet apartment interior, single spacious living room. Tall window on the left wall with worn lace curtains pushed aside, bright shafts of natural daylight pouring in casting visible diagonal sunbeams across the wooden floor, dust particles dancing in the light. Old creaky wooden parquet floor, lightly dusty but mostly clean. Faded yellowed floral wallpaper on the walls, tarnished but still intact. Furniture along the walls: a tall dark wooden wardrobe with closed doors on the right wall, a chest of drawers next to it, a worn shabby fabric couch against the back wall under a small tarnished mirror, a small writing desk with a single drawer in the corner, a low coffee table. A single closed wooden door on the right wall (the entrance). High ceiling, intact plaster. Late afternoon light. Wide deep perspective. Empty space, no people, no figures. Lived-in but undisturbed, melancholic atmosphere. Eastern European late 1990s aesthetic. Grainy 35mm film stock, deep blacks, high contrast cinematic still.

"The ceilings look low to me."

v5 added "4-meter ceilings, crown molding, Stalinka-era apartment, herringbone parquet, floor-to-ceiling window":

m1_priton_v5
params flux · flux.2-pro · 2048 · 21:9
prompt
Wide cinematic black and white photograph, dingy old pre-revolutionary Stalinka apartment interior, single spacious living room with VERY TALL HIGH CEILINGS (4 meters tall), grand pre-war architecture. Tall narrow window on the left wall, full-height from floor almost to ceiling, with worn lace curtains pushed aside, bright shafts of natural daylight pouring in casting visible diagonal sunbeams across the wooden floor, dust particles dancing in the light. Old creaky herringbone parquet floor, lightly dusty. Faded yellowed floral wallpaper on the walls, tarnished but still intact. Decorative crown molding around the high ceiling. Furniture along the walls: a tall dark wooden wardrobe with closed doors on the right wall, a chest of drawers next to it, a worn shabby fabric couch against the back wall, a small writing desk with a single drawer in the corner, a low coffee table. A single closed wooden door on the right wall (the entrance). Late afternoon light. Wide deep perspective, low camera angle looking slightly up to emphasize the height of the ceiling. Empty space, no people, no figures. Lived-in but undisturbed, melancholic atmosphere. Eastern European late 1990s aesthetic. Grainy 35mm film stock, deep blacks, high contrast cinematic still.

Now it's a Stalinka. The size, though: 1920×816 ≈ 1.57 MP, and --size 3072 gives the same result:

m1_priton_v6_3072 — --size 3072, still 1.57 MP
params flux · flux.2-pro · 3072 · 21:9
prompt
(same prompt as v5)

OpenRouter caps 21:9 output to ~2 MP across both models. Tried the top-tier flux.2-max — $0.10, 42 sec, got m1_priton_v7_max.png: 2048×880, 1.80 MP. Slightly sharper and more contrasty. Still not 4K, but it's the canonical M1:

m1_priton_v7_max — approved master shot
params flux · flux.2-max · 3072 · 21:9
prompt
(same prompt as v5)

"Let's start a decisions file. Flux-max is fine, we keep it. What's next?"

Created docs/movie_pipeline.md — to lock decisions and open questions.

The critical test: does multi-ref edit hold the location

The agent formulated the main untested hypothesis: multi-ref edit preserves location identity. If it doesn't, the whole "M1 + Character → shot" architecture falls apart.

At 08:29 the user sent over char_bandit.jpg, char_bandit_2.png, char_body.png, char_wife.png — character sheets from seedance-2-character. Each sheet holds 8 views and expressions of the character. The idea: feed the sheet as a ref into multi-ref edit.

First test on flux.2-flex (multi-ref): M1 + char_body → man hanging in a noose over the table.

test_shot1_body_in_room — guy on the couch

test_shot1_body_in_room — guy on the couch
params flux · flux.2-flex · 2048 · 21:9
refs
prompt
Wide cinematic black and white photograph in the style and exact setting of the first reference image (the apartment room): the same dingy old Stalinka apartment with tall ceilings, herringbone parquet floor, faded floral wallpaper, large window on the left with sunbeams, wooden wardrobe and chest of drawers on the right, shabby couch and low coffee table in the center, single closed door on the right wall. Camera position and framing identical to first reference. Now in this exact same room, a lifeless man matching the second reference (dark business suit, dark hair) hangs limp from a long rope attached to the ceiling above the coffee table, his feet dangling several centimeters above the floor, head slumped forward. Otherwise the room is unchanged. Same grainy 35mm film stock, deep blacks, high contrast, same lighting from the window beams. No other people.

Location preserved one-to-one — same wallpaper, herringbone parquet, wardrobe, window, molding. The character reads. But the model didn't hang him — it sat him on the couch. The user caught it instantly:

"No, this is nonsense, like every pose of the character is a new character? Are you serious?"

Right, this was a dead end: the agent was bolting architectural crutches around Flux's safety filter. The right move is not to bend the model to your architecture but to pick a different model for the unstitchable shot.

Seedream 5.0-lite via Evolink

The user reminded: character sheets are made by seedance-2-character from generator.py — only MUAPI has them. And for composition you need a compositor. An Evolink key turned up in the config. Wrote seedream_gen.py for the /v1/images/generations endpoint, model doubao-seedream-5.0-lite.
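A sketch of what seedream_gen.py plausibly looks like; the base-URL env var, request fields, and response shape are all assumptions (Evolink is assumed to mirror the OpenAI images API):

```python
# seedream_gen.py sketch (hypothetical reconstruction). Assumes Evolink exposes
# an OpenAI-style /v1/images/generations endpoint; field names are assumptions.
import base64
import os

import requests

def _data_url(path: str) -> str:
    return "data:image/png;base64," + base64.b64encode(open(path, "rb").read()).decode()

def seedream_gen(prompt: str, refs=(), out: str = "out.png"):
    payload = {"model": "doubao-seedream-5.0-lite", "prompt": prompt}
    if refs:
        payload["image"] = [_data_url(p) for p in refs]  # multi-ref edit (assumed field)
    r = requests.post(
        os.environ["EVOLINK_BASE_URL"] + "/v1/images/generations",  # hypothetical env var
        json=payload,
        headers={"Authorization": f"Bearer {os.environ['EVOLINK_API_KEY']}"},
        timeout=300,
    )
    r.raise_for_status()
    url = r.json()["data"][0]["url"]  # OpenAI-images-style response (assumed)
    open(out, "wb").write(requests.get(url, timeout=120).content)
```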

Same test: M1 + char_body on Seedream:

test_shot1_seedream — hanging but standing on the table

Location 1:1, character reads, model actually did hang him, but the feet are standing on the table. The agent in his euphoria called this a win. The user sobered him up:

"Are you serious? The dude is just standing on the table and the face doesn't read."

And right after — the key question:

"What makes you think it's censorship and not just a bad prompt?"

Re-read the prompt. It said "feet dangling several centimeters above the floor", and the model did exactly that: feet a few centimeters above the table, a surface about 50 cm off the floor. Rewrote: "FULLY SUSPENDED IN MID-AIR... clear empty gap of 30 centimeters":

test_shot1_v3 — another attempt with crops

Standing vertically on the table, with the rope a decoration above his head.

Experiment: is there censorship

The user demanded proof of the censorship claim:

"You haven't proven the censorship part. Try Flux, Seedream, Banana — generate a hanging man without any refs at all and we'll find out if there's censorship."

Wrote banana_gen.py (Gemini Image via OpenRouter). Same prompt, three models in parallel:

- Flux 2 Pro: HTTP 400 "Request Moderated" — hard reject
- Nano Banana (Gemini 3.1): text refusal — "I can't create an image of a man hanging"
- Seedream 5.0-lite: HTTP 200, perfect — hanging, feet in the air, head sideways
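The fan-out itself is a few lines; a sketch, assuming each of the three scripts exposes a gen(prompt, out=...) helper like the ones above:

```python
# Run the same no-ref "hanging man" prompt against all three backends at once.
# flux_gen / seedream_gen are the sketches above; banana_gen (Gemini Image via
# OpenRouter) is assumed to have the same (prompt, out=...) shape.
from concurrent.futures import ThreadPoolExecutor

PROMPT = "..."  # the same hanging-man prompt, no reference images

def attempt(fn, name):
    try:
        fn(PROMPT, out=f"test_hang_{name}.png")
        return name, "ok"
    except Exception as e:  # moderation surfaces as HTTP 400 or a text refusal
        return name, f"rejected: {e}"

with ThreadPoolExecutor() as pool:
    jobs = [pool.submit(attempt, fn, name)
            for fn, name in [(flux_gen, "flux"),
                             (banana_gen, "banana"),
                             (seedream_gen, "seedream")]]
    for j in jobs:
        print(*j.result())
```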

test_hang_seedream — no refs

The "Seedream censors" hypothesis collapsed. No censorship in Seedream, the problem is in multi-ref — when a character sheet is added, the model somehow "dampens" complex actions.

V1/V2/V3 iterations with face crop

Meanwhile the user cut char_body_front.png and char_body_face.png — separate front and face instead of the full 4096×1737 sheet, where each face becomes ~50×50 pixels after resampling and you can't pull anything out.

Three multi-ref variants in parallel:

- V1 — refs: M1 + face crop → ✅ hanging, actually in the air
- V2 — refs: M1 + face + ready hanging body → ⚠️ half-hanging, foot on the table
- V3 — refs: M1 + face + front + action-first prompt → ❌ standing on the table

test_v1_face_only — winner

V1 won. Hypothesis confirmed: char_body_front.png (an alive, standing man) actively prevents the model from making a "dead hanging" body. Conclusion: feed the face crop for identity; describe the pose in words or with a separate pose ref.

"We need the body closer"

"Only the hanging guy needs to be much closer to the camera."

The agent started turning the "medium close-up" knob in the prompt — the user corrected immediately:

"No, not like that. Keep the background 1:1, move the body physically closer — then it'll be larger."

Don't change the camera, change where the body hangs. Hang it closer to the camera — into the foreground:

test_v1_foreground — body in the foreground

Body large, hanging right. But the user:

"Nah, total crap. Try other models."

Model marathon

banana_gen.py was rewritten into a generic OpenRouter gen (any model via --model). Ran Riverflow v2 Pro, Gemini 3 Pro Image, and Seedream 4.5 in parallel.

Compressed refs to 1280px. Tried without "foreground" — Seedream 4.5 gave a vertically hanging body. The agent called this "the best yet." The user immediately:

"But Seedream changed the room to a different one. And how were you calling GPT-5?"

GPT-5 Image via the same OpenRouter died with unsupported_country_region_territory. The user has a proxy in config, ran it through that — GPT-5 made it (200, 51 sec), but returned safety_violation. Model written off.

"I dropped body_ref in for you, the body is right there."

The user found body_ref.png — a real ref of the hanging-man pose (probably from Higgsfield). New plan: M1 + body_ref + face crop.

body_ref — pose reference

test_composite_5lite — three refs, all correct

✅ Location 1:1 with M1. ✅ Body hanging vertically, head slumped. ✅ The suit. body_ref provided the pose physics, M1 the location, face crop the identity. Each ref played its role.

"Not approving yet. Need the body closer. And I want to test Higgsfield Soul."

Attempts at "body fills 70% of frame", 1.5m from camera — test_composite_close.png:

test_composite_close — too tight, everything smeared

And an attempt at 16:9 instead of 21:9 — the body looks larger at the same physics:

test_composite_16x9 — aspect changed

The WaveSpeed quest and brute-forcing endpoints

Higgsfield has no pay-as-you-go. The user provided a WaveSpeed key ($1 free). Wrote wavespeed_gen.py for POST → task_id → polling. Launched Higgsfield Soul on M1 — the task got stuck in queue for 12+ minutes with executionTime=0ms, cold start.
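The POST → task_id → polling shape, as a sketch; endpoint paths and response fields follow WaveSpeed's v3 REST layout as I recall it, so treat them as assumptions:

```python
# wavespeed_gen.py sketch (hypothetical reconstruction): submit a task, then
# poll for the result. Paths and field names are assumptions, not verified.
import os
import time

import requests

API = "https://api.wavespeed.ai/api/v3"
HDRS = {"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"}

def submit(model_path: str, payload: dict) -> str:
    # e.g. model_path = "bytedance/seedream-v4.5" (one of the endpoints found below)
    r = requests.post(f"{API}/{model_path}", json=payload, headers=HDRS, timeout=60)
    r.raise_for_status()
    return r.json()["data"]["id"]  # the task_id

def poll(task_id: str, timeout_s: int = 900) -> str:
    # Higgsfield Soul sat 12+ minutes in queue at executionTime=0ms,
    # hence the generous timeout.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        r = requests.get(f"{API}/predictions/{task_id}/result", headers=HDRS, timeout=60)
        data = r.json()["data"]
        if data["status"] == "completed":
            return data["outputs"][0]  # result URL (cloudfront; proxy needed from RU)
        if data["status"] == "failed":
            raise RuntimeError(data.get("error"))
        time.sleep(5)
    raise TimeoutError(task_id)
```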

In parallel — brute-force across endpoints (bytedance/*, kwaivgi/*, alibaba/*, higgsfield/*). Found working ones: bytedance/seedream-v4/edit, bytedance/seedream-v4.5, alibaba/wan-2.7/image-edit. Kling didn't surface anywhere.

Seedream v4.5 Edit with three refs (M1 + body_ref + face):

test_ws_seedream45edit — best from WaveSpeed

Physics clear — hanging with a gap, head slumped. Location close to M1, though the furniture differs slightly. Infrastructure note: direct download from d2p7pge43lyniu.cloudfront.net is blocked from RU, downloaded through the proxy.

Tried Wan 2.7 Edit — head almost upright (not limp), body small, face blurred.

Nano Banana Pro (google/nano-banana-pro/edit) on WaveSpeed — same Google censorship.

Overload: "I'm lost"

The agent rolled out a ranking across 4 models. The user:

"I'm lost — which picture is the original, what am I comparing to, what got closer?"

And further, when the agent flubbed comparing body sizes:

"In 45edit the body is smaller than in 5lite_closer, and you're saying the opposite."

And harder:

"I don't get whether you tested the new models on edit or fresh generation. What were you putting in? You're drawing a pile of conclusions and they're wrong. The problem isn't the models, it's what you're feeding in."

And the final one:

"Look, m1_priton_v7_max — that's the only image we approved. After that you generated a pile of versions and the problem is you fed the new versions on top of old versions, then drew conclusions about the models. We need to roll back to 'there is only the room' state."

Clean restart: "what I'm feeding + what I expect" format

"What does 'will keep the location' mean? Can you write in a format — what exactly are you putting on the input? Room + short prompt. Or room + face. Or room + face + body. Or room + face + hanged-man ref. There are a million variations. How am I supposed to understand what you're doing?"

After this — a formal format for every test: Input: X + Y → Model: Z → What we learn: A.

Test #1: M1 + the prompt "in this room a corpse is hanging", nothing else. Seedream 5.0-lite via Evolink. Learning: does it hold the location with no refs beyond M1?

clean1 — Seedream with one M1

Location held, but the physics is dubious — body vertical, feet at table level, can't tell if hanging or standing.

"Why is the body so far from the camera again? Maybe there's a method — you literally draw it a ref where it's needed and at what size?"

The agent suggested Qwen Image Edit Plus via Evolink ($0.022, same endpoint as Seedream): mask + prompt, white area = "the body goes here". Pixel-perfect everything else. Generated the mask — vertical 307×660 ellipse in M1's center (fill zone).
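A mask like that is a few lines of PIL; a sketch using the logged dimensions (file names are hypothetical):

```python
# Build the inpainting mask: a white 307x660 ellipse (the fill zone,
# "the body goes here") centered on M1, black everywhere else (pixel-perfect keep).
from PIL import Image, ImageDraw

m1 = Image.open("m1_priton_v7_max.png")  # 1920x816
mask = Image.new("L", m1.size, 0)        # black = leave untouched
cx, cy = m1.width // 2, m1.height // 2
w, h = 307, 660
ImageDraw.Draw(mask).ellipse(
    (cx - w // 2, cy - h // 2, cx + w // 2, cy + h // 2),
    fill=255,                            # white = fill zone
)
mask.save("mask_body.png")
```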

Breakthrough: composition mockup

The user formulated the day's main idea:

"Why can't I upload an image-with-mask to the same Seedream? Like — here's the room, here's it with a mask — at the mask spot, the body."

That's a composition mockup: by hand, you crop body_ref, paste it into M1 at the right place and size (like a photomontage), Seedream works with the ready layout — all it has to do is make the seam invisible. Composition control is on you, not on the model.

First mockup: body_ref → crop 409×1152 → resize to 234×660 → paste at (907, 44) on M1. The user immediately:

"Are you serious? In the ref the body is half the frame, on your collage it's even smaller."

Recounted: in body_ref the body is only the lower 60–70% of the picture, so at full height it's 430px = 50% of M1's frame, not 75%. Re-cropped tightly "neck-to-boots", resized to 75% height. The seam stays — that's a feature, not a bug: it tells Seedream "body goes here, this size".
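The registry later records a Python block for exactly this assembly; it was presumably along these lines (the crop box inside body_ref is a placeholder, the other numbers are from the log):

```python
# Mockup assembly: crop the body out of body_ref, scale it to ~75% of M1's
# frame height, paste it where it should hang. Hypothetical reconstruction.
from PIL import Image

m1 = Image.open("m1_priton_v7_max.png")      # 1920x816 approved master shot
body = Image.open("body_ref.png")

crop = body.crop((0, 0, 409, 1152))          # first pass used a 409x1152 crop
h = int(m1.height * 0.75)                    # final pass: 75% of frame height
w = int(crop.width * h / crop.height)
mockup = m1.copy()
mockup.paste(crop.resize((w, h), Image.LANCZOS), (907, 44))  # paste point from the log
mockup.save("composite_mockup.png")          # the visible seam IS the instruction
```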

composite_mockup — dirty collage as a layout instruction

Run: M1 + composite_mockup → Seedream 5.0-lite. Prompt: "Render as one cinematic photo, blend the body seamlessly, preserve the room from reference 1".
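With the seedream_gen sketch from earlier, the run is a single call (file names from the log; the signature is that sketch's assumption):

```python
# Shot 1: master + dirty mockup in, seamless composite out.
seedream_gen(
    "Render as one cinematic photo, blend the body seamlessly, "
    "preserve the room from reference 1",
    refs=["m1_priton_v7_max.png", "composite_mockup.png"],
    out="shot1_pinata_corpse.png",
)
```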

shot1_pinata_corpse — approved shot 1

✅ Body 75% of frame height, really dominates. ✅ Hanging correctly — feet in the air with a gap, head slumped. ✅ M1 location preserved 1:1. ✅ The mockup seam is gone — model blended the boundaries cleanly. One minus: the face isn't visible (head fully slumped, only the crown), but that's a consequence of the body_ref pose, not a pipeline flaw.

"Yeah, accept."

First approved composition. Added a "Registry of approved artifacts" section to docs/movie_pipeline.md — exact input, prompt, parameters for M1 and Shot 1, plus a Python block for assembling the mockup (crop coordinates, scale, paste point). Every next shot is now done from this template.

Shot of the door

"I don't think this belongs in git, it's experiments. Generate a shot of the right door that the bandit will kick in."

Along the way the user raised an important question:

"Do we actually need to generate all these frames as images before we generate the video? Aren't we making things worse for the video model when we feed it frames? Maybe we should start with a master video? When the model isn't anchored to a frame, only to style, it samples from a thicker space of weights."

Technically sound intuition: diffusion without image conditioning samples from its native distribution, while a foreign frame as an anchor spends capacity on reconciling with its style. Video-first was deferred; on to the door shot.

First attempt: the agent started cropping the bandit for a mockup. The user cut in:

"What bandit, we just need a closed door."

New approach — tighter framing of the same location. Second important note:

"Did you write 45 degrees? If it's straight on, and he later looks at the corpse — gaze into camera, the 180° rule breaks."

Cinematographer's logic. And immediately after — the cardinal rule of minimal prompting:

"Are you sure you need to describe the room again? You can just say: 'give me a tighter shot of the door on the left of the reference'. You're writing 'same wallpaper, same floor, etc.'"

Right. If M1 is in the ref, the model already sees the wallpaper, parquet, and window; re-describing them is redundant or worse (if the words contradict the ref, the model has to pick one). The minimal prompt describes only what changes (camera framing); the constants come from the ref.

The prompt collapsed to two phrases:

"Same room as in the reference image, but tighter framing on the closed wooden door. Camera at 45-degree angle. Same style. Empty room, no people."

shot1_door_closed — approved, 45°

✅ Location recognizable: same wallpaper, door, drawer chest, chair, edge of the wardrobe, parquet. ✅ Door in the center, waiting. ✅ Minimal prompt worked — the model took everything from the ref.

"Maybe not the corpse, but a leg or a boot"

"Came out fine, but not approving. At this framing on the left there'd be the corpse. Maybe not the corpse — a leg or a boot."

A Hitchcock move: the rhyme of "the door waits, but a boot is already hanging in the frame." The viewer registers "something terrible is in there" before the bandit barges in. By this point it is 11:46 and the session approaches its first pause. Plan for the next one: assemble a mockup from body_ref (lower body) + shot1_door_closed, send it to Seedream, and clean up via rembg if the background is messy.

Day 1 wrap-up (morning session)

In five hours:
- the stack was picked: Flux 2 Max for location master shots (via OpenRouter), Seedream 5.0-lite for shots with multi-ref (via Evolink), body_ref + face crop as two separate anchors
- two canonical artifacts approved: m1_priton_v7_max.png (master shot of the hideout) and shot1_pinata_corpse.png (corpse in noose)
- the main composition tool was found: layout mockup — by hand, we assemble a collage M1 + body_ref at the right position and size, Seedream polishes the seam
- the minimal-prompt rule was formulated: describe only what changes, constants come from the ref
- censorship limits locked in (Flux, Gemini, GPT-5 don't draw hangings; Seedream without refs — does)
- infrastructure quirks: OpenRouter caps 21:9 to ~2 MP, WaveSpeed cloudfront blocked from RU (proxy mandatory), Higgsfield requires registration