You don't need a new GPU

This is a response to a real and honestly useful ComfyUI workflow posted on Reddit: Ideogram 4 img2img editing via inpaint using SAM2.

The author is not doing anything wrong. The post is good: masks by prompt, caption-assisted prompt composition, partial denoise, SAM2 segmentation, and working nodes shared openly. Respect. The point is not that the workflow is bad. The point is that the workflow reveals how much ceremony local power asks you to accept.

So consider this a benchmark of ceremony, not a dunk on the rig.

You saw it. A 3090, plus a 2080Ti, plus a 3060. SAM2 for the mask. A custom caption node. A second LLM whose entire job was to write the prompt for the first one. Two minutes a run on three cards - to change the text on a sign and swap the background.

It works, and the workflows are out in the open. It is also a lot of house to heat.

Here is the boring counter-flex. Same class of job. No rig.

The whole chain, in one tab

> open hyperdraw on your phone
> t2i a base plate from one of close to two hundred models
> recolor it so the palette stops fighting you
> i2i to push the look without nuking the composition
> lift the mech out as an actual mesh with SAM 3D
> spin it in the 3D view, drop the angle you want back onto the canvas as a layer
> retexture from a prompt - Describe writes it, the rewrite pass cleans it up
> flatten when you like it

Every step there is a real model or filter that already ships. The heavy ones run server-side, which is the whole reason this works on a phone on the train. No CUDA. No pip install roulette. No “which Torch breaks today.” The compute is somebody else’s problem; you get the canvas.

“But SDXL renders one frame faster”

Yes. On a local card, a single render is faster. Per-render speed was never the claim. The claim is that you can run the whole chain - generate, recolor, img2img, lift to 3D, snap an angle back, retexture - in one place. On hardware you already own. Without it being a weekend of node wiring.

One render is not the job. The job is the ten decisions around it. The rig is great at the render and makes you build a factory for the decisions. This is the version where you skip the factory and keep the decisions.

Power without the ceremony

People assume “runs on a phone” means “toy.” It means the ceremony moved off your machine, not that the capability shrank. SAM 3D extraction; Retexture with material presets; img2img; selection-clipped recolor with gradients, Levels, HSL, Color Balance, or Vibrance; and t2i across close to two hundred models - that is not a beginner sandbox. It is the same kind of work, minus the CUDA tax and the GPU invoice.

ComfyUI is great at what it is great at. This is just the version where you did not have to assemble the factory to find out the subject reads better at three-quarter view.

If you want the slower, more deliberate “decide first, then render in your graph” version of this argument, that is the bridge workflow.


You don’t need a new GPU. You needed the angle. Go find it at hyperdraw.art - Dream, recolor, SAM 3D, Retexture, flatten. Same chain, same phone, same card you already own.