Oh this is super cool! Funnily enough, just a few minutes ago I gave GPT-5.5 in the Copilot CLI an image of the floor plan of my balcony and asked it to come up with a zig-zag pattern for a 14m string of lights I'm considering buying, so I could visualize whether 14 meters would be enough.
My first prompt was a Preview-annotated image where I drew the bounding boxes of where the lights would go with a green line and marked the power socket with a red dot.
I just tried the exact same prompt but using tack instead of marking with 'pen' on the image. It completed it with much better results, one-shot (instead of having to steer 3 times) and in 1/20th of the original time.
Token usage for the original approach: `Tokens ↑ 2.4m • ↓ 50.2k • 2.2m (cached) • 10.1k (reasoning)`
Token usage using coordinates instead of drawing: `Tokens ↑ 79.2k • ↓ 4.7k • 72.7k (cached) • 1.5k (reasoning)`
Not very surprising that a JSON of coordinates is more efficient than drawing a crude line on an image (roughly 30x fewer input tokens going by the numbers above), but I couldn't be bothered because GitHub still charges per request instead of per token. If I'd had this 20 minutes earlier I WOULD have bothered, though. Nice work :)
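For anyone curious, the "is 14 meters enough" check boils down to something like the rough sketch below. To be clear, this is not what the model produced. The rectangle, the socket position, and the function names are all made up for illustration, and it assumes a single rectangular area with the string alternating between the two long edges.

```python
import math


def zigzag_points(x0, y0, x1, y1, passes):
    """Corner points of a zig-zag that crosses the rectangle `passes` times.

    The string starts at (x0, y0) and alternates between the y0 and y1
    edges while advancing evenly along x.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    step = (x1 - x0) / passes
    points = []
    for i in range(passes + 1):
        x = x0 + i * step
        y = y0 if i % 2 == 0 else y1
        points.append((x, y))
    return points


def path_length(points):
    """Total length of the polyline through `points`."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def required_length(socket, rect, passes):
    """String needed: socket to the first corner, plus the zig-zag itself."""
    x0, y0, x1, y1 = rect
    points = zigzag_points(x0, y0, x1, y1, passes)
    return math.dist(socket, points[0]) + path_length(points)


# Hypothetical numbers, in meters. Not my actual balcony.
rect = (0.0, 0.0, 4.0, 1.5)   # x0, y0, x1, y1 of the area to cover
socket = (0.0, 0.0)           # power socket position
string_length = 14.0

for passes in range(1, 9):
    needed = required_length(socket, rect, passes)
    verdict = "fits" if needed <= string_length else "too short"
    print(f"{passes} passes: {needed:.2f} m ({verdict})")
```

Swap in real coordinates and it tells you how many passes a given string length can cover. It ignores sag and any slack needed around the hooks, so treat the result as a lower bound.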