It's a tough test for local models (gpt-image and NB had zero problems). The only local model that came reasonably close was Qwen-Image.
Z-Image / Flux 2 / Hidream / Omnigen2 / Qwen Samples:
This is where smaller models are more constrained: they need additional prompting to coax out the physical description of a "pogo stick". I had similar issues when generating Alexander the Great leading a charge on a hippity-hop / space hopper.