I disagree with gpt-image-1.5's grade on the worm sign. It moved some of the marks around to accommodate the enlarged black area but retained the overall appearance of the sign.
I can see how you'd come to that conclusion. Each prompt is meant to illustrate a different test criterion. Worm Sign is intended to test near-100% retention of the original weathered, dented sign.
If you look at the ones that passed (Flux.2 Pro, Gemini 2.5 Flash, Reve), you'll see that they did not add, remove, or move any of the pockmarks from the original image.