Hacker News

aziis98 yesterday at 10:01 PM

> Pointing capability: Gemini 3 has the ability to point at specific locations in images by outputting pixel-precise coordinates. Sequences of 2D points can be strung together to perform complex tasks, such as estimating human poses or reflecting trajectories over time

Does anybody know how to correctly prompt the model for these tasks or, even better, have a link to some docs? The pictures with the pretty markers are appreciated, but that section is a bit vague and has no references.


Replies

atonse yesterday at 10:16 PM

For my CMS I’d love to get an AI to nicely frame a picture in certain aspect ratios. Like if I provide an image, give me crop coordinates for widescreen, square, portrait, and 4x3 using a photographer’s eye.

Any model that can do that? I tried looking on Hugging Face but didn’t quite see anything.
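
To make the ask concrete: the geometry is the easy part, it's picking the subject that needs the "photographer's eye". A minimal sketch in Python, assuming some model has already returned one focal point in pixels (the `focus_x`/`focus_y` values are hypothetical inputs, not output from any specific model), and assuming "widescreen" means 16:9 and "portrait" means 4:5:

```python
from typing import Dict, Tuple

# Target aspect ratios as (width, height).
# Assumptions: "widescreen" = 16:9 and "portrait" = 4:5.
RATIOS: Dict[str, Tuple[int, int]] = {
    "widescreen": (16, 9),
    "square": (1, 1),
    "portrait": (4, 5),
    "4x3": (4, 3),
}


def crop_box(
    img_w: int,
    img_h: int,
    focus_x: int,
    focus_y: int,
    ratio_w: int,
    ratio_h: int,
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for the largest crop of the given
    aspect ratio, centered on the focal point as closely as the image
    bounds allow."""
    if img_w * ratio_h >= img_h * ratio_w:
        # Image is at least as wide as the target ratio: height is the limit.
        crop_h = img_h
        crop_w = img_h * ratio_w // ratio_h
    else:
        # Image is narrower than the target ratio: width is the limit.
        crop_w = img_w
        crop_h = img_w * ratio_h // ratio_w

    # Center on the focal point, then clamp so the box stays inside the image.
    left = min(max(focus_x - crop_w // 2, 0), img_w - crop_w)
    top = min(max(focus_y - crop_h // 2, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h


def all_crops(
    img_w: int, img_h: int, focus_x: int, focus_y: int
) -> Dict[str, Tuple[int, int, int, int]]:
    """Compute one crop box per named ratio for a single focal point."""
    return {
        name: crop_box(img_w, img_h, focus_x, focus_y, rw, rh)
        for name, (rw, rh) in RATIOS.items()
    }
```

Calling `all_crops(4000, 3000, 3000, 1000)` gives one box per ratio for a 4000x3000 image with the subject in the upper right. This only takes the largest crop around a single point; it knows nothing about composition, which is the part I'd want the model for.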