logoalt Hacker News

joshribakofflast Saturday at 4:15 PM3 repliesview on HN

I’ve been doing game development and it starts to hallucinate more rapidly when it doesn’t understand things like the direction it placing things or which way the camera is oriented

Gemini models are a little bit better about spatial reasoning, but we’re still not there yet because these models were not designed to do spatial reasoning they were designed to process text

In my development, I also use the ascii matrix technique.


Replies

kleene_oplast Saturday at 4:35 PM

Spatial awareness was also a huge limitation to Claude playing pokemon.

It really seems to me that the first AI company getting to implement "spatial awareness" vector tokens and integrating them neatly with the other conventional text, image and sound tokens will be reaping huge rewards. Some are already partnering with robot companies, it's only a matter of time before one of those gets there.

show 1 reply
hypercube33last Saturday at 5:01 PM

I disagree. With opus I'll screenshot an app and draw all over it like a child with me paint and paste it into the chat - it seems to reasonably understand what I'm asking with my chicken scratch and dimensions.

As far as 3d I don't have experience however it could be quite awful at that

miohtamalast Saturday at 4:34 PM

They would need a spatial reason or layout specific tool, to translate to English and back

show 1 reply