Hacker News

wohoef, yesterday at 7:26 AM

In my experience LLMs have a hard time working with text grids like this. They seem to find columns harder to "detect" than rows, probably because their input shows the grid as one giant row, if that makes sense.

They have the same problem with playing chess. But I'm not sure there is a data format that would work better for this kind of game. Currently it seems like LLMs can't really work on spatial problems. But this should actually be something that can be fixed (pretty sure I saw an article about it on HN recently).
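The "giant row" intuition can be made concrete: when a 2D grid is serialized row by row into a token stream, horizontally adjacent cells stay next to each other, but vertically adjacent cells end up a full row-width apart. A minimal illustration (toy grid, not tied to any particular tokenizer):

```python
# Toy 3x3 grid of single-character "tokens".
grid = [
    ["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"],
]

# Flatten row by row, the way the text reaches the model: one long row.
tokens = [cell for row in grid for cell in row]
width = len(grid[0])

# Horizontal neighbors ("a" and "b") are 1 position apart in the stream...
print(tokens.index("b") - tokens.index("a"))  # 1
# ...but vertical neighbors ("a" and "d") are `width` positions apart.
print(tokens.index("d") - tokens.index("a"))  # 3
```

So a model that only sees 1D positions has to learn the constant offset `width` to relate column neighbors, and that offset changes with every grid size.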


Replies

fi-le, yesterday at 11:43 AM

Good point. The architectural solution that comes to mind is 2D positional embeddings, i.e. adding sine/cosine terms for both the row and the column position of each token instead of only its 1D sequence position. Apparently people have done this before: https://arxiv.org/abs/2409.19700v2
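One common way to sketch this (an illustrative assumption, not the specific scheme from the linked paper): split the embedding dimensions in half, encode the row position with standard sinusoidal frequencies in one half and the column position in the other.

```python
import numpy as np

def sincos_1d(pos, dim):
    """Standard 1D sinusoidal encoding for a single scalar position."""
    i = np.arange(dim // 2)
    freqs = pos / (10000 ** (2 * i / dim))
    return np.concatenate([np.sin(freqs), np.cos(freqs)])

def sincos_2d(row, col, dim):
    """2D variant: half the dims encode the row, half the column."""
    half = dim // 2
    return np.concatenate([sincos_1d(row, half), sincos_1d(col, half)])

emb = sincos_2d(row=2, col=5, dim=8)
print(emb.shape)  # (8,)
```

With this scheme, two cells in the same column share identical column halves regardless of which row they sit in, so vertical adjacency becomes directly visible to attention instead of being a learned constant offset.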

froobius, yesterday at 8:44 AM

Transformers can easily be trained or designed to handle grids; it's just that off-the-shelf LLMs haven't been trained for that in particular (although they will have seen some grids in their training data).

stavros, yesterday at 9:23 AM

If this were a limitation in the architecture, they wouldn't be able to work with images, no?
