wohoef • yesterday at 7:26 AM • 3 replies

In my experience, LLMs have a hard time working with text grids like this. They seem to find columns harder to “detect” than rows, probably because their input presents the grid as one giant row, if that makes sense.

They have the same problem with playing chess. But I’m not sure there is a data type they could work with for this kind of game. Currently it seems like LLMs can’t really work on spatial problems. But this should actually be fixable (pretty sure I saw an article about it on HN recently).
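To make the "giant row" point concrete, here is a minimal sketch. The grid is made up, and it assumes one character per sequence position, which real tokenizers do not guarantee. It only shows how far apart horizontal and vertical neighbours end up once the grid is flattened into a single string.

```python
# Hypothetical 4x5 grid, invented purely for illustration.
grid = [
    "#####",
    "#.@.#",
    "#.$.#",
    "#####",
]

def flatten_grid(rows):
    """Join rows with newlines, roughly how a text grid reaches a model."""
    return "\n".join(rows)

def flat_index(row, col, width):
    """Position of cell (row, col) in the flattened string.

    Each row takes `width` characters plus one newline separator.
    """
    return row * (width + 1) + col

width = len(grid[0])
flat = flatten_grid(grid)

# Distance in the flattened sequence to the right-hand neighbour
# and to the neighbour directly below, starting from cell (1, 2).
right_gap = flat_index(1, 3, width) - flat_index(1, 2, width)
down_gap = flat_index(2, 2, width) - flat_index(1, 2, width)

print(right_gap, down_gap)
```

A row neighbour sits one position away, while a column neighbour sits `width + 1` positions away, and that gap grows with the width of the grid. This may be part of why columns are harder to pick out than rows.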

Replies

fi-le • yesterday at 11:43 AM

Good point. The architectural solution that comes to mind is 2D positional embeddings for text, i.e. we add two sets of sines and cosines to each token embedding (one for the row, one for the column) instead of one. Apparently people have done it before: https://arxiv.org/abs/2409.19700v2
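A rough sketch of what that could look like, assuming the usual sinusoidal scheme and simply concatenating a row encoding with a column encoding. This is one common way to build a 2D encoding, not necessarily what the linked paper does. The function names and the `dim` parameter are made up for illustration.

```python
import math

def sinusoid_1d(pos, dim):
    """Sinusoidal encoding of one integer position into `dim` values (dim must be even)."""
    out = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        out.append(math.sin(pos * freq))
        out.append(math.cos(pos * freq))
    return out

def sinusoid_2d(row, col, dim):
    """Encode (row, col) by concatenating a row encoding and a column encoding.

    Assumption: `dim` is divisible by 4, so each half has an even length.
    The result would be added to the token embedding of the cell at (row, col).
    """
    half = dim // 2
    return sinusoid_1d(row, half) + sinusoid_1d(col, half)

# Two cells in the same column share the column half of their encoding,
# no matter how far apart they are in the flattened token sequence.
a = sinusoid_2d(1, 2, 8)
b = sinusoid_2d(3, 2, 8)
print(a[4:] == b[4:])
```

With this layout, "same column" becomes an exact match on half of the position vector instead of something the model has to infer from a 1D offset that depends on the grid width.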

froobius • yesterday at 8:44 AM

Transformers can easily be trained or designed to handle grids; it's just that standard off-the-shelf LLMs haven't particularly been (although they will have seen some grids in training).

stavros • yesterday at 9:23 AM

If this were a limitation in the architecture, they wouldn't be able to work with images, no?
