Hacker News

LeftHandPath · 07/31/2025 · 5 replies

There are some things that you still can't do with LLMs. For example, if you tried to learn chess by having the LLM play against you, you'd quickly find that it isn't able to track a series of moves for very long (usually 5-10 turns; the longest I've seen it last was 18) before it starts making illegal choices. It also generally accepts invalid moves from your side, so you'll never be corrected if you're wrong about how to use a certain piece.

Because it can't actually model these complex problems, it really requires awareness from the user regarding what questions should and shouldn't be asked. An LLM can probably tell you how a knight moves, or how to respond to the London System. It probably can't play a full game of chess with you, and will virtually never be able to advise you on the best move given the state of the board. It probably can give you information about big companies that are well-covered in its training data. It probably can't give you good information about most sub-$1b public companies. But, if you ask, it will give a confident answer.

They're a minefield for most people and use cases, because people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice. It's like walking on a glacier and hoping your next step doesn't plunge through the snow and into a deep, hidden crevasse.


Replies

og_kalu · 07/31/2025

LLMs playing chess isn't a big deal. You can train a model on chess games and it will play at a decent Elo and very rarely make illegal moves (i.e., a 99.8% legal-move rate). There are a few such models around. I think post-training messes with chess ability and OpenAI et al. just don't really care about that. But LLMs can play chess just fine.

[0] https://arxiv.org/pdf/2403.15498v2

[1] https://github.com/adamkarvonen/chess_gpt_eval
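The legal-move rate cited above is just a per-move tally across games. A toy sketch of that scoring (hypothetical record format; the real eval in chess_gpt_eval uses an actual chess engine to judge legality):

```python
# Toy sketch: tally a legal-move rate from game records.
# Each game is a list of (move, was_legal) pairs; in a real harness,
# was_legal would come from a chess engine validating the model's move.
def legal_move_rate(games):
    """Return the fraction of attempted moves that were legal."""
    attempts = [was_legal for game in games for _, was_legal in game]
    return sum(attempts) / len(attempts)

games = [
    [("e4", True), ("Nf3", True), ("Ke8", False)],  # one illegal attempt
    [("d4", True), ("c4", True)],
]
print(legal_move_rate(games))  # 4 legal out of 5 attempts -> 0.8
```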

smiley1437 · 07/31/2025

> people aren't aware of how wrong they can be, and the errors take effort and knowledge to notice.

I have friends who are highly educated professionals (PhDs, MDs) who just assume that AI/LLMs make no mistakes.

They were shocked that it's possible for hallucinations to occur. I wonder if there's a halo effect where the perfect grammar, structure, and confidence of LLM output causes some users to assume expertise?

physicsguy · 07/31/2025

It's super obvious if you try to use something like agent mode for coding: it starts off well but drifts more and more. I've even had various Claude models try to do totally irrelevant things, like re-indenting some code.

DougBTX · 07/31/2025

Yeah, the chess example is interesting. The best specialised AIs for chess are all clearly better than humans, but our best general AIs are barely able to play legal moves. The ceiling for AI is clearly much higher than current LLMs.

nomel · 07/31/2025

> you'd quickly find that it isn't able to track a series of moves for very long (usually 5-10 turns; the longest I've seen it last was 18)

In chess, previous moves are mostly irrelevant to the current position, and LLMs aren't good at filtering out irrelevant data [1]. For better performance, you should include only the relevant data in the context window: the current state of the board.

[1] https://news.ycombinator.com/item?id=44724238
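One way to do that is to replay the moves yourself and hand the model a position string (e.g. FEN) instead of the whole move list. A minimal stdlib-only sketch, assuming moves in UCI coordinate notation and skipping legality checks, castling, en passant, and the FEN side-to-move fields (a real implementation would use a library like python-chess):

```python
# Sketch: replay UCI coordinate moves ("e2e4") on an 8x8 array and emit the
# FEN piece-placement field, so a prompt carries the position, not the history.
START = [
    list("rnbqkbnr"), list("pppppppp"),
    *[list("........") for _ in range(4)],
    list("PPPPPPPP"), list("RNBQKBNR"),
]

def apply_uci(board, move):
    """Move a piece, e.g. 'e2e4'. No legality checking (assumption noted above)."""
    f_col, f_row = ord(move[0]) - ord("a"), 8 - int(move[1])
    t_col, t_row = ord(move[2]) - ord("a"), 8 - int(move[3])
    board[t_row][t_col] = board[f_row][f_col]
    board[f_row][f_col] = "."

def placement_fen(board):
    """Serialize ranks 8..1 into FEN piece placement; runs of '.' become digits."""
    ranks = []
    for row in board:
        fen, empty = "", 0
        for sq in row:
            if sq == ".":
                empty += 1
            else:
                fen += (str(empty) if empty else "") + sq
                empty = 0
        ranks.append(fen + (str(empty) if empty else ""))
    return "/".join(ranks)

board = [row[:] for row in START]
for mv in ["e2e4", "e7e5", "g1f3"]:  # 1. e4 e5 2. Nf3
    apply_uci(board, mv)
print(placement_fen(board))
# -> rnbqkbnr/pppp1ppp/8/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R
```

The prompt then becomes "here is the position, suggest a move" rather than a growing transcript the model has to re-derive the board from.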