Hacker News

gf000, yesterday at 7:29 AM

What calculations? Do you mean "3+5" or a generic Turing-machine like model?

In either case, this "it's a language model" is a pretty dumb argument to make. You may want to reason about the fundamental architecture, but even that quickly breaks down: a sufficiently large neural network can execute many kinds of calculations. In "one shot" mode (a single forward pass) it can't be Turing complete, but in a weird technicality neither does your computer have an infinite tape. That simply doesn't matter from a practical perspective unless you actually go "out of bounds" during execution.
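To make the finite-tape point concrete, here is a minimal sketch of a toy single-tape Turing machine whose tape has a fixed length. The `run` function, the `TapeOverflow` exception and the binary-increment rule table `INCREMENT` are hypothetical illustrations, not anything from the thread. The machine computes correctly on its finite tape and only fails when the computation actually needs a cell that isn't there.

```python
# Toy single-tape Turing machine with a FINITE tape (hypothetical example).
class TapeOverflow(Exception):
    """Raised when the head moves past either end of the finite tape."""


def run(tape, head, state, rules, halt="halt", max_steps=10_000):
    """Apply `rules` until `state == halt`; return the final tape as a string."""
    cells = list(tape)
    steps = 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        # Look up (state, symbol under head) -> (symbol to write, move, next state)
        write, move, state = rules[(state, cells[head])]
        cells[head] = write
        head += move
        if not 0 <= head < len(cells):
            raise TapeOverflow(f"head moved to {head}, outside the tape")
        steps += 1
    return "".join(cells)


# Hypothetical rule table: add 1 to a binary number (most significant bit
# first), starting with the head on the rightmost digit. "_" is a blank cell.
INCREMENT = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0, keep carrying leftward
    ("carry", "0"): ("1", 0, "halt"),    # 0 + carry = 1, done
    ("carry", "_"): ("1", 0, "halt"),    # carry lands in a blank cell, done
}

print(run("_1011", 4, "carry", INCREMENT))  # → _1100
```

On "111" with the head at index 2, the carry runs off the left edge and `TapeOverflow` is raised; pad the same number with a blank ("_111") and the identical rules produce "1000". The rules never change, only whether the tape is big enough.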

50T parameters give plenty of state space for all kinds of calculations, and you really can't reason about it in a simplistic way like "this is just a DFA" (deterministic finite automaton).
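For reference, a DFA is just a finite set of states plus a transition table, and being finite-state doesn't make it trivial: a machine with n bits of state can have up to 2^n states. As a small hypothetical example, the 3-state DFA below reads a binary number one bit at a time and tracks its value modulo 3, i.e. it does real arithmetic with no tape at all.

```python
# A 3-state DFA: the state is the value of the bits read so far, modulo 3.
# Reading bit b moves from remainder r to (2*r + b) % 3.
TRANSITIONS = {
    (r, b): (2 * r + int(b)) % 3
    for r in range(3)
    for b in "01"
}


def divisible_by_3(bits: str) -> bool:
    """Accept a binary string iff the number it encodes is divisible by 3."""
    state = 0  # start state: the empty prefix has value 0
    for b in bits:
        state = TRANSITIONS[(state, b)]
    return state == 0  # accepting state: remainder 0


print(divisible_by_3("110"))  # 6 → True
print(divisible_by_3("111"))  # 7 → False
```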

Let alone when you run it in a loop.
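One way to see why the loop matters: each call only has to do a bounded amount of work, while the loop supplies as many steps as the input needs, with all intermediate state written out as text. A minimal sketch, assuming a hypothetical `step` function as a stand-in for one fixed-cost model call that keeps no hidden state between calls; here the loop runs Euclid's GCD algorithm one line at a time.

```python
def step(transcript: list[str]) -> str:
    """One 'call': read only the last line of the transcript, emit the next line.

    Hypothetical stand-in for a single fixed-cost model pass; it keeps no
    state of its own, everything it needs is in the transcript.
    """
    a, b = map(int, transcript[-1].split())
    if b == 0:
        return f"ANSWER {a}"
    return f"{b} {a % b}"  # one step of Euclid's algorithm


def run_in_loop(a: int, b: int, max_calls: int = 1000) -> list[str]:
    """Call `step` repeatedly, appending each output to the transcript."""
    transcript = [f"{a} {b}"]
    for _ in range(max_calls):
        line = step(transcript)
        transcript.append(line)
        if line.startswith("ANSWER"):
            return transcript
    raise RuntimeError("no answer within max_calls")


print(run_in_loop(48, 18))  # → ['48 18', '18 12', '12 6', '6 0', 'ANSWER 6']
```

The number of calls depends on the input (here 4 for 48 and 18), which is exactly what a single pass of fixed depth cannot adapt.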


Replies

razorbeamz, yesterday at 8:11 AM

> What calculations? Do you mean "3+5" or a generic Turing-machine like model?

Either one. An LLM cannot solve 3+5 by adding 3 and 5. It can only "solve" 3+5 by knowing that within its training data, many people have written that 3+5=8, so it will produce 8 as an answer.
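The distinction being drawn here is roughly the one between recalling an answer and computing it. A toy illustration, with a hypothetical `SEEN` table standing in for previously seen examples; this is not a model of how an LLM works internally, and which side LLMs fall on is exactly what this thread disputes.

```python
from typing import Optional

# "Recall": answer only what appears verbatim among previously seen examples.
SEEN = {"3+5": "8", "2+2": "4", "10+7": "17"}  # hypothetical 'training data'


def recall(expr: str) -> Optional[str]:
    return SEEN.get(expr)  # None when this exact string was never seen


# "Compute": apply the addition procedure to any well-formed input.
def compute(expr: str) -> str:
    left, right = expr.split("+")
    return str(int(left) + int(right))


print(recall("3+5"), compute("3+5"))  # → 8 8
print(recall("3+6"), compute("3+6"))  # → None 9
```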

An LLM, similarly, cannot simulate a Turing machine. It can produce a text output that resembles a Turing machine based on others' descriptions of one, but it is not actually reading and writing bits to and from a tape.

This is why LLMs still struggle to tell you how many r's are in the word "strawberry". They can't count. They can't do calculations. They can only reproduce text based on having examined the mathematical examples in the human corpus.
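For comparison, the expected answer comes from a plain character-by-character scan, which is trivial for a conventional program. The function name `count_letter` is just an illustrative choice.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` by scanning `word` one character at a time."""
    total = 0
    for ch in word:
        if ch == letter:
            total += 1
    return total


print(count_letter("strawberry", "r"))  # → 3
```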

gpderetta, yesterday at 10:34 AM

> In "one shot" mode it can't be Turing complete, but in a weird technicality neither does your computer have an infinite tape

Nor our brains, in fact.