
westurner, yesterday at 5:15 PM

It makes a lot of sense for math-focused LLMs to work with higher-order symbols, or context-dependent chunks, rather than plain tokens. The same is probably true for software.

From "Large Language Models for Mathematicians (2023)" (2025) https://news.ycombinator.com/item?id=42899805 :

> It makes sense for LLMs to work with testable code for symbolic mathematics: CAS (Computer Algebra System) code instead of LaTeX, which only roughly corresponds.

> Are LLMs training on the AST parses of the symbolic expressions, or on token co-occurrence? What about training on the relations between code and tests?
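
To make the AST-versus-tokens distinction in that quote concrete, here is a rough sketch using only Python's standard library. It treats `x**2 + 2*x + 1` as a stand-in for a CAS expression; the helper names `token_strings` and `ast_shape` are made up for illustration and are not from any CAS or LLM tooling.

```python
import ast
import io
import tokenize

EXPR = "x**2 + 2*x + 1"  # illustrative expression, standing in for CAS input


def token_strings(source):
    """Flat lexical view: the ordered token strings, with no structure."""
    readline = io.StringIO(source).readline
    keep = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    return [t.string for t in tokenize.generate_tokens(readline) if t.type in keep]


def ast_shape(source):
    """Structural view: the operator tree as nested tuples."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            return (type(node.op).__name__, walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError(f"unsupported node: {type(node).__name__}")
    return walk(ast.parse(source, mode="eval").body)


if __name__ == "__main__":
    print(token_strings(EXPR))
    print(ast_shape(EXPR))
```

The token list only records which symbols appear next to each other; the tree records precedence and grouping, which is the part a co-occurrence model has to infer on its own.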

There are already token co-occurrence relations between test functions and the functions under test that they call. What additional information would be useful to parse, extract, and graph-rewrite onto source code before training, embedding lookup, and agent reasoning?
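
As one possible example of such extracted information, the sketch below pulls explicit (test function, function under test) call edges out of a module with Python's `ast` module. The sample module, the `test_` naming convention, and the helper name `extract_test_edges` are all assumptions for illustration; it only follows direct calls by bare name, so method calls, imports, and indirect calls are ignored.

```python
import ast

# Hypothetical module: two functions and the tests that exercise them.
SOURCE = '''
def add(a, b):
    return a + b

def scale(x, k):
    return add(x, 0) * k

def test_add():
    assert add(1, 2) == 3

def test_scale():
    assert scale(2, 3) == 6
'''


def extract_test_edges(source):
    """Return sorted (test_name, callee_name) pairs for direct calls
    from top-level test_* functions to functions defined in the same module."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    edges = set()
    for fn in tree.body:
        if not (isinstance(fn, ast.FunctionDef) and fn.name.startswith("test_")):
            continue
        for node in ast.walk(fn):
            # Only direct calls by bare name, e.g. add(1, 2)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    edges.add((fn.name, node.func.id))
    return sorted(edges)


if __name__ == "__main__":
    for test_name, callee in extract_test_edges(SOURCE):
        print(f"{test_name} -> {callee}")
```

Edges like these could be attached to the source as extra annotations or graph structure; whether that helps training or retrieval is the open question.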