Hacker News

zbentley · today at 4:35 PM

The key part of the article is that token structure interpretation is a training-time concern, not just an input/output-processing concern (which already causes plenty of inconsistency and fragmentation on its own!). That means two things. First, training stakeholders at model-development shops have to be deeply involved in the tool/syntax design process, which creates friction and slows things down. Second, any current improvements or standardizations in how we do structured LLM I/O will necessarily be adopted on the training side only after a months-to-years lag, given how long new-model development and training take.
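To make the "training-time concern" point concrete, here's a toy sketch (all delimiters hypothetical; real chat templates and tokenizers vary by model family). A model trained with a tool-call marker registered as a special token sees it as one atomic unit; a model from before that syntax existed just sees character soup, no matter how the runtime formats its prompts:

```python
def tokenize(text, special_tokens):
    """Greedy toy tokenizer: registered special tokens become single
    tokens; everything else falls back to per-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        for tok in special_tokens:
            if text.startswith(tok, i):
                tokens.append(tok)
                i += len(tok)
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

msg = "<|tool_call|>get_weather(city)"

# Trained with the marker: one atomic token the model learned to attend to.
trained = tokenize(msg, special_tokens=["<|tool_call|>"])
# Trained without it: the marker shatters into ordinary sub-pieces.
untrained = tokenize(msg, special_tokens=[])

print(len(trained), len(untrained))  # → 18 30
```

The runtime can adopt whatever I/O convention it likes, but whether the marker is one token or thirty is fixed at training time, which is exactly why standardization changes lag behind model releases.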

That makes for a pretty thorny mess ... and that's before we get into the disincentives for standardization (standardizing threatens the big AI labs' moat/lock-in).