Hacker News

100721 today at 8:30 AM

Does anyone know why they are using language models instead of a more purpose-built statistical model? My intuition is that a language model would either be overfit, or its training data would have a lot of noise unrelated to the application and significantly drive up costs.


Replies

LeoWattenberg today at 8:39 AM

It's not an LLM; it's a purpose-built model. https://arxiv.org/html/2411.19506v1

5 years ago we would've called it a Machine Learning algorithm. 5 years before that, a Big Data algorithm.

kevmo314 today at 8:37 AM

This might be some journalistic confusion. The CERN documentation at https://twiki.cern.ch/twiki/bin/view/CMSPublic/AXOL1TL2025 states:

> The AXOL1TL V5 architecture comprises a VICReg-trained feature extractor stacked on top of a VAE.

dmd today at 9:29 AM

… they’re not? Who said they are? The article even explicitly says they’re not?
