Hacker News

giancarlostoro yesterday at 3:19 PM

I wonder if this is what a “Minimally Viable LLM” looks like. I often wonder how much of an LLM you need before you can just give it a bigger context window and feed it dynamic knowledge, like a PDF or Markdown file, to cover what's outside its training data. I feel like LLMs don’t need more data; they just need to be refined.


Replies

x3cca yesterday at 6:40 PM

You might be interested in this model. It's densely trained on math, which lets it punch way higher than it should: https://github.com/WeiboAI/VibeThinker