Hacker News

wolttam today at 2:09 AM

The point is not to be as good as the multi-trillion parameter model you can host across 72 GPUs (or whatever).

I'm running a 248B model on a paltry amount of hardware and getting plenty of good use out of it.

Sure, the most demanding tasks will demand the best models (and always will). There are still less demanding tasks for other models.

I think some people are fooling themselves that coding, of all tasks, is always going to require the biggest models ever. Again, maybe some coding tasks will, but the majority of business CRUD apps probably don't. Same goes for virtually any other type of task. The biggest models are really only useful for the most complex tasks.


Replies

sgc today at 3:18 AM

If you wouldn't mind, could you explain a bit what the 248B model is good for, and where it breaks down and you need something better? I hear this take often, but it is always a fleeting remark, so I have no idea what 'useful' looks like at all.
