Hacker News

Havoc · today at 10:55 AM · 2 replies

Interesting to see a pivot away from MoE by both IBM and Mistral, while the larger classes of SOTA models all seem to be sticking with it.

Quick vibe check of it (8B @ Q6): seems promising. Bit of a clinical tone, but I can see that being useful for data processing and the like. Sometimes you really don't want an LLM that spams you with emojis...


Replies

embedding-shape · today at 11:51 AM

Makes sense: dense for small models, dense or MoE for larger ones. That ends up fitting various hardware setups pretty neatly; there's no need for MoE at smaller scale, and dense is too heavy at large scale.
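The tradeoff above comes down to active parameters per token: a dense model runs all of its weights for every token, while an MoE model routes each token to only a few experts. A minimal back-of-the-envelope sketch, with entirely hypothetical parameter counts and routing settings (the function and numbers are illustrative, not any specific model):

```python
def active_params(total_params, num_experts=1, experts_per_token=1,
                  expert_fraction=1.0):
    """Rough parameters used per token (hypothetical accounting).

    expert_fraction: share of total params living in expert layers;
    the remainder (attention, embeddings, etc.) is always active.
    """
    shared = total_params * (1 - expert_fraction)
    routed = total_params * expert_fraction * experts_per_token / num_experts
    return shared + routed

# Dense model: every parameter is active for every token.
dense = active_params(8e9)

# MoE-style model: 8 experts, 2 active per token, 75% of params in experts.
moe = active_params(47e9, num_experts=8, experts_per_token=2,
                    expert_fraction=0.75)

print(f"dense 8B, active/token:  {dense / 1e9:.1f}B")   # 8.0B
print(f"MoE 47B, active/token:   {moe / 1e9:.1f}B")     # ~20.6B
```

The point of the arithmetic: the MoE model stores far more total parameters (so it needs the memory of a large-scale setup) but activates only a fraction per token, which is why it pays off at large scale and buys little at small scale.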

npodbielski · today at 12:18 PM

I never want an LLM to spam me with emojis. What is the use case for that? I find it highly annoying.
