Interesting to see a pivot away from MoE by both IBM and Mistral, while the larger SOTA models all seem to be sticking with it.
Quick vibe check of it - 8B @ Q6 - seems promising. Bit of a clinical tone, but I can see that being useful for data processing and similar. Sometimes you really don't want an LLM that spams you with emojis...
I never want an LLM to spam me with emojis. What is the use case for that? I find it highly annoying.
Makes sense: dense for small models, dense or MoE for larger ones. That ends up fitting various hardware setups pretty neatly - no need for MoE at smaller scales, and dense gets too heavy at large scale. A rough sketch of the trade-off is below.
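To put rough numbers on that, here's a minimal back-of-envelope sketch comparing weight memory and per-token active parameters for a few hypothetical dense and MoE configurations. The model shapes and the Q6 bit-width are illustrative assumptions, not specs of any of the models discussed above.

```python
# Back-of-envelope: dense vs MoE at different scales.
# Model shapes below are illustrative assumptions, not real model specs.

def weight_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory in GB for params_b billion parameters
    at a given quantization bit-width (ignores KV cache and overhead)."""
    return params_b * 1e9 * bits / 8 / 1e9

configs = [
    # (name, total params in billions, active params per token in billions)
    ("dense 8B",          8.0,   8.0),
    ("dense 120B",      120.0, 120.0),
    ("MoE 120B (A13B)", 120.0,  13.0),
]

for name, total, active in configs:
    mem = weight_gb(total, bits=6)  # Q6-ish quantization
    print(f"{name:18s} ~{mem:6.1f} GB of weights, ~{active:.0f}B active params per token")
```

At the small end, a dense model is already cheap enough per token and fits in modest VRAM, so MoE's extra total-parameter memory buys little. At the large end, dense compute per token gets heavy, while MoE keeps active parameters (and thus per-token compute) low at the cost of a big total weight footprint, which suits multi-GPU or high-memory server setups.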