Hacker News

Trinity Large: An open 400B sparse MoE model

68 points by linolevan today at 12:57 AM | 22 comments

Comments

mynti today at 8:10 AM

They trained it in 33 days for ~$20M (which apparently covers not just the infrastructure but also salaries over a six-month period), and the model comes close to Qwen and DeepSeek. Pretty impressive.

linolevan today at 1:12 AM

I'm particularly excited to see a "true base" model to do research off of (https://huggingface.co/arcee-ai/Trinity-Large-TrueBase).
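
A minimal loading sketch for anyone wanting to poke at it, assuming the TrueBase repo ships standard Hugging Face-format weights; the dtype and device flags here are assumptions, so check the model card for the recommended setup:

    # Minimal sketch, assuming standard Hugging Face-format weights on the hub;
    # the loading flags are an assumption, not taken from the model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "arcee-ai/Trinity-Large-TrueBase"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )

    # Plain next-token continuation -- a true base model has no chat template.
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))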

mwcampbell today at 9:54 PM

Given that it's a 400B-parameter model, but it's a sparse MoE model with 13B active parameters per token, would it run well on an NVIDIA DGX Spark with 128 GB of unified RAM, or do you practically need to hold the full model in RAM even with sparse MoE?
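
Back-of-envelope, using only the numbers in this question (400B total parameters, 13B active per token, 128 GB unified memory) and ignoring KV cache and runtime overhead:

    # Hedged estimate: weights-only memory at a few common precisions.
    total_params = 400e9   # full sparse MoE parameter count
    active_params = 13e9   # parameters activated per token
    spark_ram_gb = 128     # DGX Spark unified memory

    for fmt, bytes_per_param in [("FP16", 2), ("INT8", 1), ("INT4", 0.5)]:
        full_gb = total_params * bytes_per_param / 1e9
        active_gb = active_params * bytes_per_param / 1e9
        fits = "fits" if full_gb <= spark_ram_gb else "does not fit"
        print(f"{fmt}: full weights ~{full_gb:.0f} GB ({fits}), "
              f"active per token ~{active_gb:.1f} GB")

    # FP16 ~800 GB, INT8 ~400 GB, INT4 ~200 GB -- all exceed 128 GB, so the full
    # expert set would need offloading. Only the ~13B active slice is small, but
    # which experts are active changes from token to token.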

Alifatisk today at 10:59 PM

What did they do to make the loss drop so much in phase 3?

Also, why are they comparing with Llama 4 Maverick? Wasn’t it a flop?

syntaxing today at 11:14 PM

So refreshing to see open-source models like this come from the US. I would love a ~100B-class one that can compete with gpt-oss-120b and GLM-4.5 Air.

frogperson today at 9:51 PM

What exactly does "open" mean in this case? Is it weights and data or just weights?

greggh today at 9:54 PM

The only thing I question is the use of Maverick in their comparison charts. That's like comparing a pile of rocks to an LLM.

0xdeadbeefbabe today at 11:32 PM

Is anyone excited to do ablation testing on it?

observationist today at 5:15 PM

This is a wonderful release.