LongCat-2.0, a large-scale MoE model with 1.6T total and 48B Active

73 points • by benjiro29 • today at 12:30 AM • 20 comments

Comments

> The training and deployment of LongCat-2.0 are built on large-scale clusters of tens of thousands of AI ASIC superpods. Compared to the mature Nvidia GPU ecosystem, the supporting software community is still less developed. We have therefore put significant effort into building a stable, secure, and scalable infrastructure.

This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m

credit_guy • today at 2:15 AM

I just tested it with a slightly tricky question:

  > If you could run a nuclear reactor with U-235 as fuel or Pu-241 (both mixed with 95% U-238), which one would you choose and why?

For a human this would not be tricky at all. For an LLM it could be, because this question certainly does not appear in any training data: Pu-241 does not exist in pure form. It exists only as a minor component of reactor-grade plutonium, where Pu-239 dominates, with Pu-240 second and Pu-241 third.

In any case, LongCat-2.0 gave a very well-reasoned but incorrect answer: that Pu-241 is preferable.

I then tested Qwen 3.7 Plus, and it correctly answered that U-235 is preferable because of its much higher delayed neutron fraction. I then went to Gemini Flash, which gave the same answer with much more confidence and much stronger arguments, and much faster.

Overall I rate Gemini Flash the best, Qwen 3.7 Plus an acceptable second, and LongCat-2.0 an OK-ish third, if you have nothing better.

skybrian • today at 3:05 AM

Apparently this comes from Meituan, which is a Chinese food delivery company.

gwerbin • today at 4:47 AM

I asked a question with "Search" enabled, with the app set to English, and got results back in Chinese. Interesting view into how the LLM responds to its context.

aetherspawn • today at 2:31 AM

I wish announcements of open models came with the requirements to run them on llama.cpp.

A bonus would be tok/s on common hardware.
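
Until official numbers are published, a rough back-of-envelope estimate is possible from the two parameter counts in the title alone. The sketch below is not from the announcement: the bit widths, the 400 GB/s memory bandwidth, and the function names are illustrative assumptions, and the speed figure is only a bandwidth-bound ceiling for single-stream decoding, not a measurement.

```python
# Back-of-envelope sizing for a sparse MoE model.
# Parameter counts come from the title; everything else is an assumption.

TOTAL_PARAMS = 1.6e12   # total parameters (all experts must be stored)
ACTIVE_PARAMS = 48e9    # parameters used per token (what must be read per decode step)


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold n_params weights at the given precision."""
    return n_params * bits_per_weight / 8


def decode_tokens_per_second_bound(bits_per_weight: float,
                                   bandwidth_bytes_per_s: float) -> float:
    """Ceiling on single-stream decode speed, assuming every active weight
    is read from memory exactly once per generated token."""
    return bandwidth_bytes_per_s / weight_bytes(ACTIVE_PARAMS, bits_per_weight)


ASSUMED_BANDWIDTH = 400e9  # hypothetical 400 GB/s machine, not a real benchmark

for bits in (16, 8, 4):
    storage_gb = weight_bytes(TOTAL_PARAMS, bits) / 1e9
    ceiling = decode_tokens_per_second_bound(bits, ASSUMED_BANDWIDTH)
    print(f"{bits:>2}-bit weights: ~{storage_gb:,.0f} GB to store, "
          f"at most ~{ceiling:.1f} tok/s at the assumed bandwidth")
```

This ignores the KV cache, activations, and per-block quantization overhead, so real memory needs would be higher and real throughput lower.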

yashthakker • today at 3:34 AM

[flagged]

dryarzeg • today at 1:32 AM

So... is this literally a... umm, sorry, I'm just genuinely (really, no sarcasm intended) unsure which terminology to use... a finetune of DeepSeek V4-Pro or a post-trained version of DeepSeek V4-Pro Base? I haven't fully dived into the tech report (so I may update my opinion as well as my comment), but thus far the architectural solutions seem to be largely similar to DeepSeek's.

Maybe I'm wrong, but that's just the first impression.

EDIT: I take my words back (which happens rarely) - although they do build upon DeepSeek's work, their contribution far exceeds merely post-training the base model in a different way. They did introduce something new to the architecture, though I still can't find the full tech report, with Hugging Face and GitHub links returning 404 right now.

EDIT-2: Now that I think about it, I'm not quite sure they're going to openly release the full report with methodology, or the model weights, at all.
