Hacker News

taykolasinski, yesterday at 2:09 PM

OP here. I spent the last few days reproducing the mHC architecture from the recent DeepSeek paper (2512.24880).

Two key takeaways from the reproduction:

Unconstrained Hyper-Connections really do explode (7x amplification even at 10M scale).

I hit a nasty "stream persistence" bug where my tensors were the right shape, but the architecture was functionally broken.

This is Part 1 (10M scale). Part 2 (scaling to 1B on A100s) is coming later this week. Happy to answer questions about the implementation.
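
For readers who want a feel for the first takeaway, here is a toy sketch, not the code from the reproduction. It assumes each residual stream is a single scalar, that the stream-mixing matrices are random rather than learned, and that the constraint is a Sinkhorn-style projection toward doubly stochastic matrices (my assumption about the kind of constraint involved; check the paper for the exact formulation). The names `sinkhorn` and `stream_gain` are hypothetical.

```python
import numpy as np

def sinkhorn(m, iters=50):
    """Alternately normalize rows and columns of a positive matrix,
    pushing it toward a doubly stochastic matrix."""
    m = np.array(m, dtype=float)
    for _ in range(iters):
        m /= m.sum(axis=1, keepdims=True)  # rows sum to 1
        m /= m.sum(axis=0, keepdims=True)  # columns sum to 1
    return m

def stream_gain(constrain, n_streams=4, depth=32, seed=0):
    """Return ||x_out|| / ||x_in|| after `depth` stream-mixing steps.

    Toy assumption: one scalar per stream, random non-negative mixing
    matrices close to the identity, and no layer function F at all.
    """
    rng = np.random.default_rng(seed)
    x = np.ones(n_streams)
    start = np.linalg.norm(x)
    for _ in range(depth):
        noise = np.abs(rng.standard_normal((n_streams, n_streams)))
        mix = np.eye(n_streams) + 0.1 * noise
        if constrain:
            mix = sinkhorn(mix)  # assumed constraint, see lead-in
        x = mix @ x
    return np.linalg.norm(x) / start

if __name__ == "__main__":
    print("unconstrained gain:", stream_gain(constrain=False))
    print("constrained gain:  ", stream_gain(constrain=True))
```

In this toy the unconstrained product of mixing matrices grows with depth, while the row- and column-normalized version keeps the signal norm close to 1. The numbers it prints say nothing about the 7x figure above, which comes from a real training run.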


Replies

WiSaGaN, yesterday at 2:48 PM

How do you know "GPT-5, Claude, Llama, Gemini. Under the hood, they all do the same thing: x+F(x)."?
