logoalt Hacker News

anonymous908213yesterday at 7:48 PM1 replyview on HN

The frontier LLMs have the same overall architecture as earlier models. I absolutely understand how they operate. I have worked in a startup wherein we heavily finetuned Deepseek, among other smaller models, running on our own hardware. Both Deepseek's 671b model and a Mistral 7b model operate according to the exact same principles. There is no magic in the process, and there is zero reason to believe that Sonnet or Opus is on some impossible-to-understand architecture that is fundamentally alien to every other LLM's.


Replies

pfischtoday at 12:10 AM

Deepseek and Mistral are both considerably behind Opus, and you could not make deepseek or mistral if I gave you a big gpu cluster. You have the weights but you have no idea how they work and you couldn't recreate them.

> I have worked in a startup wherein we heavily finetuned Deepseek, among other smaller models, running on our own hardware.

Are you serious with this? I could go make a lora in a few hours with a gui if I wanted to. That doesn't make me qualified to talk about top secret frontier ai model architecture.

Now you have moved on to the guy who painted his honda, swapped out some new rims, and put some lights under it. That person is not an automotive engineer.