Hacker News

nubg yesterday at 9:33 PM

A question I always have is: how do the AI labs safeguard against leaks of their models? Training a cutting-edge model costs a minimum of hundreds of millions of dollars, and it's all contained within a file. Okay, that file might be 500GB, but it's still just one blob that is worth almost a billion dollars. And they need to train new models every few weeks and have lots of people with access to debug them, run inference, etc. I wonder when we will see the first leaks? Imagine if e.g. Opus 4.8 got leaked. Wouldn't that bankrupt Anthropic?


Replies

rvnx yesterday at 10:39 PM

Employees naturally jump from one company to another, and they know the secret sauce.

The difference is mostly in the dataset, and to extract this dataset, competitors use a process called distillation (= extracting data from the other models through actual queries).

This leads to "funny" cases as well, like Gemini occasionally claiming "I am ChatGPT", or ChatGPT calling itself Claude, etc.

https://note.com/maudi/n/n821a6308437b?hl=en

They all copy from each other.