Hacker News

ilaksh today at 6:06 PM

My instinct (for better or worse) is usually contrarian. Most people seem very skeptical of what Meta is doing with AI. But, what if, in a way at least, it makes sense?

Maybe Wang has correctly identified that the programming and agentic ability that Anthropic and OpenAI models have has largely come from armies of software engineers creating massive datasets by writing out coding and agentic problems and solutions?

So he told Zuckerberg that. The reason it may be turning into so much friction is that at companies like Anthropic or OpenAI, training engineers were either hired specifically for that purpose or probably mostly handled through contracts with third parties (which again, hired them to train AI). And honestly many of them may be overseas or just happy to have a job in a difficult period. But anyway they wouldn't have very high salary expectations etc.

But Zuckerberg already had 25000 engineers. Why not take, say, 1/5 of them and get them working on the dataset? The problem is that those engineers were hired for different, prestigious, highly paid positions at Meta/Facebook. They were not hired to do tedious grading of AI answers or quiz construction.

But Zuckerberg either has to do this, or spend additional billions on doing it all with external contractors. A third option would be to try to create a massive distillation operation. Or just hope that his engineers could invent some magical new training trick that manifested the agentic and programming skills without the large scale human input.

Or he could release a model trained largely by existing open weights models. Which without some huge breakthrough probably has no chance of surpassing them, so is pointless.

I think most of the substantive criticism of Zuckerberg has been about burning funds. If he gives up the "your job is to grade AI homework now" plan because his engineers refuse, he would need to go through third parties. The additional billions and billions this would cost would create more pressure on the bottom line and shareholder pressure.

It would also give up any potential advantage that Wang may have optimistically sold the operation on: that using "real" engineers, as opposed to lower-paid data-labelling engineers, might result in a higher-quality dataset.

At some point, model architectures that don't need such massive datasets, or datasets that can be created automatically in a way that advances the frontier, will probably come about. But right now they don't exist.

Further, the way AI works currently, business advantage from AI comes from encoding existing internal intelligence and knowledge. Meta's massive engineering corps effectively has that in their heads. Having them create these datasets is possibly the only way to leverage this knowledge asset in this paradigm.

I guess the problem is it means forcing thousands of people to do a different job from the one they were hired for.


Replies

TheOtherHobbes today at 6:16 PM

None of that makes sense.

What's the end goal? Meta-specific engineering, with baked-in knowledge of how FB, Threads, and WhatsApp work? General and/or coding products to compete with Anthropic and OpenAI? Some special Magic Thing which only Meta can invent which will bedazzle Meta's users?

You don't need giant datasets unless you know what you're going to do with them. OpenAI and Anthropic are having enough issues making their products profitable. And those are, if not beloved, then at least respected, with a real, if patchy, reputation for usefulness.

What was Meta's pitch in this market? There were hints of interest when LeCun was still doing original R&D, and there was some distant possibility of a next-gen revolutionary product.

But now the goal seems to be to flail around doing something incoherently AI-branded with no obvious strategy.

The troops are being marched around, but no one knows where the battle is supposed to be.

PaulHoule today at 6:31 PM

One problem is that the AI agent market is fiercely competitive. Why build when you can buy? For the foreseeable future there will be a number of competitive models on the "efficient frontier" and I don't think one vendor will pull ahead.

In that market you can build a model, spend a lot of money on it, and at best get something that's on the same frontier as everybody else, but you're just as likely to end up with uncompetitive models like the ones they have now.

You might save a bit running your own models, doing your own inference, etc. Why not take advantage of "last mover advantage" and buy whatever is best when you need it and figure the odds are good that everybody else is going to buy more GPUs than they need and as a large customer you'll be able to buy in bulk at fire sale prices?

ungovernableCat today at 6:33 PM

>I think most of the substantive criticism of Zuckerberg has been about burning funds.

I'm not in the org myself, but I know some Meta SWEs tangentially. My understanding is that the biggest criticism is just the chaos of it all: jumping constantly from one thing to another like headless chickens and accomplishing nothing.

It created an environment where it's kind of impossible to plan and progress your career.

Syzygies today at 7:03 PM

> I think most of the substantive criticism of Zuckerberg has been about burning funds.

The 2017 Rohingya massacre in Myanmar? They handed him the death toll. He filed it under growth.

winstonp today at 6:15 PM

While I mostly agree with your post, I do want to point out one thing:

> Or he could release a model trained largely by existing open weights models. Which without some huge breakthrough probably has no chance of surpassing them, so is pointless.

This seems to be categorically untrue. Composer 2.5 is a substantial improvement on its underlying Kimi base model.
