Hacker News

herval 12/10/2024

> But, why would it get better to any significant extent?

Two years ago, the very best closed-source image model was unable to represent anything remotely realistic. Today, there are hundreds of open-source models (like Flux) that can generate images that are literally indistinguishable from reality. Not only that, there's an entire collection of tools and techniques around style transfer, facial reconstruction, pose control, etc. It's mind-blowing, and every week there's a new paper making it even better. Some of that could have been more training data. Most of it wasn't.

I guess it's fair to extrapolate that same trend to video, since it's the arc that text, audio, and images have taken. No reason it would be different.


Replies

EternalFury 12/10/2024

I get that. But let’s say you have a glass: you fill it to one third, then to half, then to three quarters, then to full. Can you expect to fill it beyond full? Not every process has an infinite ramp.

It seems frontier labs have been throwing all the compute and all the data they could get their hands on at model training for at least the past 2 years. Is that glass a third full or is it nearly full already?

Is the process of filling that particular glass linear or does the top 20% of the glass require X times as much water to fill as the bottom 20%?
