Hacker News

embedding-shape, today at 12:02 PM

If you have a model that only knows how to do CAD modeling, but doesn't know history and wasn't trained on the visual language of that history, how is it supposed to model the Pantheon in the first place? It'd only be able to model exactly what you can describe in text or, even worse for "vision models", exactly what it can extract from images via its vision encoders. My guess is that would be a far cry from what you see in this blog post.