LLMs are trained on text. Why would we expect them to understand a visual and tactile 3D world?

SoftTalker • yesterday at 6:51 PM

azinman2 • yesterday at 6:53 PM

Because they’re also multimodal vLLMs.