Hacker News

ValdikSS, today at 2:27 AM, 2 replies

That's why LLMs will eventually be used only for the initial interaction with the user in their own language, to prepare the data for a specialized model.

Imagine if face recognition worked like a text chat, where the PC grabs a frame from the camera and writes in the chat: "Who's that? Here's the RGB888 image in hex: ...".


Replies

stingraycharles, today at 6:15 AM

Do you know that MoE is a thing?

FeepingCreature, today at 6:09 AM

That's actually how vision language models already work, pretty much.
