LLMs can do chat-completion, they don't do only chat completion. There are LLMs for image gener...

notepad0x90 • yesterday at 8:35 PM • 1 reply

LLMs can do chat completion, but they don't do only chat completion. There are LLMs for image generation, voice generation, video generation and possibly more. The drone's camera feeds images to the LLM, which then determines what action to take based on them. It's similar to asking ChatGPT "there is a tree in this picture, if you were operating a drone, what action would you take to avoid collision", except the "there is a tree" part is done by the LLM's image recognition, and the system prompt is "recognize objects and avoid collision". Of course I'm simplifying it a lot, but it is essentially generating navigational directions from visual context using image recognition.
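
A rough sketch of the loop described above, in Python, might look like the following. Everything here is hypothetical and only for illustration: `Detection`, `describe_scene`, `build_prompt` and `stub_policy` are made-up names, the detections are assumed to come from some separate image recognition step, and `stub_policy` is a hand-written rule standing in for whatever the model would answer. No real model or drone API is called.

```python
from dataclasses import dataclass

# System prompt quoted from the comment above.
SYSTEM_PROMPT = "recognize objects and avoid collision"


@dataclass
class Detection:
    """One recognized object (hypothetical output of the image recognition step)."""
    label: str
    bearing_deg: float  # negative = left of center, positive = right
    distance_m: float


def describe_scene(detections):
    """Build the 'there is a tree' part of the prompt from the detections."""
    if not detections:
        return "No obstacles are visible."
    parts = [
        f"a {d.label} {d.distance_m:.0f} m away at {d.bearing_deg:+.0f} degrees"
        for d in detections
    ]
    return "There is " + ", ".join(parts) + "."


def build_prompt(detections):
    """Combine the system prompt, the scene description and the question."""
    return (
        f"{SYSTEM_PROMPT}\n"
        f"{describe_scene(detections)}\n"
        "What action would you take?"
    )


def stub_policy(detections, safe_distance_m=10.0):
    """Stand-in for the model's answer: a simple rule, NOT an LLM."""
    close = [d for d in detections if d.distance_m < safe_distance_m]
    if not close:
        return "continue"
    nearest = min(close, key=lambda d: d.distance_m)
    # Steer away from the side the nearest obstacle is on.
    return "turn_right" if nearest.bearing_deg <= 0 else "turn_left"


frame = [Detection("tree", bearing_deg=-5.0, distance_m=4.0)]
print(build_prompt(frame))
print(stub_policy(frame))
```

In a real system the rule in `stub_policy` would be replaced by a call to an actual model, and whether that model counts as an LLM is exactly what the replies below dispute.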

Replies

nrrbtrbbrb • today at 4:02 AM

> There are LLMs for image generation,

That part isn’t handled by an LLM

> voice generation,

That part isn’t handled by an LLM

> video generation

That part isn’t handled by an LLM

