Hacker News
bonoboTP
•
today at 9:05 AM
Vision and audio are already in use in multimodal LLMs, so it has been possible for some time.