Hacker News

bonoboTP, today at 9:05 AM

Vision and audio are already in use in multimodal LLMs, so this has been possible for some time.