For those not trying, this allows Deepseek to understand a picture (instead of just extracting text from it), and it can describe what's in the picture, but this is not an image generation system, so you can't ask it to modify an image.
Personally, I'm a bit surprised the DS chat app still doesn't offer its own text to speech and speech to text features (I know DS doesn't have any ASR model for example, but there are quite a few in the open).
Can you explain what the benefits are of actually "talking" with the bot instead of typing and reading?
As someone who would rather send a slack message to a coworker rather than actually walking over and talk to them, the idea of having to talk with my laptop is not appealing at all, haha.