logoalt Hacker News

fsiefkenyesterday at 3:10 PM1 replyview on HN

I am curious how these models would perform and how much energy they'd take to semi-realtime detect objects: SmolVLM2-500M - Moondream 0.5B/2B/2.5B - Qwen3-VL (3B) https://huggingface.co/collections/Qwen/qwen3-vl

I am sure this is already worked on in Russia, Ukraine and The Netherlands. A lot can go wrong with autonomous flying. One could load the VLM on a high end android phone on the drone and have dual control.


Replies

SpyCoder77yesterday at 5:33 PM

A better way would be a VLA as opposed to a VLM. VLAs are meant to take action, where as vlms are for geneeral use. https://cognitivedrone.github.io/