I can get great results from a YOLO model with 30M to maybe 300M params. To get decent CV from a LLM...

wongarsu • today at 8:06 AM • 2 replies • view on HN

I can get great results from a YOLO model with 30M to maybe 300M params. To get decent CV from a LLM 8B params is the absolute minimum, closer to 30B for interesting tasks

I might be on board about LLMs being the future of OCR (though many would disagree), but for general CV they are very inefficient for very limited benefit

Replies

IanCal • today at 8:25 AM

They can however be extremely useful for curating training data. Also things like SAM and the DINO (/grounding dino) models.

Also if they are better then you can also have a flow that’s cheap model -> marginal cases go to more complex thing (and a chain of these).

The yolo models are really shockingly good for their cost and how well they can work with not much training data as well.

charcircuit • today at 11:14 AM

>for very limited benefit

Due to how simple they are to work with they will become popular. Compare NLP before and after GPT-3. GPT-3 majorly brought down the complexity and skill needed for doing NLP tasks even if traditional NLP is much much faster. Ultimately ease of development will win out and the industry will work towards optimizing running such LLMs to make it cheap enough to run.

alt Hacker News

Replies