logoalt Hacker News

djydetoday at 4:52 PM6 repliesview on HN

What are the use cases for these small models? Is there anyone using models of this scale in their daily life who could share their experience?


Replies

philipkglasstoday at 5:41 PM

I have vLLM running on a Linux machine in my basement, connected with Tailscale, and I use small models as part of tasks like this:

- Transcribing scanned documents into formatted text

- Captioning/describing images and classifying them for audience suitability (includes anti-spam)

- Matching documents with relevant Wikipedia pages for tagging

I don't use them like frontier models. I break the work down into micro-tasks with one clear goal for each prompt. I write a lot of glue software to make the complete flow work. I was working on all of these tasks before LLMs appeared on the scene. The LLMs have allowed me to replace a lot of complicated code with less code plus a model, while achieving better results.

I use local models for reasons of cost and control. I already had the workstation and GPU. The only running cost is electricity. I have used proprietary models from OpenAI and Google for some of these tasks, but I also encountered churn when the models I built my tools around were retired. I don't worry about that when I have the weights saved locally.

robgoughtoday at 5:32 PM

I've got a home-built dictation app that uses a local model to clear up the text and fix grammar. It was super easy to build. I’m extending it to capture meeting notes and summarise too. All on-device.

I saw a little app the other day, I think someone posted on here, that looks at your screenshot and renames the file based off the contents of the file.

There's tons of little examples like that. For a lot of use cases, you really don't need the frontier models.

properbrewtoday at 5:31 PM

I think small models have a very good niche for specific tasks. I utilise a fine tuned Phi-4 model (smaller than this one) that fits in about 3.5gb of RAM (not vram) for the document processing side of things for the desktop app I develop (a bit of a shameless plug - whistle-enterprise.com).

If you have a very specific idea for local model use you can find a way to make it work very well, you don't even need to have a graphics card or NPU chip. You just have to be extremely constrained in how it's used. I think as a generic chatbot they're not great, I'd use a hosted SOTA model and I'm a big fan of local LLMs myself.

mhitzatoday at 5:42 PM

In theory, locally you'd use these where lossiness is acceptable for audio transcription and image labeling (as simple examples).

In practice I haven't got around to building something around multimodality since I'm primarily using their text generation capabilities.

Aachentoday at 5:09 PM

"Small" models are the ones I can run myself on my own terms. LLMs aren't useful enough for me to justify spending hundreds of euros on a GPU with 16GB VRAM or something, and that's assuming I have the rest of the desktop just laying around. Back when I checked (before the RAM price hike), these models weren't meaningfully better than 4-8GB ones anyway, you'd have to go for the top tier cards at 24 or 32 GB iirc to get something vaguely in the direction of the SaaS versions, and that was absolutely out of my budget. Even if that changed, so have hardware prices so it'd probably still work out the same

Xioltoday at 5:08 PM

I've yet to see someone answer a question like this with a decent, useful answer.