Whoa, is this part of the future of models? Basically little models you can use as tools.
It's looking like running your own mini ecosystem is the way of the future to me. No data centers, just a decent GPU with 16-24 GB of VRAM, a CPU, and 32 GB of RAM.
Eventually we'll have models small enough to do a single thing really well and we'll call them functions.
I'm pretty sure there's someone somewhere who'll create a proper harness that's equivalent to one giant model. The difficulty is mostly that local hardware has a lot of memory constraints. Targeting 128 GB would seem to be the current sweet spot. If the corporate market movers would stop buying up all the memory, we could maybe have more.
Regardless, the kind of work people did in the 80s, pruning programs to fit on small devices, is likely happening now. I'd bet most of the Chinese firms are doing it because of the US's silly GPU games, among other constraints.
https://www.docling.ai/
I don’t know how many different little models this uses under the hood, but I was shocked at how good it was at the couple of document extraction tasks I threw it at.