Hacker News

nik736 · yesterday at 3:30 PM · 2 replies

I have put together an internal benchmark of thousands of business documents with weird tables, odd structure, etc., which I run on every relevant model release. Opus 4.8 performs very, very well. But it is obviously overkill for the task (and expensive for it). I just wanted to respond to the OP.


Replies

Insanity · yesterday at 3:46 PM

I'm assuming the reason I didn't have a good success rate is that mine were not scanned documents but photographs, and the lighting conditions weren't always ideal. I think scanned business documents are, in a way, a happy-case scenario. (Obviously, you seem to run it against some complex documents, so that's impressive.)

apawloski · yesterday at 4:58 PM

I’m curious what your findings are on the best model for your use case.