Hacker News

anon291 today at 4:24 PM

Most LLMs can engage with text in picture form just as well as with text in token form. In fact, my initial research on this (later corroborated by published papers) indicates that this is a cheap way to save on tokens.


Replies

billtarbell today at 4:32 PM

Oh, interesting, and good to know about the token savings with this technique. My test with Claude had it use vision and then programmatically try different variable-font input values (mimicking the user's scrub interaction) until it was able to OCR the text.
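
For illustration only, the sweep described above might look roughly like the sketch below. It is not the code from the actual test: `sweep_until_readable`, `fake_render`, and `fake_ocr` are hypothetical names, and the render and OCR steps are stand-ins that a real setup would replace with a font renderer and a vision or OCR call.

```python
from typing import Callable, Iterable, Optional, Tuple


def sweep_until_readable(
    axis_values: Iterable[float],
    render: Callable[[float], object],
    ocr: Callable[[object], str],
    is_plausible: Callable[[str], bool],
) -> Optional[Tuple[float, str]]:
    """Try each axis value in order and return the first (value, text) that reads cleanly."""
    for value in axis_values:
        image = render(value)       # mimic the user scrubbing to this value
        text = ocr(image).strip()   # attempt to read the rendered result
        if is_plausible(text):
            return value, text
    return None                     # nothing in the sweep was legible


# Stand-ins for demonstration only (assumptions, not a real renderer or OCR):
# the "image" is a string that is legible only near axis value 400.
def fake_render(value: float) -> str:
    return "HELLO" if abs(value - 400) <= 50 else "#####"


def fake_ocr(image: str) -> str:
    return image


result = sweep_until_readable(range(100, 1000, 100), fake_render, fake_ocr, str.isalpha)
```

The plausibility check is kept as a parameter because what counts as a successful read depends on the content; `str.isalpha` is only a placeholder here.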