logoalt Hacker News

azakaitoday at 3:55 PM1 replyview on HN

A carb counting app might use API calls to these frontier models and then do some kind of analysis. It could see if different models agree or not, or multiple calls, and with how much variance.

So it would be more accurate to test the apps rather than the APIs, unless the goal is to warn people that just open chatgpt and ask there.


Replies

notahackertoday at 5:33 PM

The open source app could in theory do that, but the paper's authors would be able to determine whether it did or not by reading its code, which they evidently did to replicate the API calls it made with their own script.

(And of course it would also be far more tedious to submit each picture 500 times manually using an app and manually log the response than using a script which is designed to collect the data automatically as fast as API rate limits permit)