Hacker News

ldjkfkdsjnv 01/20/2025 (2 replies)

These models always seem great until you actually use them for real tasks. The reliability goes way down; you can't trust the output like you can with even a lower-end model like 4o. The benchmarks aren't capturing some kind of common-sense usability metric, where you can trust the model to handle random small amounts of ambiguity in everyday real-world prompts.


Replies

pizza 01/20/2025

Fair point. Actually, probably the best part about having beaucoup bucks like OpenAI is being able to chase down all the manifold little ‘last-mile’ imperfections with an army of many different research teams.

washadjeffmad 01/20/2025

That seems like both a generalization and hyperbole. How are you envisioning this being deployed?