Which model? The field moves so fast it’s hard to validate statements like this without that info.
o1-preview?
GPT-4o. I tried only a few samples on o1-preview, and the results were bad, though that sample was too small to be statistically significant.