creddit • today at 4:50 PM • 1 reply

Ran some of my internal benchmarks against this and I'm very unimpressed. I don't think this moves them into the OpenAI vs. Anthropic vs. Gemini conversation at all.

There were major analytical errors in its responses to several of my technical questions.

Replies

creddit • today at 5:02 PM

Playing with this some more and it's actively not good. Just basic mathematical errors riddling the responses. I did some basic adversarial testing where its responses are analyzed by Gemini, and Gemini is finding basic math errors in every ask I make, even ones that are simple relative to what Opus, Gemini, or GPT can handle. Yikes.
