Kimi K2 is the model that most consistently passes the clock test. I agree it's definitely got something unique going on.
https://clocks.brianmoore.com/
Nice! I'm curious, what does this service cost to run? I notice you don't include more expensive models like Opus, but querying the models every minute must add up over time (excuse the pun)?
Lol, why's GPT-5 broken on that test? DeepSeek is surprisingly crisp and robust.