Kimi K2 is a very impressive model! It's particularly un-obsequious, which makes it useful for actually checking your reasoning on things.
Some ChatGPT models, especially older ones, will tell you that everything you say is fantastic and great. Kimi, on the other hand, doesn't mind taking a detour to question your intelligence, and likely your entire ancestry, if you ask it to be brutal.
A single 512GB M3 Ultra is $9,499.00
https://www.apple.com/shop/buy-mac/mac-studio/apple-m3-ultra...
Claims like this are, as always, misleading: they don't show the context length, or the prefill time if you use a lot of context. It will be fun waiting minutes for a reply.
Is there a Linux equivalent of this setup? I see some mention of RDMA support on Linux distros, but it's unclear to me whether this is hardware-specific (requiring ConnectX, or in this case Apple's Thunderbolt) or whether something interesting can be done with "vanilla 10G NIC" hardware.
I get tempted to buy a couple of these, but I just feel like the amortization doesn’t make sense yet. Surely in the next few years this will be orders of magnitude cheaper.
What benchmarks are good these days? I generally just try different models on Cursor, but most of the open-weight models aren't available there (DeepSeek V3.2 and Kimi K2 have some problems with formatting, and many others are missing), so I'd be curious to see some benchmarks - especially for non-web stuff (C++, Rust, etc.).
You should mention that it is 4bit quant. Still very impressive!
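For scale, a back-of-the-envelope sketch of why the quantization matters, assuming Kimi K2's roughly 1 trillion total (MoE) parameters and ignoring KV cache and runtime overhead:

```python
# Rough memory footprint of Kimi K2's weights at different quantization widths.
# Assumes ~1 trillion total parameters (MoE); KV cache and overhead not counted.
PARAMS = 1_000_000_000_000

def weight_gb(bits_per_param: float) -> float:
    """Approximate weight size in GB for a given quantization width."""
    return PARAMS * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_gb(bits):,.0f} GB")
# 16-bit: ~2,000 GB
#  8-bit: ~1,000 GB
#  4-bit:   ~500 GB
```

At 4 bits the weights alone come to roughly 500 GB, which is why a pair of 512GB machines works at all: one barely fits the weights, and two leave headroom for the KV cache.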
Does this also run with Exo Labs' token prefill acceleration using DGX Spark? I.e., take 2 Sparks and 2 Mac Studios and get inference speed comparable to what 2x M5 Ultras will be able to do?
Isn't this the same model that won the competition for drawing a real-time clock recently?
Is this using the new RDMA over Thunderbolt support from macOS 26.2?
Is there no API for the Kimi K2 Instruct...?
Kimi K2 is a really weird model, just in general.
It's not nearly as smart as Opus 4.5 or 5.2-Pro or whatever, but it has a very distinct writing style and also a much more direct "interpersonal" style. As a writer of very-short-form stuff like emails, it's probably the best model available right now. As a chatbot, it's the only one that seems to really relish calling you out on mistakes or nonsense, and it doesn't hesitate to be blunt with you.
I get the feeling that it was trained very differently from the other models, which makes it situationally useful even if it's not very good for data analysis or working through complex questions. For instance, as it's both a good prose stylist and very direct/blunt, it's an extremely good editor.
I like it enough that I actually pay for a Kimi subscription.