80 tps with a 5080 + 3090 combo is wild. I’ve been working with a 4090 and two Tenstorrent p150 cards, and I manage only about 30 tps using all three for qwen3.6 27b q8. Guess I’ve got more optimization to do.
Would like to see the perf of their setup with and without MTP and ngram speculative decoding though, as well as parallel decode performance (once llama.cpp MTP plays well with multiple slots).
Being in California, electricity alone makes this non-competitive with just paying a cloud provider, though.
How is the software compatibility with the Tenstorrent cards? Are you stuck using vendor-supplied runtimes/models?
It's surprising how little these things come up, given the price they go for.
That’s the cost of using a new hardware provider. A single RTX Pro 6000 Blackwell Max-Q will do better than that and be much more usable. I have 2 running DS4 Flash at 160 tok/s with max num seqs 4.
Very interesting though, these Tenstorrent chips. Might get one to experiment with.