The M3 Ultra has a crappy GPU, somewhere around 3060 Ti to 3070 level in compute. Its only real benefit is memory bandwidth, which makes LLM token generation fast, at around 3080 level. But prompt prefill, which determines time-to-first-token, is compute-bound and extremely slow, and all those tasks you mentioned above would likewise run at around 3060 Ti level. That's why Exo paired a DGX Spark (5090-class performance at FP4) with a Mac Studio and sped it up 4x. The M5 Ultra is supposed to be as fast as the DGX Spark at FP4 thanks to its new neural accelerators.
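
To make the prefill vs. generation split concrete, here is a rough back-of-envelope sketch. It assumes the usual rule of thumb that prefill costs about 2 FLOPs per parameter per prompt token (compute-bound) and that generation has to read all the weights once per token (bandwidth-bound). The function names and all the numbers below are made up for illustration, not measurements of any real machine, and it ignores KV cache traffic and the cost of moving the KV cache between boxes.

```python
def prefill_seconds(prompt_tokens, params, flops_per_s):
    """Rough time-to-first-token: prefill is limited by compute throughput."""
    # Assumption: ~2 FLOPs per parameter per prompt token.
    return 2.0 * params * prompt_tokens / flops_per_s


def decode_tokens_per_s(params, bytes_per_param, bandwidth_bytes_per_s):
    """Rough generation speed: each new token re-reads every weight once."""
    return bandwidth_bytes_per_s / (params * bytes_per_param)


# Hypothetical numbers, chosen only to show the shape of the tradeoff.
params = 8e9             # 8B-parameter model
bytes_per_param = 0.5    # 4-bit weights
prompt_tokens = 8192

slow_compute = 20e12     # hypothetical box with modest compute...
high_bandwidth = 800e9   # ...but lots of memory bandwidth
fast_compute = 80e12     # hypothetical second box with 4x the compute

ttft_single = prefill_seconds(prompt_tokens, params, slow_compute)
ttft_split = prefill_seconds(prompt_tokens, params, fast_compute)
tps = decode_tokens_per_s(params, bytes_per_param, high_bandwidth)

print(f"prefill on the bandwidth box: {ttft_single:.2f} s")
print(f"prefill on the compute box:   {ttft_split:.2f} s")
print(f"generation on bandwidth box:  {tps:.0f} tokens/s")
```

With these made-up inputs, moving only the prefill to the compute-heavy box cuts time-to-first-token by the ratio of the two compute figures, while generation speed stays wherever the bandwidth is. That is the basic idea behind splitting the two phases across different hardware.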