More slop again. The way to get more throughput is to bump the batch size, not to try to "multithread" job submits to the NPU as if it's a CPU.
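To make the point concrete, here is a rough sketch of what "bump the batch size" looks like on the submit side. Everything in it is hypothetical: `FakeNPU`, `run_batch`, and `submit_in_batches` are stand-ins invented for illustration, not the API of any real NPU runtime, and the fake device only counts submissions rather than modeling real latency.

```python
from typing import Callable, List, Sequence


class FakeNPU:
    """Hypothetical stand-in for an accelerator; it only counts submissions."""

    def __init__(self) -> None:
        self.submits = 0

    def run_batch(self, batch: Sequence[int]) -> List[int]:
        # One call is one job submission, however many items the batch holds.
        self.submits += 1
        return [x * 2 for x in batch]


def submit_in_batches(
    inputs: Sequence[int],
    run_batch: Callable[[Sequence[int]], List[int]],
    batch_size: int,
) -> List[int]:
    """Group inputs into chunks of batch_size and submit each chunk once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    outputs: List[int] = []
    for start in range(0, len(inputs), batch_size):
        outputs.extend(run_batch(inputs[start:start + batch_size]))
    return outputs


if __name__ == "__main__":
    items = list(range(10))
    for size in (1, 4, 10):
        npu = FakeNPU()
        submit_in_batches(items, npu.run_batch, size)
        print(f"batch_size={size}: {npu.submits} submits")
```

The submission count drops as the batch grows, all from a single submitting thread. Whether that translates into throughput on a given device depends on its memory limits and per-job overhead, which this sketch does not model.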