NPUs like this tend to have one thing in common: nine times out of ten they're decorative, with no driver or software support.
Even if one did work, it's usually heavily bandwidth-bottlenecked and near useless for LLM inference: token generation is bound by memory bandwidth, not compute, and the NPU reads the same system RAM as the CPU. The CPU wins every time.
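To see why extra compute doesn't help, here's a back-of-envelope sketch: each decoded token streams all model weights from memory, so throughput is capped at roughly bandwidth divided by model size. The bandwidth and model-size figures below are illustrative assumptions, not measurements of any particular chip.

```python
def decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    """Rough upper bound on tokens/s for a single decode stream:
    every token requires reading the full weights from memory."""
    return bandwidth_gb_s / model_gb

# Assumed: a ~7B-parameter model at 4-bit quantization, roughly 4 GB of weights.
model_gb = 4.0

# Assumed shared DDR5 bandwidth. CPU and NPU read the same DRAM,
# so the NPU's extra TOPS don't raise this ceiling at all.
ddr5_gb_s = 80.0

print(f"~{decode_tokens_per_sec(ddr5_gb_s, model_gb):.0f} tok/s ceiling")
```

Whether the CPU or the NPU does the multiplies, that ceiling is the same, which is the whole point.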