Can't speak to this specific NPU, but these kinds of accelerators are really made for more general ML tasks like machine vision. For example, while people have gotten the 6 TOPS NPU in the RK3588 (used on similar boards) to work with llama.cpp, it isn't super useful because of the RAM constraints. I believe it has some sort of 32-bit memory addressing limit, so you can never give it more than 3 or 4 GB. So for LLMs, not all that useful.
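
To put rough numbers on that, here's a back-of-the-envelope sketch. It assumes the limit really is a flat 32-bit address space, and the model sizes and quantization levels are just illustrative picks of mine, not anything measured on the RK3588:

```python
# Back-of-the-envelope check, not a measurement of any real NPU.
ADDRESS_BITS = 32  # assumption: a flat 32-bit addressing limit
GIB = 2**30

# Largest amount of memory a 32-bit address can reach: 4 GiB.
addressable_bytes = 2**ADDRESS_BITS


def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in bytes.

    Ignores the KV cache, activations, and quantization metadata,
    so real usage will be higher.
    """
    return params * bits_per_weight / 8


def fits(params: float, bits_per_weight: int) -> bool:
    """True if the weights alone fit inside the addressable range."""
    return weight_bytes(params, bits_per_weight) <= addressable_bytes


# Hypothetical model sizes and quantization levels, for illustration only.
for params, bits in [(7e9, 4), (7e9, 8), (13e9, 4)]:
    size_gib = weight_bytes(params, bits) / GIB
    verdict = "fits" if fits(params, bits) else "does not fit"
    print(f"{params / 1e9:.0f}B params @ {bits}-bit: {size_gib:.2f} GiB, {verdict}")
```

Under those assumptions a 4-bit 7B model only just squeezes in before you count the KV cache, and anything bigger is out, which is why the NPU ends up not being much help for LLMs.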